fix(codegen): #168 — specialize i32 loop counter for number-typed bounds #171

Merged
proggeramlug merged 1 commit into main from fix-168-loop-counter-number-bound
Apr 24, 2026

@proggeramlug
Contributor

Summary

Closes #168.

Perry already specializes loop counters to i32 when the bound is arr.length (the classify_for_length_hoist peephole). This PR extends that optimization to the equally-common i < n shape where n is a number-typed function parameter or local — the pattern that blocks LoopVectorizer on Buffer-read and other intrinsic-heavy hot paths.

What changed

Single file: crates/perry-codegen/src/stmt.rs (+142 lines, no other files).

New helper classify_for_local_bound(cond, ctx):

  • Accepts Compare { op: Lt|Le, left: LocalGet(i), right: LocalGet(n) }
  • Requires i to be in integer_locals (initialized from integer literal, only mutated via Update)
  • Requires n to be either also in integer_locals, or a number-typed (Type::Number | Type::Int32) non-boxed, non-global slot (covers n: number parameters and simple let locals typed number)
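
The three acceptance rules above can be sketched as a small standalone classifier. This is a minimal illustration, not the code in stmt.rs: the `Expr`, `CmpOp`, `Type`, `Slot`, `Ctx`, and `LocalBound` types below are hypothetical stand-ins, since the PR description does not show Perry's real HIR or codegen context definitions.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for Perry's HIR and codegen context (assumptions,
// not the real perry-codegen types).
type LocalId = u32;

#[derive(Clone, Copy, Debug, PartialEq)]
enum CmpOp { Lt, Le, Gt }

#[derive(Clone, Copy, Debug, PartialEq)]
enum Type { Number, Int32, String }

#[derive(Debug)]
enum Expr {
    LocalGet(LocalId),
    Number(f64),
    Compare { op: CmpOp, left: Box<Expr>, right: Box<Expr> },
}

#[derive(Clone, Copy)]
struct Slot { ty: Type, boxed: bool, global: bool }

struct Ctx {
    // Locals initialized from an integer literal and only mutated via Update.
    integer_locals: HashSet<LocalId>,
    slots: HashMap<LocalId, Slot>,
}

#[derive(Debug, PartialEq)]
struct LocalBound { counter: LocalId, bound: LocalId, inclusive: bool }

fn classify_for_local_bound(cond: &Expr, ctx: &Ctx) -> Option<LocalBound> {
    // Rule 1: the condition must be `i < n` or `i <= n` on two locals.
    let Expr::Compare { op, left, right } = cond else { return None };
    let inclusive = match op {
        CmpOp::Lt => false,
        CmpOp::Le => true,
        _ => return None,
    };
    let (Expr::LocalGet(i), Expr::LocalGet(n)) = (&**left, &**right) else {
        return None;
    };
    // Rule 2: the counter must be a known integer local.
    if !ctx.integer_locals.contains(i) {
        return None;
    }
    // Rule 3: the bound is an integer local, or a number-typed slot that is
    // neither boxed nor global.
    let bound_ok = ctx.integer_locals.contains(n)
        || ctx.slots.get(n).is_some_and(|s| {
            matches!(s.ty, Type::Number | Type::Int32) && !s.boxed && !s.global
        });
    bound_ok.then_some(LocalBound { counter: *i, bound: *n, inclusive })
}
```

In this sketch a boxed bound, a `>` comparison, or a counter outside `integer_locals` all return `None`, which would leave the loop on the ordinary double-compare path.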

In lower_for — new parallel block alongside the arr.length path:

  1. Allocates a parallel i32 alloca for the counter (if not already present from the Let site) and initializes it via fptosi of the current double value
  2. Emits fptosi(n) once before the cond block into a fresh i32 alloca (the key: loop-invariant integer bound in a register LLVM/SCEV can see)
  3. Uses icmp slt i32 %i, %n.i32 (or icmp sle for <=) instead of fcmp olt double
  4. The existing Expr::Update lowering already keeps the i32 slot in sync with add i32 1 per iteration
  5. Removes the i32 counter slot after loop exit (only if this path was the one that allocated it)
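
To make the placement of steps 1 to 3 concrete, here is a rough sketch of an emitter that produces the loop head as LLVM IR text. It is an illustration only: `BoundOp`, `emit_loop_head`, and every SSA and slot name are assumptions chosen to mirror the IR shape shown later in this description, not Perry's actual builder API.

```rust
/// Comparison operators the peephole accepts, per the description above.
#[derive(Clone, Copy)]
enum BoundOp { Lt, Le }

/// Returns (preheader, cond_block) as lines of LLVM IR text.
/// All names are illustrative placeholders, not Perry's real identifiers.
fn emit_loop_head(op: BoundOp) -> (Vec<String>, Vec<String>) {
    let preheader = vec![
        // Step 1: seed the parallel i32 counter slot from the double slot.
        "%i_dbl = load double, ptr %i_slot".to_string(),
        "%i_i32 = fptosi double %i_dbl to i32".to_string(),
        "store i32 %i_i32, ptr %i_i32_slot".to_string(),
        // Step 2: convert the bound once, before the cond block is entered.
        "%n_dbl = load double, ptr %n_slot".to_string(),
        "%n_i32 = fptosi double %n_dbl to i32".to_string(),
        "store i32 %n_i32, ptr %n_i32_slot".to_string(),
    ];
    // Step 3: integer compare; `<` maps to slt and `<=` maps to sle.
    let pred = match op {
        BoundOp::Lt => "slt",
        BoundOp::Le => "sle",
    };
    let cond_block = vec![
        "%ctr_i32 = load i32, ptr %i_i32_slot".to_string(),
        "%bound_i32 = load i32, ptr %n_i32_slot".to_string(),
        format!("%cmp = icmp {pred} i32 %ctr_i32, %bound_i32"),
        "br i1 %cmp, label %for.body, label %for.exit".to_string(),
    ];
    (preheader, cond_block)
}
```

The point of the split is that both `fptosi` conversions sit in the preheader, so the cond block that runs every iteration contains only integer loads and an `icmp`.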

Correctness

  • Counter (i): integer_locals guarantees it is initialized from an integer literal and only ever modified by Update ++/--. The i32 slot round-trips exactly.
  • Bound (n): when n is in integer_locals the same guarantee holds. When n is a number-typed parameter slot, Perry trusts the TypeScript type annotation (the same trust-types contract already applied throughout codegen). Because fptosi truncates toward zero, a non-integer float bound (e.g. foo(3.7)) would observe at most one fewer iteration than the double comparison; this is a trade-off within Perry's existing contract.
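
The iteration-count trade-off can be checked with a small model. This is a sketch under assumptions: the function names are hypothetical, and Rust's `as i32` cast is used as a stand-in for `fptosi`. The two agree for in-range values (both truncate toward zero), but Rust saturates on out-of-range input while LLVM's `fptosi` yields poison, so the model says nothing about that case.

```rust
/// Shared loop test: `i < bound` or `i <= bound`.
fn in_range<T: PartialOrd>(i: T, bound: T, inclusive: bool) -> bool {
    if inclusive { i <= bound } else { i < bound }
}

/// Iterations executed when the condition compares doubles (old shape).
fn iterations_double(n: f64, inclusive: bool) -> u32 {
    let mut count = 0;
    let mut i = 0.0_f64;
    while in_range(i, n, inclusive) {
        count += 1;
        i += 1.0;
    }
    count
}

/// Iterations executed when the bound is converted to i32 once (new shape).
fn iterations_i32(n: f64, inclusive: bool) -> u32 {
    // Truncates toward zero, matching fptosi for values that fit in i32.
    let bound = n as i32;
    let mut count = 0;
    let mut i = 0_i32;
    while in_range(i, bound, inclusive) {
        count += 1;
        i += 1;
    }
    count
}
```

With a bound of 3.7, the double loop for `i < n` visits 0, 1, 2, 3 while the i32 loop visits 0, 1, 2; for `i <= n` both visit 0 through 3, and for integer-valued bounds the two always agree.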

Test

Functional correctness verified locally:

sumA (const bound, N=1000) = 499500
sumB (param bound, N=1000) = 499500
match: YES

The variant B sum equals variant A, as it did before this change: previously the param-bound loop ran correctly but without i32 specialization. It now emits icmp slt i32 in the cond block, giving LLVM a modelable integer induction variable.

Gap suite: stable at 23/28 (no regressions). All perry-codegen, perry-hir, perry-transform, perry-runtime, and perry crate tests pass (72/72).

Before / after (representative LLVM IR shape)

Before (variant B cond block):

%ctr_dbl = load double, ptr %i_slot
%bound_dbl = load double, ptr %n_slot
%cmp = fcmp olt double %ctr_dbl, %bound_dbl
br i1 %cmp, label %for.body, label %for.exit

After (variant B cond block):

%ctr_i32 = load i32, ptr %i_i32_slot
%bound_i32 = load i32, ptr %n_i32_slot   ; hoisted fptosi(n) from loop head
%cmp = icmp slt i32 %ctr_i32, %bound_i32
br i1 %cmp, label %for.body, label %for.exit

The icmp slt i32 gives LLVM's SCEV a clean integer induction variable, which is the prerequisite for LoopVectorizer to widen the loop body.

Notes for maintainer

  • No version bump, no CLAUDE.md entry, no Recent Changes entry per the external-contributor policy in CLAUDE.md.
  • The Le case (i <= n) is handled via icmp_sle (consistent with the existing classify_for_length_hoist extension).
  • The classify_for_local_bound guard deliberately does not fire when hoist_classification.is_some() — the two peepholes are mutually exclusive.
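
The mutual-exclusion note above amounts to a simple priority order, sketched below. `CounterPath` and `choose_counter_path` are hypothetical names for illustration; the real guard lives inside lower_for and its exact form is not shown in this description.

```rust
/// Which comparison shape the loop condition ends up with (illustrative).
#[derive(Debug, PartialEq)]
enum CounterPath {
    LengthHoist,   // existing arr.length peephole
    LocalBound,    // the peephole added in this PR
    DoubleCompare, // fallback: fcmp on doubles
}

/// The local-bound classifier is passed as a closure so that it is only
/// consulted when the length-hoist classification did not fire.
fn choose_counter_path<H, L>(
    hoist_classification: Option<H>,
    classify_local_bound: impl FnOnce() -> Option<L>,
) -> CounterPath {
    if hoist_classification.is_some() {
        return CounterPath::LengthHoist;
    }
    match classify_local_bound() {
        Some(_) => CounterPath::LocalBound,
        None => CounterPath::DoubleCompare,
    }
}
```

Under this ordering a loop can never receive two parallel i32 counter slots from both peepholes at once.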

https://claude.ai/code/session_019Y542k6QyAWV6JF2an8DXU


Generated by Claude Code

@proggeramlug proggeramlug force-pushed the fix-168-loop-counter-number-bound branch from de525f9 to a1a8d1d Compare April 24, 2026 05:07
…s (v0.5.188)

Closes #168 via PR #171.

Perry already specializes loop counters to i32 when the bound is
`arr.length` (the `classify_for_length_hoist` peephole). This PR
extends that optimization to the equally-common `i < n` shape
where `n` is a `number`-typed function parameter or local — the
pattern that blocks `LoopVectorizer` on Buffer-read and other
intrinsic-heavy hot paths (e.g. the v0.5.183 `readInt32BE`
intrinsic from PR #166 now actually auto-vectorizes when the
loop bound is a `n: number` param).

## What changed

Single file: `crates/perry-codegen/src/stmt.rs`.

**New helper** — `classify_for_local_bound(cond, ctx)`:
- Accepts `Compare { op: Lt|Le, left: LocalGet(i), right: LocalGet(n) }`
- Requires `i` in `integer_locals` (initialized from integer literal,
  only mutated via `Update`)
- Requires `n` to be either also in `integer_locals`, or a
  number-typed (`Type::Number | Type::Int32`) non-boxed, non-global
  slot — covers `n: number` parameters and simple `let` locals
  typed as number.

**In `lower_for`** — new parallel block alongside the `arr.length`
path:
1. Allocates a parallel i32 alloca for the counter (if not already
   present from the Let site) and initializes it via `fptosi` of
   the current double value.
2. Emits `fptosi(n)` once before the cond block into a fresh i32
   alloca (the key: loop-invariant integer bound in a register
   LLVM/SCEV can see).
3. Uses `icmp slt i32 %i, %n.i32` (or `icmp sle` for `<=`) instead
   of `fcmp olt double`.
4. The existing `Expr::Update` lowering already keeps the i32 slot
   in sync with `add i32 1` per iteration.
5. Removes the i32 counter slot after loop exit (only if this path
   was the one that allocated it).

## Correctness

- Counter (`i`): `integer_locals` guarantees it is initialized from
  an integer literal and only ever modified by `Update ++/--`.
  The i32 slot round-trips exactly.
- Bound (`n`): when `n` is in `integer_locals` the same guarantee
  holds. When `n` is a number-typed parameter slot Perry trusts the
  TypeScript type annotation (the same trust-types contract already
  applied throughout codegen). A non-integer float bound (e.g.
  `foo(3.7)`) would observe at most one fewer iteration — a
  trade-off within Perry's existing contract.

Cloud-authored PR, manually audited and metadata (version bump +
CLAUDE.md entry) folded in at merge.
@proggeramlug proggeramlug force-pushed the fix-168-loop-counter-number-bound branch from a1a8d1d to 9ac12e9 Compare April 24, 2026 06:09
@proggeramlug proggeramlug merged commit 4782120 into main Apr 24, 2026
1 check passed
@proggeramlug proggeramlug deleted the fix-168-loop-counter-number-bound branch April 24, 2026 06:15
Development

Successfully merging this pull request may close these issues.

i32 loop counter not specialized when loop bound is a 'number' parameter — blocks LoopVectorizer on Buffer-read hot paths
