fix(codegen): #168 - specialize i32 loop counter for number-typed bounds #171
Merged
proggeramlug merged 1 commit into main on Apr 24, 2026
Branch force-pushed from de525f9 to a1a8d1d.
…s (v0.5.188)

Closes #168 via PR #171.

Perry already specializes loop counters to i32 when the bound is `arr.length` (the `classify_for_length_hoist` peephole). This PR extends that optimization to the equally common `i < n` shape, where `n` is a `number`-typed function parameter or local. That is the pattern that blocks `LoopVectorizer` on Buffer-read and other intrinsic-heavy hot paths; for example, the v0.5.183 `readInt32BE` intrinsic from PR #166 now actually auto-vectorizes when the loop bound is an `n: number` param.

## What changed

Single file: `crates/perry-codegen/src/stmt.rs`.

**New helper**: `classify_for_local_bound(cond, ctx)`:
- Accepts `Compare { op: Lt|Le, left: LocalGet(i), right: LocalGet(n) }`
- Requires `i` to be in `integer_locals` (initialized from an integer literal, only mutated via `Update`)
- Requires `n` to be either also in `integer_locals`, or a number-typed (`Type::Number | Type::Int32`), non-boxed, non-global slot. This covers `n: number` parameters and simple `let` locals typed as number.

**In `lower_for`**: a new parallel block alongside the `arr.length` path:
1. Allocates a parallel i32 alloca for the counter (if not already present from the Let site) and initializes it via `fptosi` of the current double value.
2. Emits `fptosi(n)` once before the cond block into a fresh i32 alloca. This is the key step: a loop-invariant integer bound in a register that LLVM/SCEV can see.
3. Uses `icmp slt i32 %i, %n.i32` (or `icmp sle` for `<=`) instead of `fcmp olt double`.
4. The existing `Expr::Update` lowering already keeps the i32 slot in sync with `add i32 1` per iteration.
5. Removes the i32 counter slot after loop exit (only if this path was the one that allocated it).

## Correctness

- Counter (`i`): `integer_locals` guarantees it is initialized from an integer literal and only ever modified by `Update ++/--`. The i32 slot round-trips exactly.
- Bound (`n`): when `n` is in `integer_locals` the same guarantee holds. When `n` is a number-typed parameter slot, Perry trusts the TypeScript type annotation (the same trust-types contract already applied throughout codegen). A non-integer float bound (e.g. `foo(3.7)`) would observe at most one fewer iteration, a trade-off within Perry's existing contract.

Cloud-authored PR, manually audited; metadata (version bump + CLAUDE.md entry) folded in at merge.
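The trade-off in the bound bullet can be made concrete with a small Rust model of loop trip counts before and after the specialization. This is a sketch under stated assumptions, not Perry code. `trip_count_f64` and `trip_count_i32` are hypothetical names. The model also assumes that `fptosi` behaves like Rust's truncating `as i32` cast, which only holds for finite bounds inside the i32 range.

```rust
/// Trip count of `for (let i = start; i < n; i++)` (or `<=` when `inclusive`)
/// when counter and bound are compared as doubles, like `fcmp olt double`.
fn trip_count_f64(start: i32, n: f64, inclusive: bool) -> u32 {
    let mut i = f64::from(start);
    let mut trips = 0;
    loop {
        let in_bounds = if inclusive { i <= n } else { i < n };
        if !in_bounds {
            break;
        }
        trips += 1;
        i += 1.0;
    }
    trips
}

/// The same loop after the specialization (hypothetical model): the bound is
/// converted to i32 once before the loop, like the hoisted `fptosi(n)`, and
/// the counter is an i32 compared with a signed integer compare, like
/// `icmp slt` / `icmp sle`.
///
/// Assumption: `as i32` models `fptosi` only for finite, in-range values (both
/// truncate toward zero). For NaN, infinities, or out-of-range values `fptosi`
/// yields poison while `as` saturates, so the model says nothing about those.
fn trip_count_i32(start: i32, n: f64, inclusive: bool) -> u32 {
    let n_i32 = n as i32; // loop-invariant integer bound, computed once
    let mut i = start;
    let mut trips = 0;
    loop {
        let in_bounds = if inclusive { i <= n_i32 } else { i < n_i32 };
        if !in_bounds {
            break;
        }
        trips += 1;
        i += 1; // stands in for the `add i32 1` kept in sync by the Update lowering
    }
    trips
}
```

In this model, integer-valued bounds give identical trip counts on both paths. For example, `n = 8.0` with `<` gives 8 trips either way. With `n = 3.7` and `<`, the double path runs 4 times and the i32 path runs 3 times, which is the "one fewer iteration" case. Truncation goes toward zero, so the model also shows that `<=` with a negative fractional bound such as `-0.5` yields one extra trip (1 versus 0). The direction of the difference therefore depends on the sign of the bound and on the comparison operator.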
Branch force-pushed from a1a8d1d to 9ac12e9.
## Summary
Closes #168.
Perry already specializes loop counters to i32 when the bound is `arr.length` (the `classify_for_length_hoist` peephole). This PR extends that optimization to the equally common `i < n` shape, where `n` is a `number`-typed function parameter or local. That is the pattern that blocks LoopVectorizer on Buffer-read and other intrinsic-heavy hot paths.

## What changed
Single file: `crates/perry-codegen/src/stmt.rs` (+142 lines, no other files).

**New helper**: `classify_for_local_bound(cond, ctx)`:
- Accepts `Compare { op: Lt|Le, left: LocalGet(i), right: LocalGet(n) }`
- Requires `i` to be in `integer_locals` (initialized from an integer literal, only mutated via `Update`)
- Requires `n` to be either also in `integer_locals`, or a number-typed (`Type::Number | Type::Int32`), non-boxed, non-global slot (covers `n: number` parameters and simple `let` locals typed number)

**In `lower_for`**: a new parallel block alongside the `arr.length` path:
1. Allocates a parallel i32 alloca for the counter (if not already present from the Let site) and initializes it via `fptosi` of the current double value
2. Emits `fptosi(n)` once before the cond block into a fresh i32 alloca (the key step: a loop-invariant integer bound in a register that LLVM/SCEV can see)
3. Uses `icmp slt i32 %i, %n.i32` (or `icmp sle` for `<=`) instead of `fcmp olt double`
4. The existing `Expr::Update` lowering already keeps the i32 slot in sync with `add i32 1` per iteration

## Correctness
- Counter (`i`): `integer_locals` guarantees it is initialized from an integer literal and only ever modified by `Update ++/--`. The i32 slot round-trips exactly.
- Bound (`n`): when `n` is in `integer_locals` the same guarantee holds. When `n` is a number-typed parameter slot, Perry trusts the TypeScript type annotation (the same trust-types contract already applied throughout codegen). A non-integer float bound (e.g. `foo(3.7)`) would observe at most one fewer iteration, a trade-off within Perry's existing contract.

## Test
Functional correctness verified locally:
The variant B sum equals variant A's. Previously the param-bound loop also ran correctly, but without i32 specialization; now it emits `icmp slt i32` in the cond block, giving LLVM a modelable integer induction variable.

Gap suite: stable at 23/28 (no regressions). All `perry-codegen`, `perry-hir`, `perry-transform`, `perry-runtime`, and `perry` crate tests pass (72/72).

## Before / after (representative LLVM IR shape)
Before (variant B cond block):
After (variant B cond block):
The `icmp slt i32` gives LLVM's SCEV a clean integer induction variable, which is the prerequisite for LoopVectorizer to widen the loop body.

## Notes for maintainer
- The `Le` case (`i <= n`) is handled via `icmp_sle` (consistent with the existing `classify_for_length_hoist` extension).
- The `classify_for_local_bound` guard deliberately does not fire when `hoist_classification.is_some()`; the two peepholes are mutually exclusive.

https://claude.ai/code/session_019Y542k6QyAWV6JF2an8DXU
Generated by Claude Code
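The matching rules for `classify_for_local_bound`, the `<` / `<=` to `icmp slt` / `icmp sle` mapping, and the mutual exclusion with the `arr.length` hoist described in the notes can be sketched as follows. This is a hedged, simplified model rather than Perry's implementation. `Expr`, `Ctx`, `LocalInfo`, `LocalBound`, `IcmpPred`, `CondLowering`, `pick_cond_lowering`, and `local_cmp` are hypothetical stand-ins. The `length_hoist_matched` flag only stands in for `hoist_classification.is_some()`.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical, simplified stand-ins for Perry's HIR and codegen context.
type LocalId = u32;

#[derive(Clone, Copy, Debug, PartialEq)]
enum CmpOp { Lt, Le, Gt }

#[derive(Clone, Copy, Debug, PartialEq)]
enum Type { Number, Int32, Any }

enum Expr {
    LocalGet(LocalId),
    Compare { op: CmpOp, left: Box<Expr>, right: Box<Expr> },
}

struct LocalInfo { ty: Type, boxed: bool, global: bool }

struct Ctx {
    /// Locals initialized from an integer literal and only mutated via ++/--.
    integer_locals: HashSet<LocalId>,
    locals: HashMap<LocalId, LocalInfo>,
}

#[derive(Debug, PartialEq)]
enum IcmpPred { Slt, Sle }

#[derive(Debug, PartialEq)]
struct LocalBound { counter: LocalId, bound: LocalId, pred: IcmpPred }

#[derive(Debug, PartialEq)]
enum CondLowering { LengthHoist, LocalBoundI32(LocalBound), Generic }

/// Matches `i < n` / `i <= n` where both sides are plain local reads.
fn classify_for_local_bound(cond: &Expr, ctx: &Ctx) -> Option<LocalBound> {
    let Expr::Compare { op, left, right } = cond else { return None };
    let pred = match op {
        CmpOp::Lt => IcmpPred::Slt,
        CmpOp::Le => IcmpPred::Sle,
        _ => return None,
    };
    let (Expr::LocalGet(i), Expr::LocalGet(n)) = (&**left, &**right) else { return None };
    // The counter must be a provably integral local.
    if !ctx.integer_locals.contains(i) {
        return None;
    }
    // The bound must be integral too, or a number-typed, unboxed, non-global slot.
    let bound_ok = ctx.integer_locals.contains(n)
        || matches!(
            ctx.locals.get(n),
            Some(info) if matches!(info.ty, Type::Number | Type::Int32) && !info.boxed && !info.global
        );
    bound_ok.then(|| LocalBound { counter: *i, bound: *n, pred })
}

/// Chooses how the loop condition is lowered; the `arr.length` hoist wins,
/// so the two peepholes never both apply.
fn pick_cond_lowering(cond: &Expr, ctx: &Ctx, length_hoist_matched: bool) -> CondLowering {
    if length_hoist_matched {
        return CondLowering::LengthHoist;
    }
    match classify_for_local_bound(cond, ctx) {
        Some(b) => CondLowering::LocalBoundI32(b),
        None => CondLowering::Generic, // stands for the existing double `fcmp` path
    }
}

/// Builds `left <op> right` over two locals.
fn local_cmp(op: CmpOp, left: LocalId, right: LocalId) -> Expr {
    Expr::Compare {
        op,
        left: Box::new(Expr::LocalGet(left)),
        right: Box::new(Expr::LocalGet(right)),
    }
}
```

Keeping the classifier a pure function of the condition and the local metadata makes the guard easy to test in isolation. The sketch rejects `>`, boxed or global bounds, and bounds of unknown type, and maps those cases to `CondLowering::Generic`.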