fix(opt): i64 lowering miscompiles memset-style loop counter on Cortex-M (#93) #97
Merged
This was referenced May 10, 2026
…x-M (#93)

Pre-fix, `optimizer_bridge::wasm_to_ir` had no handler for `WasmOp::I64ExtendI32U` / `I64ExtendI32S` / `I32WrapI64`. They fell through the catch-all `_ => Opcode::Nop`, advancing `inst_id` by 1 but never registering the produced i64-pair vregs in `vreg_to_arm`. When a downstream `Opcode::I64ShrU` / `I64Shl` looked up its shift-count vreg via `get_arm_reg`, the lookup missed the map and silently fell back to `Reg::R0` (`optimizer_bridge.rs:1333`).

The ARM emitter for the variable i64 shift writes to both `rm_lo` and `rm_hi` as scratch (`AND.W rm_lo, rm_lo, #63; SUBS.W rm_hi, rm_lo, #32; ...`), so any function that did `i64.shr_u` of an `i64.extend_i32_u`-ed shift count clobbered the AAPCS first-param register R0 inside the shift expansion. For `compiler_builtins::memset`, R0 is the destination pointer, so the byte loop's pointer was destroyed every iteration, producing a non-terminating loop on silicon at `memset+0x4c`.

This patch:

- Adds `Opcode::I64ExtendI32U`, `Opcode::I64ExtendI32S`, and `Opcode::I32WrapI64` to `synth-opt`.
- Handles those WasmOps in `wasm_to_ir`, with slot accounting that keeps `inst_id.saturating_sub(K)` arithmetic correct for downstream i64 ops (extend reuses the consumed i32 slot for `dest_lo`; wrap reuses the i64-lo slot and decrements `inst_id` so the natural +1 cancels with the -1 net slot delta).
- Lowers them in `ir_to_arm`. Critically, the extend lowerings allocate a callee-saved consecutive pair via `alloc_i64_pair` and Mov the i32 source into `dest_lo` even when the source already lives in a non-param register. This ensures the downstream i64-shift's `rm_lo`/`rm_hi` (which the emitter treats as clobbered scratch) are never AAPCS param registers.
- Updates `analyze_i64_local_gets` to skip the i32 LocalGet that feeds an `I64ExtendI32U`/`I64ExtendI32S` so the analyzer doesn't mistakenly mark it as i64.

Regression test: `crates/synth-synthesis/tests/issue_93_memset_i64_codegen.rs` exercises the bug pattern in the optimized path (4-param function shifting an `i64.const` by `i64.extend_i32_u(local.get $n)`) and asserts the emitted i64-shift's `rm_lo`/`rm_hi` are not in R0..R3. Pre-fix: 3 of 5 tests fail (one per shift variant: shr_u, shl, shr_s). Post-fix: 5/5 pass. The no-optimize path was always correct and is exercised by a sanity test in the same file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
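The failure mode described in this commit message can be modelled in a few lines. The following is a minimal sketch, not the code in `optimizer_bridge.rs`: the `VReg` and `Reg` types, `get_arm_reg_silent`, `shift_count_scratch`, and `is_param_reg` are hypothetical stand-ins chosen for illustration. It assumes only what the message states, namely that an unmapped vreg resolves to R0 and that the shift expansion treats its count registers as scratch.

```rust
use std::collections::HashMap;

/// Hypothetical virtual-register id (illustrative, not the real synth type).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct VReg(pub u32);

/// Hypothetical subset of ARM core registers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Reg {
    R0,
    R1,
    R2,
    R3,
    R4,
    R5,
}

/// Models the pre-fix lookup: a vreg missing from the map quietly becomes R0.
pub fn get_arm_reg_silent(map: &HashMap<VReg, Reg>, v: VReg) -> Reg {
    map.get(&v).copied().unwrap_or(Reg::R0)
}

/// Returns the two registers a variable i64 shift would overwrite as scratch
/// for its shift-count operand (low half, high half).
pub fn shift_count_scratch(map: &HashMap<VReg, Reg>, lo: VReg, hi: VReg) -> [Reg; 2] {
    [get_arm_reg_silent(map, lo), get_arm_reg_silent(map, hi)]
}

/// True for the AAPCS argument registers R0..R3.
pub fn is_param_reg(r: Reg) -> bool {
    matches!(r, Reg::R0 | Reg::R1 | Reg::R2 | Reg::R3)
}
```

If the op that should have defined the shift-count vregs was lowered to a no-op, neither vreg is in the map, both lookups return R0, and the scratch writes land on the first argument register.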
avrabe added a commit that referenced this pull request May 11, 2026

Adds a workspace-excluded fuzz crate under `fuzz/` with four targeted libfuzzer harnesses for the WASM → IR → ARM lowering pipeline. Each harness inverts a specific bug class that has actually shipped (or nearly shipped) into a fuzz-detectable invariant:

* wasm_ops_lower_or_error: panic / unencodable-instr class
* wasm_to_ir_roundtrip_op_coverage: silent op-drop class (issue #93)
* i64_lowering_doesnt_clobber_params: AAPCS-clobber class (#85, #86)
* encoder_no_panic: encoder panic on valid ArmOp

Layout: `fuzz/` is excluded from the main workspace via `[workspace] members = []` so the libfuzzer-sys ASan toolchain requirement does not leak into stable workspace builds. `fuzz/src/common.rs` defines a curated Arbitrary-derived `FuzzOp` mirror of `synth_core::WasmOp` (i32/i64 arithmetic + comparison + conversions + locals) that lowers deterministically into `WasmOp`, so any libfuzzer crash carries a replayable WASM op sequence.

`wasm_to_ir_roundtrip_op_coverage` would have caught issue #93 by itself: it asserts that every value-producing wasm op leaves at least one IR instruction after `optimize_full` (with all optimizations disabled). The three issue-#93 conversion ops (`I64ExtendI32U/S`, `I32WrapI64`) are *skipped* until PR #97 lands; the skip block is inline-documented and removing it after #97 restores full coverage and re-arms the regression check.

CI smoke job (`.github/workflows/fuzz-smoke.yml`) runs each harness for 60 seconds on every PR, on the same self-hosted Linux pool as the rest of synth CI, with crash artifacts uploaded on failure. Long-budget nightly runs are out of scope for this PR.

Trace: NFR-002
Trace: issue-82
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
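The op-coverage invariant named above can be illustrated with a toy model. This is a sketch under stated assumptions, not the harness in `fuzz/`: `MiniWasmOp`, `MiniOpcode`, `produces_value`, the two lowering functions, and `silently_dropped` are all hypothetical names, and the check here is simplified to "a value-producing op must not lower to `Nop`" rather than the real harness's check on the IR left after `optimize_full`.

```rust
/// Hypothetical miniature wasm op set (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MiniWasmOp {
    I32Const(i32),
    I64ExtendI32U,
    I32WrapI64,
    Drop,
}

/// Hypothetical miniature IR opcode set.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MiniOpcode {
    Const(i32),
    ExtendU,
    Wrap,
    Nop,
}

/// Every op in this toy set except `Drop` pushes a value.
pub fn produces_value(op: MiniWasmOp) -> bool {
    !matches!(op, MiniWasmOp::Drop)
}

/// Pre-fix shape: a catch-all arm swallows ops that have no handler.
pub fn lower_with_catch_all(op: MiniWasmOp) -> MiniOpcode {
    match op {
        MiniWasmOp::I32Const(v) => MiniOpcode::Const(v),
        _ => MiniOpcode::Nop,
    }
}

/// Post-fix shape: every value-producing op has its own opcode.
pub fn lower_exhaustive(op: MiniWasmOp) -> MiniOpcode {
    match op {
        MiniWasmOp::I32Const(v) => MiniOpcode::Const(v),
        MiniWasmOp::I64ExtendI32U => MiniOpcode::ExtendU,
        MiniWasmOp::I32WrapI64 => MiniOpcode::Wrap,
        MiniWasmOp::Drop => MiniOpcode::Nop,
    }
}

/// Returns the value-producing ops that the given lowering turned into `Nop`.
pub fn silently_dropped(
    ops: &[MiniWasmOp],
    lower: fn(MiniWasmOp) -> MiniOpcode,
) -> Vec<MiniWasmOp> {
    ops.iter()
        .copied()
        .filter(|&op| produces_value(op) && lower(op) == MiniOpcode::Nop)
        .collect()
}
```

A fuzz harness built on this idea would assert that `silently_dropped` is empty for every generated op sequence, which is the property the catch-all lowering violates for the conversion ops.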
avrabe added a commit that referenced this pull request May 11, 2026

Follow-up to issue #93 (silicon-blocking memset bug). The root cause was optimizer_bridge::wasm_to_ir silently dropping three wasm ops (I64ExtendI32U/S, I32WrapI64): they produced vregs that never got mapped to ARM registers. The downstream lookup in get_arm_reg returned Reg::R0 as a silent fallback, so subsequent i64 shifts read R0 as rm_lo/rm_hi, destroying memset's destination pointer on real silicon.

Replace the silent fallback with a panic that names the vreg. Future "wasm op silently dropped" bugs of this class will surface at the boundary instead of producing miscompiled firmware. A compiler crash is strictly better than a hung MCU.

Verified:

* cargo test --workspace: all pass (no path legitimately hits the fallback; the post-#97 codebase produces mappings for every wasm op it emits IR for)
* cargo clippy --workspace --all-targets -- -D warnings: clean
* cargo fmt --check: clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
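The change from a silent default to a loud failure might look roughly like the sketch below. It is an illustration only: `VReg`, `Reg`, `try_get_arm_reg`, and `get_arm_reg_strict` are hypothetical names, and the panic message format is an assumption rather than the text used in the real `get_arm_reg`.

```rust
use std::collections::HashMap;

/// Hypothetical virtual-register id (illustrative, not the real synth type).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct VReg(pub u32);

/// Hypothetical subset of ARM core registers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Reg {
    R0,
    R4,
    R5,
}

/// Fallible lookup: reports the offending vreg instead of inventing a register.
pub fn try_get_arm_reg(map: &HashMap<VReg, Reg>, v: VReg) -> Result<Reg, String> {
    map.get(&v)
        .copied()
        .ok_or_else(|| format!("no ARM register mapped for vreg v{}", v.0))
}

/// Strict lookup: panics with a message naming the vreg when the mapping is missing.
pub fn get_arm_reg_strict(map: &HashMap<VReg, Reg>, v: VReg) -> Reg {
    try_get_arm_reg(map, v).unwrap_or_else(|msg| panic!("{}", msg))
}
```

Splitting the lookup into a `Result`-returning core and a panicking wrapper is a design choice of this sketch; it keeps the error text testable without having to catch a panic.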
avrabe added a commit that referenced this pull request May 13, 2026

… usages (#108)

Sweep of 47 hardcoded R0..R3 sites across the optimizer + selector. Closes the recurring class that PR #86 (i64 const), #97 (i64 extend/wrap), #106 (24 i64 ops), and #107 (CSE MemLoad/Store) each addressed one slice of.

Categories audited:

* wasm_to_ir slot accounting for i64 arith/extend ops (15 + 3 ops)
* wasm_to_ir op-handler gaps (10 wasm ops → 6 new Opcode variants)
* CSE pass: missing src-vreg resolution for all i64 opcodes + alias invalidation on re-def
* ir_to_arm hardcoded `Reg::R0` / `Reg::R3` destinations (24 handlers fixed via `alloc_i32_scratch`)
* select_with_stack i64 sub-word loads hardcoding `R0:R1` dest pair

27 regression tests added across 4 new test files. Defensive panic stays in PR #101 (one remaining v13 vreg case during simple_add.wat fib compilation is being tracked down; silent R0 fallback preserved in this PR).
Closes #93.
Diagnosis
The default `synth compile --target cortex-m4f` (optimized) path in `crates/synth-synthesis/src/optimizer_bridge.rs` had no handler for `WasmOp::I64ExtendI32U` / `WasmOp::I64ExtendI32S` / `WasmOp::I32WrapI64`. They fell through the catch-all `_ => Opcode::Nop` arm in `wasm_to_ir`, advancing `inst_id` by 1 but never registering the produced i64-pair vregs in `vreg_to_arm`.

When a downstream `Opcode::I64ShrU` / `I64Shl` / `I64ShrS` looked up its shift-count vreg via `get_arm_reg`, the lookup missed the map and silently fell back to `Reg::R0` (`optimizer_bridge.rs:1333`).

The ARM emitter for the variable 64-bit shift (`crates/synth-backend/src/arm_encoder.rs:2856` for `I64Shl`, `:2938` for `I64ShrU`, `:3019` for `I64ShrS`) writes to both `rm_lo` and `rm_hi` as scratch (`AND.W rm_lo, rm_lo, #63; SUBS.W rm_hi, rm_lo, #32; ...`).

So any wasm function doing `i64.shr_u` of an `i64.extend_i32_u`-ed shift count clobbered AAPCS first-param register R0 inside the shift expansion. For `compiler_builtins::memset`, R0 is the destination pointer, so the byte loop's pointer was destroyed every iteration, producing a non-terminating loop on STM32G474RE silicon at `memset+0x4c` (PC bouncing between `0x...c668` and `0x...c67e`).

Repro WAT (minimal)
The repro is a 4-param function that shifts an `i64.const` by `i64.extend_i32_u(local.get $n)`. The `WasmOp` sequence used by the regression test is the same shape; see `crates/synth-synthesis/tests/issue_93_memset_i64_codegen.rs::repro_wasm_ops`.

Pre-fix, the optimized path emits an i64 shift with `rm_hi: R0`. That is the smoking gun: R0 is the first AAPCS param, and the encoder writes to it inside the shift expansion. Post-fix, the same sequence allocates a callee-saved pair (R4..R11) for the shift count, leaving R0..R3 untouched.

Fix scope
Three additions, all narrow:

- `crates/synth-opt/src/lib.rs`: adds three `Opcode` variants: `I64ExtendI32U`, `I64ExtendI32S`, `I32WrapI64`. All optimization passes have `_ => {}` wildcards, so existing passes are unaffected.
- `crates/synth-synthesis/src/optimizer_bridge.rs`:
  - `wasm_to_ir`: handlers for the three new ops, with slot accounting that keeps `inst_id.saturating_sub(K)` arithmetic correct for downstream i64 ops (extend reuses the consumed i32 slot for `dest_lo`; wrap reuses the i64-lo slot and decrements `inst_id` so the natural +1 cancels with the -1 net slot delta).
  - `ir_to_arm`: lowerings that allocate a callee-saved consecutive pair via `alloc_i64_pair` and `Mov` the i32 source into the new `dest_lo`, even when the source is already in a non-param register, so the downstream shift's clobbered `rm_lo`/`rm_hi` can never be a live AAPCS param.
  - `analyze_i64_local_gets`: skips the i32 producer that feeds an `I64ExtendI32U`/`I64ExtendI32S`, otherwise the analyzer would mistakenly mark that `LocalGet` as i64 and double-load the local.
- `crates/synth-synthesis/tests/issue_93_memset_i64_codegen.rs`: 5 new regression tests:
  - `issue_93_no_optimize_path_handles_repro`: sanity check that `select_with_stack` is and was always correct for this input.
  - `issue_93_optimized_i64_shr_u_does_not_use_r0_as_shift_count`: the headline regression. Asserts the optimized path's `I64ShrU` emission has `rm_lo`/`rm_hi` outside R0..R3.
  - `issue_93_optimized_i64_shr_u_value_and_count_are_distinct`: no value/count register aliasing.
  - `issue_93_optimized_i64_shl_does_not_use_r0_as_shift_count`: same for `I64Shl` (uses `local.get 2` to also exercise R2 path).
  - `issue_93_optimized_i64_shr_s_does_not_use_r0_as_shift_count`: same for `I64ShrS` (sign-extend variant).

Pre-fix: 3/5 fail (one per shift variant). Post-fix: 5/5 pass.
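The extend lowering and the property the regression tests assert can be sketched together. This is a hedged illustration, not the code in `ir_to_arm`: `Reg(n)` standing for `Rn`, the `PairAllocator` type, the `ArmOp` enum, and `lower_extend_u` are hypothetical, and zeroing `dest_hi` with an immediate move is an assumption about how the zero-extension might be emitted. Only the callee-saved range R4..R11, the `alloc_i64_pair` name, and the unconditional `Mov` into `dest_lo` come from the description above.

```rust
/// `Reg(n)` stands for ARM core register Rn (illustrative encoding).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Reg(pub u8);

/// True for the AAPCS argument registers R0..R3.
pub fn is_param_reg(r: Reg) -> bool {
    r.0 <= 3
}

/// Hypothetical bump allocator handing out consecutive pairs from R4..R11.
pub struct PairAllocator {
    next: u8,
}

impl PairAllocator {
    pub fn new() -> Self {
        Self { next: 4 }
    }

    /// Returns (lo, hi) as two consecutive callee-saved registers, or None when exhausted.
    pub fn alloc_i64_pair(&mut self) -> Option<(Reg, Reg)> {
        if self.next + 1 > 11 {
            return None;
        }
        let pair = (Reg(self.next), Reg(self.next + 1));
        self.next += 2;
        Some(pair)
    }
}

/// Hypothetical two-instruction subset of the ARM op set.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArmOp {
    Mov { rd: Reg, rm: Reg },
    MovImm { rd: Reg, imm: u32 },
}

/// Lowers an unsigned i32-to-i64 extend: always copy the source into a fresh
/// callee-saved pair, even if the source is already outside R0..R3.
pub fn lower_extend_u(alloc: &mut PairAllocator, src: Reg) -> Option<((Reg, Reg), Vec<ArmOp>)> {
    let (dest_lo, dest_hi) = alloc.alloc_i64_pair()?;
    let ops = vec![
        ArmOp::Mov { rd: dest_lo, rm: src },
        // Assumption: the high half of a zero-extended value is set to 0.
        ArmOp::MovImm { rd: dest_hi, imm: 0 },
    ];
    Some(((dest_lo, dest_hi), ops))
}
```

With this shape, a later shift that scribbles on its count pair can only touch registers returned by `alloc_i64_pair`, so a check like `!is_param_reg(rm_lo) && !is_param_reg(rm_hi)` holds by construction.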
Verification
- `cargo test --workspace`: all green (1095 tests, +5 from this PR)
- `cargo clippy --workspace --all-targets -- -D warnings`: clean
- `cargo fmt --check`: clean

v0.2.1 patch candidate
This is silicon-blocking for any merged-wasm bench integration that links `compiler_builtins`. The patch is intentionally narrow (no refactor, no broader regalloc rework) and targeted at the v0.2.1 patch release line.
🤖 Generated with Claude Code