Skip to content

docs: Audit crate boundaries and responsibilities#32

Merged
avrabe merged 1 commit into
mainfrom
phase0/issue-9-audit-crate-boundaries
Jan 10, 2026
Merged

docs: Audit crate boundaries and responsibilities#32
avrabe merged 1 commit into
mainfrom
phase0/issue-9-audit-crate-boundaries

Conversation

@avrabe
Copy link
Copy Markdown
Contributor

@avrabe avrabe commented Jan 10, 2026

Summary

Comprehensive audit of the 14 Synth crates to document responsibilities, identify overlaps, and propose improvements.

Deliverable

Created docs/architecture/CRATE_STRUCTURE.md with:

  • Summary table - All 14 crates with line counts and dependencies
  • Dependency graph - ASCII diagram of crate relationships
  • Compilation pipeline - Visual flow from WASM to ELF
  • Crate details - Purpose and key types for each crate
  • Key questions answered:
    • synth-opt vs synth-synthesis
    • synth-codegen vs synth-backend
  • Refactoring recommendations

Key Finding

The crate structure is well-designed with clear separation of concerns.
No major refactoring required.

The pipeline flows cleanly:

synth-frontend → synth-opt → synth-synthesis → synth-regalloc → synth-codegen → synth-backend

Recommendation

Focus on wiring the pipeline together rather than restructuring crates.

Issues #10 and #11 (merge/split crates) can be closed as "won't fix" since the analysis shows the current structure is already appropriate.

Closes #9

Created comprehensive crate structure documentation:

docs/architecture/CRATE_STRUCTURE.md includes:
- Summary table of all 14 crates with line counts and dependencies
- ASCII dependency graph showing crate relationships
- Compilation pipeline diagram
- Detailed breakdown of each crate's purpose
- Answered key questions:
  - synth-opt vs synth-synthesis: optimization vs instruction selection
  - synth-codegen vs synth-backend: assembly generation vs binary emission
- Refactoring recommendations (conclusion: structure is already good)

Key finding: Crate structure is well-designed with clear separation.
Priority should be wiring the pipeline, not restructuring crates.

Closes #9
@avrabe avrabe added this to the phase-0-cleanup milestone Jan 10, 2026
@avrabe avrabe merged commit 474abbb into main Jan 10, 2026
@avrabe avrabe deleted the phase0/issue-9-audit-crate-boundaries branch January 10, 2026 16:30
avrabe added a commit that referenced this pull request Mar 1, 2026
…ault

Shift operations (SHL, SHR_S, SHR_U, ROTL, ROTR) had `shift: 0`
hardcoded, using immediate-shift ARM instructions. In WASM, the shift
amount is always dynamic (popped from stack), so these must use
register-based forms (LslReg, AsrReg, LsrReg, RorReg).

ROTL is implemented as RSB tmp, rm, #32 + ROR rd, rn, tmp (rotate left
by N = rotate right by 32-N).

Also: CLI now compiles all exported functions by default when no
--func-index or --func-name is specified, fixing the "only first
function compiled" issue (#35).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
avrabe added a commit that referenced this pull request Mar 2, 2026
…aching 188 Qed / 52 Admitted

Phase 4: Align Rocq model with Rust instruction selector for shift/rotate ops.

ARM model changes:
- Add LSL_reg, LSR_reg, ASR_reg, ROR_reg (register-based shifts) to ArmInstructions.v
- Add RSB (reverse subtract) instruction for rotl compilation
- Add semantics for all 5 new instructions in ArmSemantics.v

Compilation alignment:
- I32/I64 shifts now use register-based instructions (LSL_reg R0 R0 R1, etc.)
- I32/I64 rotl now uses RSB R2 R1 #32; ROR_reg R0 R0 R2 (matching Rust)
- I32/I64 rotr now uses ROR_reg R0 R0 R1

Proofs closed:
- i32_shl_correct, i32_shru_correct, i32_shrs_correct (via synth_binop_proof)
- i32_rotr_correct (via synth_binop_proof)
- i32_rotl_correct (explicit witness with set/subst + get_set_reg lemmas)
- Added I32.rotl_rotr_sub axiom connecting rotl to rotr via complement

T1 result-correspondence: 34 → 39 (+5 shift/rotate)
CorrectnessI32.v: 29 Qed / 0 Admitted (all i32 ops now have T1 proofs)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
avrabe added a commit that referenced this pull request May 10, 2026
…x-M (#93)

Pre-fix, `optimizer_bridge::wasm_to_ir` had no handler for
`WasmOp::I64ExtendI32U` / `I64ExtendI32S` / `I32WrapI64`. They fell through
the catch-all `_ => Opcode::Nop`, advancing `inst_id` by 1 but never
registering the produced i64-pair vregs in `vreg_to_arm`. When a downstream
`Opcode::I64ShrU` / `I64Shl` looked up its shift-count vreg via
`get_arm_reg`, the lookup missed the map and silently fell back to
`Reg::R0` (`optimizer_bridge.rs:1333`). The ARM emitter for the variable
i64 shift writes to both `rm_lo` and `rm_hi` as scratch (`AND.W rm_lo,
rm_lo, #63; SUBS.W rm_hi, rm_lo, #32; ...`) — so any function that did
`i64.shr_u` of an `i64.extend_i32_u`-ed shift count clobbered the AAPCS
first-param register R0 inside the shift expansion. For
`compiler_builtins::memset`, R0 is the destination pointer, so the byte
loop's pointer was destroyed every iteration → non-terminating loop on
silicon at `memset+0x4c`.

This patch:

- Adds `Opcode::I64ExtendI32U`, `Opcode::I64ExtendI32S`, and
  `Opcode::I32WrapI64` to `synth-opt`.
- Handles those WasmOps in `wasm_to_ir`, with slot accounting that keeps
  `inst_id.saturating_sub(K)` arithmetic correct for downstream i64 ops
  (extend reuses the consumed i32 slot for `dest_lo`; wrap reuses the
  i64-lo slot and decrements `inst_id` so the natural +1 cancels with the
  -1 net slot delta).
- Lowers them in `ir_to_arm`. Critically, the extend lowerings allocate a
  callee-saved consecutive pair via `alloc_i64_pair` and Mov the i32 source
  into `dest_lo` even when the source already lives in a non-param
  register. This ensures the downstream i64-shift's `rm_lo`/`rm_hi` (which
  the emitter treats as clobbered scratch) are never AAPCS param
  registers.
- Updates `analyze_i64_local_gets` to skip the i32 LocalGet that feeds an
  `I64ExtendI32U`/`I64ExtendI32S` so the analyzer doesn't mistakenly mark
  it as i64.

Regression test: `crates/synth-synthesis/tests/issue_93_memset_i64_codegen.rs`
exercises the bug pattern in the optimized path (4-param function shifting
an `i64.const` by `i64.extend_i32_u(local.get $n)`) and asserts the
emitted i64-shift's `rm_lo`/`rm_hi` are not in R0..R3. Pre-fix: 3 of 5
tests fail (one per shift variant: shr_u, shl, shr_s). Post-fix: 5/5
pass. The no-optimize path was always correct and is exercised by a
sanity test in the same file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
avrabe added a commit that referenced this pull request May 11, 2026
…x-M (#93)

Pre-fix, `optimizer_bridge::wasm_to_ir` had no handler for
`WasmOp::I64ExtendI32U` / `I64ExtendI32S` / `I32WrapI64`. They fell through
the catch-all `_ => Opcode::Nop`, advancing `inst_id` by 1 but never
registering the produced i64-pair vregs in `vreg_to_arm`. When a downstream
`Opcode::I64ShrU` / `I64Shl` looked up its shift-count vreg via
`get_arm_reg`, the lookup missed the map and silently fell back to
`Reg::R0` (`optimizer_bridge.rs:1333`). The ARM emitter for the variable
i64 shift writes to both `rm_lo` and `rm_hi` as scratch (`AND.W rm_lo,
rm_lo, #63; SUBS.W rm_hi, rm_lo, #32; ...`) — so any function that did
`i64.shr_u` of an `i64.extend_i32_u`-ed shift count clobbered the AAPCS
first-param register R0 inside the shift expansion. For
`compiler_builtins::memset`, R0 is the destination pointer, so the byte
loop's pointer was destroyed every iteration → non-terminating loop on
silicon at `memset+0x4c`.

This patch:

- Adds `Opcode::I64ExtendI32U`, `Opcode::I64ExtendI32S`, and
  `Opcode::I32WrapI64` to `synth-opt`.
- Handles those WasmOps in `wasm_to_ir`, with slot accounting that keeps
  `inst_id.saturating_sub(K)` arithmetic correct for downstream i64 ops
  (extend reuses the consumed i32 slot for `dest_lo`; wrap reuses the
  i64-lo slot and decrements `inst_id` so the natural +1 cancels with the
  -1 net slot delta).
- Lowers them in `ir_to_arm`. Critically, the extend lowerings allocate a
  callee-saved consecutive pair via `alloc_i64_pair` and Mov the i32 source
  into `dest_lo` even when the source already lives in a non-param
  register. This ensures the downstream i64-shift's `rm_lo`/`rm_hi` (which
  the emitter treats as clobbered scratch) are never AAPCS param
  registers.
- Updates `analyze_i64_local_gets` to skip the i32 LocalGet that feeds an
  `I64ExtendI32U`/`I64ExtendI32S` so the analyzer doesn't mistakenly mark
  it as i64.

Regression test: `crates/synth-synthesis/tests/issue_93_memset_i64_codegen.rs`
exercises the bug pattern in the optimized path (4-param function shifting
an `i64.const` by `i64.extend_i32_u(local.get $n)`) and asserts the
emitted i64-shift's `rm_lo`/`rm_hi` are not in R0..R3. Pre-fix: 3 of 5
tests fail (one per shift variant: shr_u, shl, shr_s). Post-fix: 5/5
pass. The no-optimize path was always correct and is exercised by a
sanity test in the same file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Audit crate boundaries and responsibilities

1 participant