feat(opt): async-callback adapter pass (v1.0.4 #70, Track A)#125

Merged
avrabe merged 1 commit into main from release/v1.0.4-pr-async-adapter on May 17, 2026

@avrabe avrabe commented May 17, 2026

Adds Phase 4 to the component pipeline. Detects the meld P3 async-callback adapter shape and folds the discriminant test + slow-path branch when EXIT_OK is statically true. 4 new tests + ~460 LOC.

Soundness rests on three guards: no Instruction::Unknown anywhere in the module, the local is read exactly once, and the constant is exactly I32Const(0). After each fold the pipeline re-encodes and runs wasm-tools validation, reverting on mismatch (mirrors PR-M).

🤖 Generated with Claude Code

Adds Phase 4 to the component pipeline at loom-core/src/component_optimizer.rs.
Detects the meld P3 async-callback adapter shape and folds the
discriminant test + slow-path branch when EXIT_OK is statically true.

## Pattern

  Call($lift); LocalSet(N); LocalGet(N); I32Const(0); I32Eq;
  If { then=fast_path, else=slow_path }

Fold target:

  Call($lift); <then=fast_path inlined>
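
A minimal sketch of the window match and rewrite described above, written against a toy instruction enum. The `Instr` type and the `fold_adapter` name are hypothetical stand-ins, not loom's actual IR or API, and the sketch ignores the soundness guards, block result types, and branch-depth fix-ups that a real fold would need.

```rust
/// Toy instruction set (hypothetical; not loom's real IR).
#[derive(Debug, Clone, PartialEq)]
enum Instr {
    Call(u32),
    LocalSet(u32),
    LocalGet(u32),
    I32Const(i32),
    I32Eq,
    If { then_body: Vec<Instr>, else_body: Vec<Instr> },
}

/// Finds the first `Call; LocalSet(N); LocalGet(N); I32Const(0); I32Eq; If`
/// window and replaces everything after the Call with the `then` arm.
/// Returns true if a fold happened.
fn fold_adapter(body: &mut Vec<Instr>) -> bool {
    for i in 0..body.len().saturating_sub(5) {
        let fast_path = match &body[i..i + 6] {
            [Instr::Call(_), Instr::LocalSet(n), Instr::LocalGet(m), Instr::I32Const(0), Instr::I32Eq, Instr::If { then_body, .. }]
                if n == m =>
            {
                then_body.clone()
            }
            _ => continue,
        };
        // Keep the Call at index i; splice the fast path over the other five.
        drop(body.splice(i + 1..i + 6, fast_path));
        return true;
    }
    false
}
```

The Call stays in place because its side effects cannot be proven away; only the discriminant test and the slow-path arm disappear.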

## Soundness

- Conservative module-wide guard: skip if any function contains
  Instruction::Unknown.
- Per-pattern guard: local N must be read EXACTLY ONCE in the function
  (the LocalGet immediately after the LocalSet). Counted recursively
  including LocalTee. If anything else reads the exit code, fold is
  rejected.
- I32Const must be EXACTLY 0 (the EXIT_OK shape). Any other constant
  is rejected.
- The Call to $lift is KEPT (we cannot prove the lift pure+no-trap
  without IPA; folding away its side effects would be unsound).
- After fold, the component pipeline re-encodes + wasm-tools-validates;
  on mismatch the function set is reverted (mirrors PR-M's pattern).
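
The first three guards can be sketched as plain recursive scans. Everything here is illustrative: the `Instr` enum and the function names are hypothetical, and counting `LocalTee` as a use is this sketch's conservative reading of "including LocalTee", not a confirmed detail of the implementation.

```rust
/// Toy instruction set (hypothetical; not loom's real IR).
#[derive(Debug, Clone, PartialEq)]
enum Instr {
    LocalGet(u32),
    LocalSet(u32),
    LocalTee(u32),
    Block(Vec<Instr>),
    Loop(Vec<Instr>),
    If { then_body: Vec<Instr>, else_body: Vec<Instr> },
    Unknown,
}

/// Counts LocalGet/LocalTee occurrences of `local`, descending into nested bodies.
fn count_local_uses(body: &[Instr], local: u32) -> usize {
    body.iter()
        .map(|instr| match instr {
            Instr::LocalGet(n) | Instr::LocalTee(n) if *n == local => 1,
            Instr::Block(inner) | Instr::Loop(inner) => count_local_uses(inner, local),
            Instr::If { then_body, else_body } => {
                count_local_uses(then_body, local) + count_local_uses(else_body, local)
            }
            _ => 0,
        })
        .sum()
}

/// True if an Unknown instruction appears anywhere in the body.
fn contains_unknown(body: &[Instr]) -> bool {
    body.iter().any(|instr| match instr {
        Instr::Unknown => true,
        Instr::Block(inner) | Instr::Loop(inner) => contains_unknown(inner),
        Instr::If { then_body, else_body } => {
            contains_unknown(then_body) || contains_unknown(else_body)
        }
        _ => false,
    })
}

/// Module-wide Unknown guard, read-count guard, and EXIT_OK constant guard.
fn guards_allow_fold(functions: &[Vec<Instr>], body: &[Instr], local: u32, constant: i32) -> bool {
    !functions.iter().any(|f| contains_unknown(f))
        && count_local_uses(body, local) == 1
        && constant == 0
}
```

Failing any one guard rejects the fold outright, which matches the conservative stance of the list above: a missed optimization is acceptable, a miscompile is not.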

## What v1.0.4 ships vs the broader roadmap

Per docs/research/v1.0.3/issue-roadmap.md §#70, the full pass is a
six-pass chain (detect → inline lift → directize → const-fold EXIT →
forward task.return shim → DCE start_task init). v1.0.4 ships step 1
(detect + fold the discriminant pattern). Steps 2-6 are infrastructure
that already exists in the pipeline (inline, directize, constant-folding,
dead-stores); composing them on the post-detection IR is v1.0.5+ work.

## Tests (4 new)

- test_async_adapter_folds_simple_case: positive case, 7-instr pattern
  folds to 2-instr (Call + fast-path constant).
- test_async_adapter_skips_when_local_read_multiple_times: safety pin
  for the read-count guard.
- test_async_adapter_skips_when_const_is_not_zero: safety pin for the
  EXIT_OK discriminant guard.
- test_async_adapter_no_op_when_pattern_absent: plain functions are
  byte-for-byte unchanged.

## Measurement

No measurable change on the current corpus (gale, calculator, the
3 v1.0.3 fixtures). Gale doesn't have the P3 async shape; calculator
might, but the conservative guards skip on any Unknown. Real wins
should land on meld-fused P3 components that share memory with the
lift target, specifically the workloads issue #70 was filed for. A
future PR-Q extension should add a meld-fused P3 fixture.

Trace: REQ-3, REQ-11
@avrabe avrabe merged commit 27fc4f0 into main May 17, 2026
9 of 19 checks passed
@avrabe avrabe deleted the release/v1.0.4-pr-async-adapter branch May 17, 2026 11:02
@avrabe avrabe mentioned this pull request May 18, 2026
avrabe added a commit that referenced this pull request May 18, 2026
Four-track parallel sprint with FULL success (vs v1.0.3 where Track 3 died):

  #125  Track A  async-callback adapter pass (#70 first piece)
  #126  Track B  verifier table-resolver teaching (drops directize Z3 bypass)
  #127  Track C  ægraph rewrite engine + 3 identity rules
  #128  Track D  island-model parallel optimization (#71)

Plus a subagent new-issues sweep that found zero issues since v1.0.3.

+20 tests, 380+ total. Code-section bytes unchanged on the current
corpus — all four tracks ship infrastructure that will produce
measurable wins once their consumers land in v1.0.5+.
avrabe added a commit that referenced this pull request May 19, 2026
… (v1.0.5 Track 2) (#131)

Composes the remaining five steps of the v1.0.3 #70 roadmap on the
post-detection IR produced by v1.0.4 (Track A, #125).

## Part A: orchestrator (component_optimizer.rs)

New `run_async_chain_passes(module)` runs, in order:

  2. crate::optimize::inline_functions       (force-inline async-lift thunks)
  3. crate::optimize::directize              (const-slot call_indirect → direct)
  4. crate::optimize::constant_folding       (propagate EXIT_OK discriminant)
  5. crate::optimize::eliminate_dead_code    (drop unreachable slow-path arms)
  5.5 crate::optimize::forward_global_shim   (NEW — task.return forward, lib.rs)
  6. crate::optimize::eliminate_dead_stores  (kill start_task waitable init)

Each constituent pass is no-op on functions that don't need it, so
over-applying is safe. Every pass already carries its own Z3
`verify_or_revert` gate, so the orchestrator does NOT bypass
verification — it just composes existing proven passes in the right
order for the post-detection IR.

Wiring in `optimize_core_module` mirrors the Phase 3 / Phase 4
save-and-revert pattern: encode + wasm-tools-validate after the
chain; revert on mismatch.
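
The compose-then-validate wiring can be sketched generically. This is an illustration under stated assumptions: `Module`, `Pass`, and `run_chain_with_revert` are hypothetical names, the passes are modelled as plain function pointers, and the `validate` closure stands in for the encode + wasm-tools validation step rather than calling any real API.

```rust
/// Toy module (hypothetical); the real pipeline operates on loom's module type.
#[derive(Debug, Clone, PartialEq)]
struct Module {
    code: Vec<u32>,
}

/// A pass mutates the module in place and is a no-op where it does not apply.
type Pass = fn(&mut Module);

/// Runs the passes in order, then validates the result.
/// On validation failure the module is restored to its saved state.
/// Returns true if the chain's output was kept.
fn run_chain_with_revert(
    module: &mut Module,
    passes: &[Pass],
    validate: impl Fn(&Module) -> bool,
) -> bool {
    let saved = module.clone();
    for &pass in passes {
        pass(module);
    }
    if validate(&*module) {
        true
    } else {
        *module = saved;
        false
    }
}
```

Validating once after the whole chain keeps the orchestrator simple. The trade-off is that a single bad pass discards the work of every pass in the chain, which is acceptable here because each constituent pass already has its own verification gate.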

## Part B: forward_global_shim peephole (lib.rs)

New `optimize::forward_global_shim(module)` recognises

    global.set $g
    global.get $g     ;; immediately follows

and erases both. Soundness:

  - the global has EXACTLY ONE writer module-wide (verified via a
    pre-scan that recurses into Block/Loop/If bodies),
  - `global.get` immediately follows the matching `global.set`
    (any intervening instruction blocks the fold),
  - any function containing Instruction::Unknown disqualifies the
    module-wide pass.

The fold is byte-for-byte equivalent to keeping the value on the
stack when the global has a single writer — no observer can race
the round-trip.
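
A minimal sketch of the peephole and its three guards, again on a toy IR. The `Instr` enum and the `scan` helper are hypothetical, nesting is reduced to `Block` only, and the fold itself is applied only at the top level of each function body, which is a simplification made for this sketch. It checks exactly the conditions listed above and nothing more.

```rust
use std::collections::HashMap;

/// Toy instruction set (hypothetical; not loom's real IR).
#[derive(Debug, Clone, PartialEq)]
enum Instr {
    GlobalSet(u32),
    GlobalGet(u32),
    I32Const(i32),
    Block(Vec<Instr>),
    Unknown,
}

/// Pre-scan: counts writers per global and flags Unknown, recursing into blocks.
fn scan(body: &[Instr], writers: &mut HashMap<u32, usize>, unknown: &mut bool) {
    for instr in body {
        match instr {
            Instr::GlobalSet(g) => *writers.entry(*g).or_insert(0) += 1,
            Instr::Block(inner) => scan(inner, writers, unknown),
            Instr::Unknown => *unknown = true,
            _ => {}
        }
    }
}

/// Erases adjacent `global.set $g; global.get $g` pairs for single-writer globals.
/// Returns the number of pairs removed.
fn forward_global_shim(functions: &mut [Vec<Instr>]) -> usize {
    let mut writers = HashMap::new();
    let mut unknown = false;
    for body in functions.iter() {
        scan(body, &mut writers, &mut unknown);
    }
    if unknown {
        return 0; // any Unknown disqualifies the whole module
    }
    let mut folds = 0;
    for body in functions.iter_mut() {
        let mut i = 0;
        while i + 1 < body.len() {
            let foldable = matches!(
                (&body[i], &body[i + 1]),
                (Instr::GlobalSet(a), Instr::GlobalGet(b))
                    if a == b && writers.get(a) == Some(&1)
            );
            if foldable {
                drop(body.drain(i..i + 2)); // the value simply stays on the stack
                folds += 1;
            } else {
                i += 1;
            }
        }
    }
    folds
}
```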

## Tests (8 new, all pass)

forward_global_shim:
  - test_forward_global_shim_folds_simple_pair
  - test_forward_global_shim_skips_multiple_writers
  - test_forward_global_shim_skips_intervening_op
  - test_forward_global_shim_skips_mismatched_indices
  - test_forward_global_shim_no_op_on_plain_function
  - test_forward_global_shim_skips_unknown_instructions

chain composition:
  - test_chain_compose_eliminates_full_adapter
  - test_chain_no_op_when_pattern_absent

Targeted run (async_adapter forward_global chain_compose):
  12 passed; 0 failed; 0 ignored.

## Scope

The chain re-uses existing passes — no duplication, no Z3 bypass.
Token cost: +601 LOC (orchestrator + peephole + tests + comments).

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>