feat(opt): async-callback adapter pass (v1.0.4 #70, Track A) #125
Merged
Adds Phase 4 to the component pipeline at loom-core/src/component_optimizer.rs.
Detects the meld P3 async-callback adapter shape and folds the
discriminant test + slow-path branch when EXIT_OK is statically true.
## Pattern
Call($lift); LocalSet(N); LocalGet(N); I32Const(0); I32Eq;
If { then=fast_path, else=slow_path }
Fold target:
Call($lift); <then=fast_path inlined>
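The shape match and fold above can be sketched on a toy IR. This is a minimal illustration, not the code in component_optimizer.rs: the `Instr` enum and `fold_adapter` are hypothetical names, the sketch only looks at a function's top-level instruction list, it does not model stack typing, and it omits the guards listed under Soundness.

```rust
// Toy instruction set (hypothetical; the real IR lives in loom-core).
#[derive(Debug, Clone, PartialEq)]
enum Instr {
    Call(u32),
    LocalSet(u32),
    LocalGet(u32),
    I32Const(i32),
    I32Eq,
    If { then_body: Vec<Instr>, else_body: Vec<Instr> },
}

/// Folds the first top-level occurrence of
///   Call; LocalSet(N); LocalGet(N); I32Const(0); I32Eq; If { .. }
/// into
///   Call; <then body inlined>
/// Returns true when a fold happened. No soundness guards are applied here.
fn fold_adapter(body: &mut Vec<Instr>) -> bool {
    let mut i = 0;
    while i + 6 <= body.len() {
        let matched = match &body[i..i + 6] {
            [Instr::Call(_), Instr::LocalSet(n), Instr::LocalGet(m), Instr::I32Const(0), Instr::I32Eq, Instr::If { .. }] => {
                n == m // the set and the get must name the same local
            }
            _ => false,
        };
        if matched {
            // Keep everything up to and including the Call.
            let mut rest = body.split_off(i + 1);
            // Remove LocalSet, LocalGet, I32Const, I32Eq, If.
            let mut removed: Vec<Instr> = rest.drain(..5).collect();
            // Inline the fast path; the slow path is dropped.
            if let Some(Instr::If { then_body, .. }) = removed.pop() {
                body.extend(then_body);
            }
            body.extend(rest);
            return true;
        }
        i += 1;
    }
    false
}
```

Note that the `Call` stays in place in the folded output, matching the fold target given above.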
## Soundness
- Conservative module-wide guard: skip if any function contains
Instruction::Unknown.
- Per-pattern guard: local N must be read EXACTLY ONCE in the function
(the LocalGet immediately after the LocalSet). Counted recursively
including LocalTee. If anything else reads the exit code, fold is
rejected.
- I32Const must be EXACTLY 0 (the EXIT_OK shape). Any other constant
is rejected.
- The Call to $lift is KEPT (we cannot prove the lift pure+no-trap
without IPA; folding away its side effects would be unsound).
- After fold, the component pipeline re-encodes + wasm-tools-validates;
on mismatch the function set is reverted (mirrors PR-M's pattern).
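The first three guards can be sketched as follows. This is an illustration under assumptions rather than the shipped code: `Instr`, `count_reads`, `contains_unknown`, and `fold_allowed` are hypothetical names, and treating `LocalTee(N)` as a read of N follows the wording above ("counted recursively including LocalTee").

```rust
// Toy instruction set (hypothetical; only what the guards need).
#[derive(Debug, Clone, PartialEq)]
enum Instr {
    LocalGet(u32),
    LocalSet(u32),
    LocalTee(u32),
    I32Const(i32),
    Block(Vec<Instr>),
    If { then_body: Vec<Instr>, else_body: Vec<Instr> },
    Unknown,
}

/// Counts LocalGet/LocalTee uses of `local`, recursing into nested bodies.
fn count_reads(body: &[Instr], local: u32) -> usize {
    body.iter()
        .map(|instr| match instr {
            Instr::LocalGet(n) | Instr::LocalTee(n) if *n == local => 1,
            Instr::Block(inner) => count_reads(inner, local),
            Instr::If { then_body, else_body } => {
                count_reads(then_body, local) + count_reads(else_body, local)
            }
            _ => 0,
        })
        .sum()
}

/// True if the body, or any nested body, contains an Unknown instruction.
fn contains_unknown(body: &[Instr]) -> bool {
    body.iter().any(|instr| match instr {
        Instr::Unknown => true,
        Instr::Block(inner) => contains_unknown(inner),
        Instr::If { then_body, else_body } => {
            contains_unknown(then_body) || contains_unknown(else_body)
        }
        _ => false,
    })
}

/// Combines the module-wide guard, the read-count guard, and the constant guard.
/// `module` is one instruction list per function; `func` indexes the candidate.
fn fold_allowed(module: &[Vec<Instr>], func: usize, local: u32, constant: i32) -> bool {
    let no_unknown = !module.iter().any(|f| contains_unknown(f));
    no_unknown && constant == 0 && count_reads(&module[func], local) == 1
}
```

The module-wide `Unknown` check runs over every function, not only the candidate, which is what makes the guard conservative.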
## What v1.0.4 ships vs the broader roadmap
Per docs/research/v1.0.3/issue-roadmap.md §#70, the full pass is a
six-pass chain (detect → inline lift → directize → const-fold EXIT →
forward task.return shim → DCE start_task init). v1.0.4 ships step 1
(detect + fold the discriminant pattern). Steps 2-6 are infrastructure
that already exists in the pipeline (inline, directize, constant-folding,
dead-stores); composing them on the post-detection IR is v1.0.5+ work.
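One way to picture composing existing passes behind a validate-or-revert gate is the sketch below. It is only an illustration of the pattern: `Module`, `Pass`, `run_chain_or_revert`, and the two sample passes are hypothetical, and the `validate` closure stands in for the re-encode plus wasm-tools validation step.

```rust
// Hypothetical module model: each function is just a byte vector.
#[derive(Debug, Clone, PartialEq)]
struct Module {
    functions: Vec<Vec<u8>>,
}

// A pass mutates the module in place and is a no-op where it does not apply.
type Pass = fn(&mut Module);

/// Runs `passes` in order, then validates. On validation failure the saved
/// function set is restored and false is returned.
fn run_chain_or_revert(
    module: &mut Module,
    passes: &[Pass],
    validate: impl Fn(&Module) -> bool,
) -> bool {
    let saved = module.functions.clone();
    for &pass in passes {
        pass(&mut *module);
    }
    if validate(module) {
        return true;
    }
    module.functions = saved;
    false
}

// Sample pass: strips 0x01 bytes (the Wasm `nop` opcode) from every function.
fn strip_nops(m: &mut Module) {
    for f in &mut m.functions {
        f.retain(|b| *b != 0x01);
    }
}

// Sample pass that deliberately breaks the module, to exercise the revert.
fn corrupt(m: &mut Module) {
    m.functions.clear();
}
```

With this shape, a chain such as `&[strip_nops as Pass, corrupt as Pass]` leaves the module untouched when the validator rejects the result.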
## Tests (4 new)
- test_async_adapter_folds_simple_case: positive case, 7-instr pattern
folds to 2-instr (Call + fast-path constant).
- test_async_adapter_skips_when_local_read_multiple_times: safety pin
for the read-count guard.
- test_async_adapter_skips_when_const_is_not_zero: safety pin for the
EXIT_OK discriminant guard.
- test_async_adapter_no_op_when_pattern_absent: plain functions are
byte-for-byte unchanged.
## Measurement
No measurable change on the current corpus (gale, calculator, the
3 v1.0.3 fixtures). Gale doesn't have the P3 async shape; calculator
might, but the conservative module-wide guard skips the pass whenever
any function contains Instruction::Unknown. Real wins are expected on
meld-fused P3 components that share memory with the lift target,
which are the workloads issue #70 was filed for. A future PR-Q
extension should add a meld-fused P3 fixture.
Trace: REQ-3, REQ-11
avrabe added a commit that referenced this pull request on May 18, 2026:
Four-track parallel sprint with FULL success (vs v1.0.3 where Track 3 died):
- #125 Track A: async-callback adapter pass (#70 first piece)
- #126 Track B: verifier table-resolver teaching (drops directize Z3 bypass)
- #127 Track C: ægraph rewrite engine + 3 identity rules
- #128 Track D: island-model parallel optimization (#71)

Plus a subagent new-issues sweep that found zero issues since v1.0.3. +20 tests, 380+ total. Code-section bytes unchanged on the current corpus: all four tracks ship infrastructure that will produce measurable wins once their consumers land in v1.0.5+.
avrabe added a commit that referenced this pull request on May 19, 2026:
… (v1.0.5 Track 2) (#131)
Composes the remaining five steps of the v1.0.3 #70 roadmap on the post-detection IR produced by v1.0.4 (Track A, #125).
## Part A: orchestrator (component_optimizer.rs)
New `run_async_chain_passes(module)` runs, in order:
2. crate::optimize::inline_functions (force-inline async-lift thunks)
3. crate::optimize::directize (const-slot call_indirect → direct)
4. crate::optimize::constant_folding (propagate EXIT_OK discriminant)
5. crate::optimize::eliminate_dead_code (drop unreachable slow-path arms)
5.5 crate::optimize::forward_global_shim (NEW: task.return forward, lib.rs)
6. crate::optimize::eliminate_dead_stores (kill start_task waitable init)

Each constituent pass is a no-op on functions that don't need it, so over-applying is safe. Every pass already carries its own Z3 `verify_or_revert` gate, so the orchestrator does NOT bypass verification; it just composes existing proven passes in the right order for the post-detection IR.
Wiring in `optimize_core_module` mirrors the Phase 3 / Phase 4 save-and-revert pattern: encode + wasm-tools-validate after the chain; revert on mismatch.
## Part B: forward_global_shim peephole (lib.rs)
New `optimize::forward_global_shim(module)` recognises
global.set $g
global.get $g ;; immediately follows
and erases both.
Soundness:
- the global has EXACTLY ONE writer module-wide (verified via a pre-scan that recurses into Block/Loop/If bodies),
- `global.get` immediately follows the matching `global.set` (any intervening instruction blocks the fold),
- any function containing Instruction::Unknown disqualifies the module-wide pass.

The fold is byte-for-byte equivalent to keeping the value on the stack when the global has a single writer: no observer can race the round-trip.
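The three conditions stated for the forward_global_shim peephole can be sketched on a toy IR. This is an assumption-laden illustration, not the lib.rs implementation: the `Instr` enum and helper names are hypothetical, and the sketch rewrites only top-level instruction lists, while the writer pre-scan recurses into nested bodies as described.

```rust
// Toy instruction set (hypothetical; only what the peephole needs).
#[derive(Debug, Clone, PartialEq)]
enum Instr {
    GlobalSet(u32),
    GlobalGet(u32),
    I32Const(i32),
    Block(Vec<Instr>),
    Loop(Vec<Instr>),
    If { then_body: Vec<Instr>, else_body: Vec<Instr> },
    Unknown,
}

/// Pre-scan: counts writers of `global`, recursing into Block/Loop/If bodies.
fn count_writers(body: &[Instr], global: u32) -> usize {
    body.iter()
        .map(|instr| match instr {
            Instr::GlobalSet(g) if *g == global => 1,
            Instr::Block(b) | Instr::Loop(b) => count_writers(b, global),
            Instr::If { then_body, else_body } => {
                count_writers(then_body, global) + count_writers(else_body, global)
            }
            _ => 0,
        })
        .sum()
}

fn contains_unknown(body: &[Instr]) -> bool {
    body.iter().any(|instr| match instr {
        Instr::Unknown => true,
        Instr::Block(b) | Instr::Loop(b) => contains_unknown(b),
        Instr::If { then_body, else_body } => {
            contains_unknown(then_body) || contains_unknown(else_body)
        }
        _ => false,
    })
}

/// Erases adjacent `GlobalSet(g); GlobalGet(g)` pairs when `g` has exactly one
/// writer module-wide. Returns the number of pairs erased.
fn forward_global_shim(functions: &mut [Vec<Instr>]) -> usize {
    // Any Unknown instruction disqualifies the whole module.
    if functions.iter().any(|f| contains_unknown(f)) {
        return 0;
    }
    // Count writers against a snapshot so earlier folds cannot skew the count.
    let original = functions.to_vec();
    let writers = |g: u32| original.iter().map(|f| count_writers(f, g)).sum::<usize>();
    let mut folded = 0;
    for (idx, body) in original.iter().enumerate() {
        let mut out = Vec::with_capacity(body.len());
        let mut i = 0;
        while i < body.len() {
            match (&body[i], body.get(i + 1)) {
                (Instr::GlobalSet(a), Some(Instr::GlobalGet(b))) if a == b && writers(*a) == 1 => {
                    folded += 1;
                    i += 2; // skip both instructions
                }
                (instr, _) => {
                    out.push(instr.clone());
                    i += 1;
                }
            }
        }
        functions[idx] = out;
    }
    folded
}
```

Because the match only considers index `i` and `i + 1`, any intervening instruction between the set and the get blocks the fold, and mismatched global indices fail the `a == b` guard.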
## Tests (8 new, all pass)
forward_global_shim:
- test_forward_global_shim_folds_simple_pair
- test_forward_global_shim_skips_multiple_writers
- test_forward_global_shim_skips_intervening_op
- test_forward_global_shim_skips_mismatched_indices
- test_forward_global_shim_no_op_on_plain_function
- test_forward_global_shim_skips_unknown_instructions

chain composition:
- test_chain_compose_eliminates_full_adapter
- test_chain_no_op_when_pattern_absent

Targeted run (async_adapter forward_global chain_compose): 12 passed; 0 failed; 0 ignored.
## Scope
The chain re-uses existing passes: no duplication, no Z3 bypass. Token cost: +601 LOC (orchestrator + peephole + tests + comments).
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Adds Phase 4 to the component pipeline. Detects the meld P3 async-callback adapter shape and folds the discriminant test + slow-path branch when EXIT_OK is statically true. 4 new tests + ~460 LOC.
Soundness via three guards: no Unknown in module, local-read-count = 1, I32Const == 0. Per-fold encode + wasm-tools validate with revert on mismatch (mirrors PR-M).
🤖 Generated with Claude Code