feat(opt): #68 Tier-1.1 scalar adapter inlining + Tier-2.2 body dedup (v1.0.5 Track 3) #132
Merged
Adds two new passes to loom-core/src/fused_optimizer.rs, slotted between
devirtualize_adapters and eliminate_dead_functions.
## Pass 1.1 — inline_scalar_adapters (Tier-1.1)
Filters the AdapterInfo list down to scalar (i32/i64/f32/f64) adapters whose
body matches one of two canonical shapes:
- Pure pass-through: 'local.get 0; ...; local.get N-1; call target;
end' with no memory ops. Already in the desired shape — no-op rewrite
but counted so the optimisation surface is visible.
- Cross-memory scalar copy: load(callee_mem) + load(caller_mem) +
call target + store(caller_mem) + store(callee_mem), where the two
memory indices differ. Rewritten to the pass-through shape so the
standard inline_functions pass picks them up downstream.
Any other body shape is classified as Unrecognised and skipped. The start
function is also skipped.
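The two shapes above can be sketched as a small classifier. This is a minimal illustration, not the code in fused_optimizer.rs: the `Instr` and `AdapterShape` types and the `classify` function are hypothetical stand-ins, and the cross-memory check assumes that, once `local.get` operands are ignored, the body is exactly load, load, call, store, store, end.

```rust
/// Hypothetical, simplified instruction type; loom's real `Instruction` is richer.
#[derive(Debug, Clone, PartialEq)]
enum Instr {
    LocalGet(u32),
    Load { mem: u32 },
    Store { mem: u32 },
    Call(u32),
    End,
    Other,
}

/// Hypothetical result type mirroring the shapes named in the description.
#[derive(Debug, PartialEq)]
enum AdapterShape {
    PassThrough,
    CrossMemoryScalarCopy,
    Unrecognised,
}

/// `local.get 0; ...; local.get N-1; call target; end` with no memory ops.
fn is_pass_through(body: &[Instr], param_count: u32) -> bool {
    let n = param_count as usize;
    if body.len() != n + 2 {
        return false;
    }
    let params_in_order = body[..n]
        .iter()
        .enumerate()
        .all(|(i, instr)| *instr == Instr::LocalGet(i as u32));
    params_in_order && matches!(body[n], Instr::Call(_)) && body[n + 1] == Instr::End
}

/// load(callee) + load(caller) + call + store(caller) + store(callee),
/// with two different memory indices. Assumption: `local.get` operands
/// are ignored when matching.
fn is_cross_memory_copy(body: &[Instr]) -> bool {
    let core: Vec<&Instr> = body
        .iter()
        .filter(|i| !matches!(i, Instr::LocalGet(_)))
        .collect();
    match core.as_slice() {
        [Instr::Load { mem: callee_in }, Instr::Load { mem: caller_in }, Instr::Call(_), Instr::Store { mem: caller_out }, Instr::Store { mem: callee_out }, Instr::End] => {
            callee_in == callee_out && caller_in == caller_out && callee_in != caller_in
        }
        _ => false,
    }
}

fn classify(body: &[Instr], param_count: u32) -> AdapterShape {
    if is_pass_through(body, param_count) {
        AdapterShape::PassThrough
    } else if is_cross_memory_copy(body) {
        AdapterShape::CrossMemoryScalarCopy
    } else {
        AdapterShape::Unrecognised
    }
}
```

In this sketch a body that touches a single memory index on both sides falls through to `Unrecognised`, matching the "two memory indices differ" condition.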
## Pass 1.2 — dedupe_function_bodies (Tier-2.2)
Hashes each local function's (signature, locals, instructions) tuple
using DefaultHasher over Debug-format strings (Instruction derives
PartialEq but not Hash, so we use a serialisable representation).
Groups by hash, verifies byte-equality on multi-entry groups (defends
against hash collisions), and redirects all calls to the duplicates to
a single representative (lowest function index) via the existing
rewrite_calls helper. Redirected functions become dead and are removed
by the subsequent eliminate_dead_functions pass.
Conservative skips:
- instructions.len() <= 4 (overhead exceeds savings on tiny bodies)
- Exported functions (observable name on public surface)
- Start function (same)
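The hash, group, verify, redirect flow described above can be sketched as follows. This is an illustrative sketch under stated assumptions: `Func`, `Instr`, `body_hash`, `same_body`, and `find_duplicates` are hypothetical names, the signature and locals are modelled as plain strings, and the function returns a duplicate-to-representative map rather than calling loom's rewrite_calls helper.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{BTreeMap, HashMap};
use std::hash::{Hash, Hasher};

/// Hypothetical instruction type: derives Debug and PartialEq but not Hash.
#[derive(Debug, Clone, PartialEq)]
enum Instr {
    LocalGet(u32),
    I32Const(i32),
    Add,
    Call(u32),
    End,
}

/// Hypothetical function record; real signatures and locals are typed, not strings.
#[derive(Debug, Clone)]
struct Func {
    signature: String,
    locals: Vec<String>,
    instructions: Vec<Instr>,
    exported: bool,
}

/// Hash the Debug rendering of (signature, locals, instructions).
fn body_hash(f: &Func) -> u64 {
    let mut hasher = DefaultHasher::new();
    format!("{:?}|{:?}|{:?}", f.signature, f.locals, f.instructions).hash(&mut hasher);
    hasher.finish()
}

/// Full equality check used to guard against hash collisions.
fn same_body(a: &Func, b: &Func) -> bool {
    a.signature == b.signature && a.locals == b.locals && a.instructions == b.instructions
}

/// Maps each duplicate function index to its representative (lowest index).
fn find_duplicates(funcs: &[Func], start: Option<usize>) -> BTreeMap<usize, usize> {
    let mut buckets: HashMap<u64, Vec<usize>> = HashMap::new();
    for (idx, f) in funcs.iter().enumerate() {
        // Conservative skips: tiny bodies, exported functions, the start function.
        let skip = f.instructions.len() <= 4 || f.exported || start == Some(idx);
        if !skip {
            buckets.entry(body_hash(f)).or_default().push(idx);
        }
    }

    let mut redirects = BTreeMap::new();
    for group in buckets.values().filter(|g| g.len() > 1) {
        // Indices were pushed in ascending order, so the first member of each
        // equality class is its lowest index.
        let mut representatives: Vec<usize> = Vec::new();
        for &idx in group {
            let found = representatives
                .iter()
                .copied()
                .find(|&rep| same_body(&funcs[rep], &funcs[idx]));
            match found {
                Some(rep) => {
                    redirects.insert(idx, rep);
                }
                None => representatives.push(idx),
            }
        }
    }
    redirects
}
```

Splitting each hash bucket into equality classes means a collision between two different bodies yields two representatives and no redirect between them.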
## Wiring
Both passes are added as Pass 1.1 and Pass 1.2 in optimize_fused_module.
New stats fields 'scalar_adapters_inlined' and
'function_bodies_deduplicated' surface counts.
## Correctness
Both passes are scoped narrowly, so the soundness argument for each is short:
- Tier-1.1: pass-through and cross-memory shapes are both
semantically equivalent to a direct call to the underlying target.
Pass-through is identity; cross-memory relies on callee-memory
loads/stores being redundant once the underlying function is called
directly with the caller's scalar value.
- Tier-2.2: redirecting a call site from func A to func B is sound
when A and B have byte-identical (signature, locals, instructions).
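To make the Tier-2.2 argument concrete, here is a minimal sketch of call redirection. `redirect_calls` and `is_called` are hypothetical stand-ins (the signature of the existing rewrite_calls helper is not shown in this PR), and bodies are modelled as plain vectors of a toy `Instr` type.

```rust
use std::collections::BTreeMap;

/// Hypothetical, minimal instruction type for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Instr {
    Call(u32),
    Nop,
    End,
}

/// Rewrites every `call dup` into `call rep` according to `redirects`
/// (duplicate index -> representative index). Returns the number of
/// call sites rewritten.
fn redirect_calls(bodies: &mut [Vec<Instr>], redirects: &BTreeMap<u32, u32>) -> usize {
    let mut rewritten = 0;
    for body in bodies.iter_mut() {
        for instr in body.iter_mut() {
            if let Instr::Call(target) = instr {
                if let Some(&rep) = redirects.get(&*target) {
                    *target = rep;
                    rewritten += 1;
                }
            }
        }
    }
    rewritten
}

/// True if any body still contains a direct call to `index`.
fn is_called(bodies: &[Vec<Instr>], index: u32) -> bool {
    bodies.iter().flatten().any(|i| *i == Instr::Call(index))
}
```

After redirection the duplicate has no remaining direct callers in this model, which is the condition under which a later dead-function pass can drop it.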
## Tests (6 new, all passing)
- test_inline_scalar_adapter_pass_through
- test_inline_scalar_adapter_skips_non_scalar (Unrecognised body shape)
- test_dedupe_finds_identical_bodies
- test_dedupe_skips_exported_function
- test_dedupe_skips_tiny_bodies (<=4 instr threshold)
- test_inline_and_dedupe_via_full_pipeline (end-to-end through
optimize_fused_module, checks both stats counters fire)
Full fused_optimizer test suite: 64 passed / 0 failed.
## Measurement (loom optimize --attestation false; sizes in bytes, before -> after)
gale_in_baseline.wasm: 1941 -> 1846 (-95B, -4.9%)
httparse.wasm: 4766 -> 4668 (-98B, -2.1%)
json_lite.wasm: 3510 -> 3377 (-133B, -3.8%)
state_machine.wasm: 1655 -> 1558 (-97B, -5.9%)
calc.component.wasm: 442 -> 392 (-50B, -11.3%)
All deltas above come from the broader optimization pipeline, not from the
two new passes. The new fused-side passes are byte-neutral on non-fused
inputs (as designed); they are expected to unblock further wins on real
meld-fused multi-component modules once those are added to the corpus.
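For reference, the delta and percentage columns in the table above follow from the before and after sizes. A small sketch of that arithmetic, where `size_delta` and `format_row` are hypothetical helpers and not part of loom or its measurement script:

```rust
/// Returns (byte delta, percent change relative to `before`).
fn size_delta(before: u64, after: u64) -> (i64, f64) {
    let delta = after as i64 - before as i64;
    let pct = delta as f64 / before as f64 * 100.0;
    (delta, pct)
}

/// Formats one row in the same layout as the measurement table.
fn format_row(name: &str, before: u64, after: u64) -> String {
    let (delta, pct) = size_delta(before, after);
    format!("{name}: {before} -> {after} ({delta:+}B, {pct:+.1}%)")
}
```

For example, `format_row("calc.component.wasm", 442, 392)` yields `calc.component.wasm: 442 -> 392 (-50B, -11.3%)`.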
Trace: REQ-3, REQ-11
Refs: #68
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
avrabe added a commit that referenced this pull request on May 19, 2026:
…#68 fused + Rocq prep) (#133)
Four v1.0.4 infrastructure pieces grew real consumers:
- #130 Track 1+4: ægraph pipeline integration + Rocq Roundtrip prep doc
- #131 Track 2: #70 six-pass chain composition + task.return shim peephole
- #132 Track 3: #68 Tier-1.1 scalar adapter inlining + Tier-2.2 body dedup
+18 tests, 400+ total. Code-section bytes unchanged on the current corpus. The Track 3 agent flagged 4 pre-existing observations worth tracking for v1.0.6+.
avrabe added a commit that referenced this pull request on May 19, 2026:
….1.0 Track E) (#136)
Adds the `meld_fused` workload entry to `scripts/measure_corpus.sh` and a `tests/corpus/MELD_FUSED_README.md` placeholder documenting exactly what the v1.0.5 Track 3 byte-wins surface needs: a real `meld fuse` of two **dependent** components (one importing the other) so the fused core contains a cross-memory scalar adapter and at least one dedup target.
This is a doc-only PR because the authoring agent could not invoke `meld` (environment permission restriction). Per CLAUDE.md "Honesty > shipping something fake": the harness now lists the workload (it reports `n/a` until the fixture lands), and the README spells out the toolchain, input requirements, and verification commands so the next operator can complete the work in minutes.
The existing single-component fixtures (`simple.component.wasm`, `calc.component.wasm`) are independent (neither imports the other) and therefore cannot produce the cross-memory adapter shape Track 3 Tier-1.1 targets. That is why Track 3 fired byte-neutral on the v1.0.5 corpus.
Implements: TASK-corpus-meld-fixture
Refs: #132 (Track 3 passes), 0424b7e (component-aware harness)
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Extends fused_optimizer.rs with two passes. ~510 LOC + 6 tests, all pass. Byte-neutral on current corpus (no real meld-fused fixture yet); infrastructure for #68's broader cross-component story.
🤖 Generated with Claude Code