feat(opt): island-model parallel optimization (v1.0.4 #71, Track D) #128
Merged
Implements #71: run multiple pass orderings concurrently and pick the smallest verified result. The v0.6.0 -> v0.7.0 gale CSE-cost regression (commit afc9318) is exactly the failure mode this guards against - different orderings produce different sizes, the smaller verified result wins, and the regression cannot ship.

New module `loom-core/src/islands.rs` with `IslandConfig` and `optimize_module_islands(module, configs) -> Result<Module>`. Ships 4 default configs (baseline, inline-late, cse-early, aggressive-inline). Each island clones the input, runs its pass sequence (every pass still invokes its per-function Z3 + stack `verify_or_revert` gate internally), encodes the result, validates via `wasmparser::validate`, and is selected if it produces the smallest size. Tie-break is lex order on island name for deterministic results. Parallelism via `rayon::scope`.

Soundness: Z3's context is thread-local (z3 crate's `with_z3_config` sets a per-thread context), so each rayon worker creates its own Z3 state when verification passes run inside the optimization passes themselves. No shared mutable Z3 state across islands.

CLI: new `--islands N` flag (default 1 preserves current serial path). N>1 dispatches to the parallel harness. Per-island encoded sizes are emitted to stderr so users can see how each ordering shaped the output - this is exactly the diagnostic that would have caught the gale regression.

Tests: 6 unit tests covering N=1 byte-identity with serial, smallest-wins selection, invalid-island rejection, all-failed error, deterministic tie-break across rayon thread interleavings, empty-configs error.

Dependency: `rayon = "1"` added to `loom-core` only - the CLI does not take a direct rayon dep, the boundary stays tight.

Measurement on gale_in_baseline.wasm: all 4 islands produce 1846 bytes (converged fixed point on this small fixture). N=1 takes 303ms, N=4 takes 430ms - parallelism is working (4x sequential work in ~1.4x wall time). On larger / less-converged modules the harness will surface the size deltas that would otherwise have shipped silently.

Implements #71
Refs: docs/research/v1.0.3/issue-roadmap.md (#71 section)
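The per-island flow described above (clone the input, run the pass sequence, validate, keep the smallest survivor) can be sketched roughly as follows. This is a minimal illustration, not the code in `islands.rs`: `Module`, `IslandConfig`, `IslandResult`, `encode_and_validate`, and `optimize_islands` are hypothetical stand-ins, validation is replaced by a trivial non-empty check instead of `wasmparser::validate`, and `std::thread::scope` is swapped in for `rayon::scope` so the sketch needs no external crates.

```rust
use std::thread;

// Hypothetical stand-in for the real module type in loom-core.
#[derive(Clone, Debug, PartialEq)]
struct Module {
    bytes: Vec<u8>,
}

// Hypothetical island description: a name plus an ordered pass list.
struct IslandConfig {
    name: &'static str,
    // In the real pipeline each pass also runs its own verification gate.
    passes: Vec<fn(&mut Module)>,
}

struct IslandResult {
    name: &'static str,
    module: Module,
    size: usize,
}

// Placeholder for "encode, then validate": a module counts as valid
// here if it is non-empty, and its size is its byte length.
fn encode_and_validate(module: &Module) -> Option<usize> {
    if module.bytes.is_empty() {
        None
    } else {
        Some(module.bytes.len())
    }
}

// One island: work on a private clone so islands never share state.
fn run_island(input: &Module, cfg: &IslandConfig) -> Option<IslandResult> {
    let mut module = input.clone();
    for pass in &cfg.passes {
        pass(&mut module);
    }
    let size = encode_and_validate(&module)?;
    Some(IslandResult { name: cfg.name, module, size })
}

fn optimize_islands(
    input: &Module,
    configs: &[IslandConfig],
) -> Result<IslandResult, String> {
    if configs.is_empty() {
        return Err("no island configs".to_string());
    }
    // Scoped threads may borrow `input` and `configs` directly.
    let results: Vec<Option<IslandResult>> = thread::scope(|s| {
        let handles: Vec<_> = configs
            .iter()
            .map(|cfg| s.spawn(move || run_island(input, cfg)))
            .collect();
        // A panicking island is treated like a failed one.
        handles.into_iter().map(|h| h.join().ok().flatten()).collect()
    });
    // Smallest size wins; island name breaks ties lexicographically.
    results
        .into_iter()
        .flatten()
        .min_by_key(|r| (r.size, r.name))
        .ok_or_else(|| "all islands failed validation".to_string())
}
```

Because every island works on its own clone, a failed or invalid island only removes itself from the candidate set; the input module is never mutated.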
New loom-core/src/islands.rs (~580 LOC) + CLI --islands N flag. Runs N IslandConfigs concurrently via rayon, each independently passing Z3 + stack validation. Picks min_by_key(encoded_size) with deterministic name lex tie-break.
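To illustrate why the name tie-break matters: `Iterator::min_by_key` on size alone returns the first minimum it encounters, so the winner would depend on the order in which results are collected. The sketch below compares on size first and name second, which makes the choice independent of that order. `Candidate` and `pick_winner` are hypothetical names for this example, and it uses `min_by` with an explicit comparator where the PR describes `min_by_key`.

```rust
// Hypothetical record of one island's outcome; field names are illustrative.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    name: String,
    encoded_size: usize,
}

// Smallest encoded size wins; equal sizes fall back to lexicographic
// order on the island name, so input order never decides the result.
fn pick_winner(candidates: &[Candidate]) -> Option<&Candidate> {
    candidates.iter().min_by(|a, b| {
        a.encoded_size
            .cmp(&b.encoded_size)
            .then_with(|| a.name.cmp(&b.name))
    })
}
```

With unique island names the comparator never reports two candidates as equal, so any permutation of the same results yields the same winner.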
Measurement on gale: all 4 default islands converge to 1846 bytes, a fixed point after the v0.7.0+ pipeline hardening. The regression-detection safety net is in place even when current pipelines don't diverge. N=4 takes about 1.4× the wall time of N=1 while doing 4× the serial work, so rayon is distributing the islands across cores.
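As a rough worked check of those figures, using the 303 ms (N=1) and 430 ms (N=4) timings from the PR description and assuming each island costs about as much as the single serial run (an assumption, since per-island timings are not reported):

$$
\frac{T_{N=4}}{T_{N=1}} = \frac{430}{303} \approx 1.42,
\qquad
\text{speedup} \approx \frac{4 \times 303}{430} = \frac{1212}{430} \approx 2.82,
\qquad
\text{efficiency} \approx \frac{2.82}{4} \approx 0.70
$$

Under that assumption, four islands of work finish in roughly the time of 1.4 serial runs, which corresponds to about 70% parallel efficiency on this small fixture.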
🤖 Generated with Claude Code