D0.5 auto_detect + D0.2 WireTokenAgreement stub (Phase 0, 66/66 tests)#231
Conversation
Two more Phase 0 deliverables from codec-sweep-via-lab-infra-v1.
66/66 cognitive-shader-driver tests pass under --features serve (+11 new).
D0.5 — auto_detect.rs (~300 LOC, CODING_PRACTICES gap 1):
Reads <model_path>/config.json (HuggingFace layout) and returns
ModelFingerprint { architecture, hidden_size, n_layers,
tokenizer_class, vocab_size, default_lane_width, default_distance }.
Architecture routing:
llama / qwen / qwen2 / qwen3 / mistral / mixtral → BF16x32 (AMX)
bert / modernbert / xlm-roberta / generic → F32x16 (AVX-512)
torch_dtype override wins over architecture heuristic.
Typed errors: ConfigMissing / Io / Parse / MissingField {path, field}.
Best-effort tokenizer_class from tokenizer_config.json.
8 tests: llama / qwen3-with-tokenizer / bert / modernbert / xlm-roberta
(d_model alias) / generic fallback / missing-config / missing-field.
D0.2 — WireTokenAgreement stub (~100 LOC, the I11 cert gate):
DTOs:
WireBaseline { Passthrough } — default, extensible
WireTokenAgreement { model_path, reference, candidate (WireCodecParams),
prompt_set_blob_id, n_tokens }
WireTokenAgreementResult { top1_rate, top5_rate,
divergence_positions, per_layer_mse,
candidate_latency_us, reference_latency_us,
stub, backend }
Phase 0 handler stub (not shipped yet): returns stub:true /
backend:"stub" deterministic result. Phase 2 D2.1-D2.3 land the
real decode-and-compare loop (reference model load + top-k
comparison + per-layer MSE).
Pass gates (for when the harness lands):
top1_rate ≥ 0.99 + top5_rate ≥ 0.999 vs Passthrough baseline.
This is the ACTUAL codec cert gate — reconstruction ICC is
necessary-but-not-sufficient (per #219/#220 lesson).
3 round-trip serde tests: full payload + stub-backend default +
baseline default.
Board hygiene (CLAUDE.md Mandatory rule):
STATUS_BOARD.md updated:
D0.1 Queued → Shipped (PR #227 — was stale)
D0.2 Queued → In PR (this branch)
D0.5 Queued → In PR (this branch)
Phase 0 state after this commit:
✅ D0.1 WireCalibrate + WireTensorView (PR #227)
✅ D0.6 CodecParamsBuilder (PR #225)
✅ D0.7 precision-ladder validation (PR #225)
✅ D0.5 auto_detect (this PR)
✅ D0.2 WireTokenAgreement stub (this PR)
⏳ D0.3 WireSweep streaming endpoint (next PR)
⏳ D0.4 surface freeze (gates after D0.3)
Rules honored:
Rule D — JSON/YAML/REST only, CodecParams carried through via WireCodecParams
Rule E — Wire surface IS the SIMD surface (lane_width on candidate)
Rule F — serde mirrors at ingress only; TryFrom → CodecParams at handler
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
Two findings from the D0.2 + D0.5 implementation, landed per user directive "including the epiphanies board": 1. D0.2 stub flag is anti-#219 defense at the type level. WireTokenAgreementResult.stub:bool + backend:"stub" default make the "synthetic-rows-mistaken-for-real" failure machine-checkable, not just documented. Generalises: every Phase-N surface DTO that lands before its Phase-N+k harness should carry an explicit stub flag. 2. D0.5 auto_detect is the concrete Python↔Rust handshake mechanism. Same architecture→lane-width table in Rosetta v2 Python and Rust auto_detect.rs. E-MEMB-11 handshake moves from conceptual to implemented; the slice-layout reconciliation doc (E-MEMB-1 fix) can use the same pattern (architecture → layout version → canonical slice table). Both entries prepended per APPEND-ONLY rule. https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f4304f5bef
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| #[serde(default)] | ||
| pub stub: bool, |
There was a problem hiding this comment.
Default missing token-agreement stub marker to true
When WireTokenAgreementResult is deserialized from JSON that omits stub, Serde currently fills it with false, while backend defaults to "stub". That creates a contradictory payload (backend == "stub" but stub == false) and can let clients that gate on !stub treat Phase-0 placeholder metrics as real results. Since this field is intended to prevent exactly that confusion, the default should fail closed (or be required) rather than silently reporting non-stub.
Useful? React with 👍 / 👎.
| .or_else(|| cfg.architectures.as_ref().and_then(|a| a.first().cloned())) | ||
| .unwrap_or_else(|| "generic".to_string()) | ||
| .to_lowercase(); |
There was a problem hiding this comment.
Normalize
architectures fallback before family matching
The fallback path copies the first architectures entry verbatim (lowercased) into architecture, but lane-width routing only matches exact family names like "qwen3"/"llama". For configs that omit model_type and use class names such as Qwen3ForCausalLM or LlamaForCausalLM, this yields qwen3forcausallm/llamaforcausallm and incorrectly falls back to F32x16 instead of the BF16 path, so auto-detection returns the wrong default lane width.
Useful? React with 👍 / 👎.
Last Phase 0 Wire-surface deliverable from codec-sweep-via-lab-infra-v1.
71/71 cognitive-shader-driver tests pass under --features serve
(+5 new D0.3 tests).
DTOs (~250 LOC in wire.rs):
WireMeasure enum:
ReconstructionErrorHeldOut / ReconstructionIccHeldOut /
TokenAgreementTop1 / TokenAgreementTop5 / PerLayerMse
(serde: lowercase snake_case)
WireSweepGrid:
subspaces / centroids / residual_depths / rotations /
distances / lane_widths — each a Vec<T> with sensible defaults
(defaults produce cardinality 1 for minimal payloads)
+ residual_centroids / calibration_rows / measurement_rows / seed
Methods:
- cardinality() -> usize — product of axis lengths
- enumerate() -> Vec<WireCodecParams> — full Cartesian product
WireSweepRequest:
tensor_path / grid / measure (default: ICC + top-1) /
log_to_lance (optional Lance fragment path) / label
WireSweepResult (one per grid point):
grid_index / candidate / kernel_hash (CodecParams::kernel_signature) /
calibrate (Option<WireCalibrateResponse>) /
token_agreement (Option<WireTokenAgreementResult>) /
stub flag (mirrors WireTokenAgreementResult.stub)
WireSweepResponse (for non-streaming batch clients):
label / cardinality / results / elapsed_ms / lance_fragment_path
Streaming handler (SSE) + Lance writer deferred to Phase 3 D3.1.
Phase 0 ships the SURFACE; Phase 3 lands the execution.
Tests (5 new):
- sweep_grid_cardinality_is_product_of_axes (1×3×3×2×1×2 = 36)
- sweep_grid_enumerate_produces_all_unique_signatures
(4 distinct kernel signatures from 4 distinct IR-shaping tuples)
- sweep_grid_defaults_produce_single_candidate
(empty JSON {} → cardinality 1, single default WireCodecParams)
- sweep_request_round_trips_json (full payload with all fields)
- sweep_measure_serializes_snake_case (serde enum format)
Board hygiene (CLAUDE.md Mandatory rule):
STATUS_BOARD.md:
D0.3 Queued → In PR
D0.4 Queued → Ready (surface freeze fires on merge)
EPIPHANIES.md PREPEND:
"D0.3 sweep grid IS the JIT cache warmer" —
the grid and the cache signature are the same object viewed
from two sides. Each unique (subspaces, centroids,
residual_depth, rotation_kind, distance, lane_width) tuple maps
to exactly one kernel_signature(). First traversal compiles N
kernels; every subsequent sweep with overlapping tuples hits
cache at ~0 ms. 54-candidate Appendix A §30 sweep: ~800 ms
one-time compile, free after.
Phase 0 state after this PR (all 7 D0.x deliverables):
✅ D0.1 WireCalibrate + WireTensorView (PR #227)
✅ D0.2 WireTokenAgreement stub (PR #231)
✅ D0.3 WireSweep DTOs + grid (this PR)
⏳ D0.4 surface freeze (fires on merge)
✅ D0.5 auto_detect (PR #231)
✅ D0.6 CodecParamsBuilder (PR #225)
✅ D0.7 precision-ladder validation (PR #225)
Rules honored (every Wire DTO in this PR):
Rule D — JSON/YAML/REST only, never in-Rust construction at ingress
Rule E — Wire surface IS SIMD surface (lane_widths axis explicit,
kernel_hash returned per result)
Rule F — serde mirrors at ingress only; enumerate() returns plain
Rust objects that never re-serialize until egress
After this PR merges: D0.4 surface freeze → Phase 1 (JIT kernels) begins.
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
Summary
Two Phase 0 Wire-surface deliverables from the codec-sweep plan, plus board hygiene (STATUS_BOARD + EPIPHANIES) in the same PR per the Mandatory Board-Hygiene Rule.
66/66
cognitive-shader-driver --features servetests pass (+11 new).D0.5 —
auto_detect(CODING_PRACTICES gap 1 remediation)crates/cognitive-shader-driver/src/auto_detect.rs(~300 LOC, 8 tests).Reads
<model_path>/config.json(HuggingFace layout) and returns:Architecture routing (mirrors Rosetta v2's Python-side heuristic):
LaneWidthBF16x32(AMX-ready)F32x16(AVX-512 baseline)torch_dtype: "bfloat16"BF16x32(override wins)Typed errors:
ConfigMissing/Io/Parse/MissingField { field }.Tests: llama / qwen3-with-tokenizer / bert / modernbert (via
architecturesfallback) / xlm-roberta (viad_modelalias) / generic / missing-config / missing-hidden_size.D0.2 —
WireTokenAgreementstub (the I11 cert gate surface)crates/cognitive-shader-driver/src/wire.rs(~100 LOC, 3 tests).Ships the DTOs now; real decode-and-compare harness lands in Phase 2 D2.1–D2.3.
Pass gates when the harness lands (D2.1–D2.3):
top1_rate ≥ 0.99top5_rate ≥ 0.999This is the actual codec cert gate. Reconstruction ICC is necessary-but-not-sufficient (#219/#220 lesson).
Board hygiene in same commit
STATUS_BOARD.md— D0.1 Queued → Shipped (was stale, epiphanies: E-ORIG-1..7 + E-MEMB-1..13 + AGI Grundlagenforschung synthesis #227 merged it); D0.2 + D0.5 Queued → In PR.EPIPHANIES.md— two PREPEND entries:stub:true/backend:"stub"makes "synthetic-rows-mistaken-for-real" machine-checkable, not just documented. Pattern generalises: every Phase-N DTO that lands before its Phase-N+k harness should carry an explicit stub flag.auto_detect.rs(Rust). E-MEMB-11 handshake moves from conceptual to implemented.Phase 0 state after this PR
WireCalibrate+WireTensorViewWireTokenAgreementstubWireSweepstreaming endpointauto_detectCodecParamsBuilderOne more Phase 0 deliverable (D0.3
WireSweep) to freeze the Wire surface.Test Plan
cargo test -p lance-graph-contract --lib— 147/147 pass (unchanged)cargo test --manifest-path crates/cognitive-shader-driver/Cargo.toml --features serve— 66/66 pass (+11 new)cargo test --manifest-path crates/jc/Cargo.toml— 6/6 pass (JC substrate proof unchanged)https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh