Skip to content

D0.5 auto_detect + D0.2 WireTokenAgreement stub (Phase 0, 66/66 tests)#231

Merged
AdaWorldAPI merged 2 commits into
mainfrom
claude/teleport-session-setup-wMZfb
Apr 20, 2026
Merged

D0.5 auto_detect + D0.2 WireTokenAgreement stub (Phase 0, 66/66 tests)#231
AdaWorldAPI merged 2 commits into
mainfrom
claude/teleport-session-setup-wMZfb

Conversation

@AdaWorldAPI
Copy link
Copy Markdown
Owner

Summary

Two Phase 0 Wire-surface deliverables from the codec-sweep plan, plus board hygiene (STATUS_BOARD + EPIPHANIES) in the same PR per the Mandatory Board-Hygiene Rule.

66/66 cognitive-shader-driver --features serve tests pass (+11 new).

D0.5 — auto_detect (CODING_PRACTICES gap 1 remediation)

crates/cognitive-shader-driver/src/auto_detect.rs (~300 LOC, 8 tests).

Reads <model_path>/config.json (HuggingFace layout) and returns:

pub struct ModelFingerprint {
    pub architecture: String,           // llama / qwen3 / bert / modernbert / ...
    pub hidden_size: u32,
    pub n_layers: u32,
    pub tokenizer_class: String,        // from tokenizer_config.json, best-effort
    pub vocab_size: u32,
    pub default_lane_width: LaneWidth,  // inferred per architecture + torch_dtype
    pub default_distance: Distance,
}

Architecture routing (mirrors Rosetta v2's Python-side heuristic):

Architecture family Default LaneWidth
llama / qwen / qwen2 / qwen3 / mistral / mixtral BF16x32 (AMX-ready)
bert / modernbert / xlm-roberta / generic F32x16 (AVX-512 baseline)
any, with explicit torch_dtype: "bfloat16" BF16x32 (override wins)

Typed errors: ConfigMissing / Io / Parse / MissingField { field }.

Tests: llama / qwen3-with-tokenizer / bert / modernbert (via architectures fallback) / xlm-roberta (via d_model alias) / generic / missing-config / missing-hidden_size.

D0.2 — WireTokenAgreement stub (the I11 cert gate surface)

crates/cognitive-shader-driver/src/wire.rs (~100 LOC, 3 tests).

Ships the DTOs now; real decode-and-compare harness lands in Phase 2 D2.1–D2.3.

pub enum WireBaseline { Passthrough }  // extensible

pub struct WireTokenAgreement {
    pub model_path: String,
    pub reference: WireBaseline,         // defaults Passthrough
    pub candidate: WireCodecParams,
    pub prompt_set_blob_id: u64,
    pub n_tokens: u32,
}

pub struct WireTokenAgreementResult {
    pub top1_rate: f32,
    pub top5_rate: f32,
    pub divergence_positions: Vec<u32>,
    pub per_layer_mse: Vec<f32>,
    pub candidate_latency_us: u64,
    pub reference_latency_us: u64,
    pub stub: bool,         // Phase 0 honesty flag
    pub backend: String,    // "amx" / "vnni" / "avx512" / "avx2" / "legacy" / "stub"
}

Pass gates when the harness lands (D2.1–D2.3):

  • top1_rate ≥ 0.99
  • top5_rate ≥ 0.999

This is the actual codec cert gate. Reconstruction ICC is necessary-but-not-sufficient (#219/#220 lesson).

Board hygiene in same commit

Phase 0 state after this PR

D-id Deliverable Status
D0.1 WireCalibrate + WireTensorView ✅ Shipped (#227)
D0.2 WireTokenAgreement stub ✅ This PR
D0.3 WireSweep streaming endpoint ⏳ Queued
D0.4 Surface freeze (rebuild) ⏳ Gates after D0.3
D0.5 auto_detect ✅ This PR
D0.6 CodecParamsBuilder ✅ Shipped (#225)
D0.7 Precision-ladder validation ✅ Shipped (#225)

One more Phase 0 deliverable (D0.3 WireSweep) to freeze the Wire surface.

Test Plan

  • cargo test -p lance-graph-contract --lib — 147/147 pass (unchanged)
  • cargo test --manifest-path crates/cognitive-shader-driver/Cargo.toml --features serve — 66/66 pass (+11 new)
  • cargo test --manifest-path crates/jc/Cargo.toml — 6/6 pass (JC substrate proof unchanged)
  • Rules A-F honored: D + E + F apply (D0.5/D0.2 are DTO + serde + one-edge decode)
  • Board-hygiene rule: STATUS_BOARD + EPIPHANIES updated in the same scope

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

claude added 2 commits April 20, 2026 22:56
Two more Phase 0 deliverables from codec-sweep-via-lab-infra-v1.
66/66 cognitive-shader-driver tests pass under --features serve (+11 new).

D0.5 — auto_detect.rs (~300 LOC, CODING_PRACTICES gap 1):
  Reads <model_path>/config.json (HuggingFace layout) and returns
  ModelFingerprint { architecture, hidden_size, n_layers,
  tokenizer_class, vocab_size, default_lane_width, default_distance }.

  Architecture routing:
    llama / qwen / qwen2 / qwen3 / mistral / mixtral → BF16x32 (AMX)
    bert / modernbert / xlm-roberta / generic → F32x16 (AVX-512)
  torch_dtype override wins over architecture heuristic.

  Typed errors: ConfigMissing / Io / Parse / MissingField {path, field}.
  Best-effort tokenizer_class from tokenizer_config.json.

  8 tests: llama / qwen3-with-tokenizer / bert / modernbert / xlm-roberta
  (d_model alias) / generic fallback / missing-config / missing-field.

D0.2 — WireTokenAgreement stub (~100 LOC, the I11 cert gate):
  DTOs:
    WireBaseline { Passthrough } — default, extensible
    WireTokenAgreement { model_path, reference, candidate (WireCodecParams),
                          prompt_set_blob_id, n_tokens }
    WireTokenAgreementResult { top1_rate, top5_rate,
                                divergence_positions, per_layer_mse,
                                candidate_latency_us, reference_latency_us,
                                stub, backend }

  Phase 0 handler stub (not shipped yet): returns stub:true /
  backend:"stub" deterministic result. Phase 2 D2.1-D2.3 land the
  real decode-and-compare loop (reference model load + top-k
  comparison + per-layer MSE).

  Pass gates (for when the harness lands):
    top1_rate ≥ 0.99 + top5_rate ≥ 0.999 vs Passthrough baseline.
    This is the ACTUAL codec cert gate — reconstruction ICC is
    necessary-but-not-sufficient (per #219/#220 lesson).

  3 round-trip serde tests: full payload + stub-backend default +
  baseline default.

Board hygiene (CLAUDE.md Mandatory rule):
  STATUS_BOARD.md updated:
    D0.1 Queued → Shipped (PR #227 — was stale)
    D0.2 Queued → In PR (this branch)
    D0.5 Queued → In PR (this branch)

Phase 0 state after this commit:
  ✅ D0.1 WireCalibrate + WireTensorView (PR #227)
  ✅ D0.6 CodecParamsBuilder (PR #225)
  ✅ D0.7 precision-ladder validation (PR #225)
  ✅ D0.5 auto_detect (this PR)
  ✅ D0.2 WireTokenAgreement stub (this PR)
  ⏳ D0.3 WireSweep streaming endpoint (next PR)
  ⏳ D0.4 surface freeze (gates after D0.3)

Rules honored:
  Rule D — JSON/YAML/REST only, CodecParams carried through via WireCodecParams
  Rule E — Wire surface IS the SIMD surface (lane_width on candidate)
  Rule F — serde mirrors at ingress only; TryFrom → CodecParams at handler

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
Two findings from the D0.2 + D0.5 implementation, landed per user
directive "including the epiphanies board":

1. D0.2 stub flag is anti-#219 defense at the type level.
   WireTokenAgreementResult.stub:bool + backend:"stub" default make
   the "synthetic-rows-mistaken-for-real" failure machine-checkable,
   not just documented. Generalises: every Phase-N surface DTO that
   lands before its Phase-N+k harness should carry an explicit stub
   flag.

2. D0.5 auto_detect is the concrete Python↔Rust handshake mechanism.
   Same architecture→lane-width table in Rosetta v2 Python and Rust
   auto_detect.rs. E-MEMB-11 handshake moves from conceptual to
   implemented; the slice-layout reconciliation doc (E-MEMB-1 fix)
   can use the same pattern (architecture → layout version →
   canonical slice table).

Both entries prepended per APPEND-ONLY rule.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
@AdaWorldAPI AdaWorldAPI merged commit 0101412 into main Apr 20, 2026
0 of 6 checks passed
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f4304f5bef

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +999 to +1000
#[serde(default)]
pub stub: bool,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Default missing token-agreement stub marker to true

When WireTokenAgreementResult is deserialized from JSON that omits stub, Serde currently fills it with false, while backend defaults to "stub". That creates a contradictory payload (backend == "stub" but stub == false) and can let clients that gate on !stub treat Phase-0 placeholder metrics as real results. Since this field is intended to prevent exactly that confusion, the default should fail closed (or be required) rather than silently reporting non-stub.

Useful? React with 👍 / 👎.

Comment on lines +128 to +130
.or_else(|| cfg.architectures.as_ref().and_then(|a| a.first().cloned()))
.unwrap_or_else(|| "generic".to_string())
.to_lowercase();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Normalize architectures fallback before family matching

The fallback path copies the first architectures entry verbatim (lowercased) into architecture, but lane-width routing only matches exact family names like "qwen3"/"llama". For configs that omit model_type and use class names such as Qwen3ForCausalLM or LlamaForCausalLM, this yields qwen3forcausallm/llamaforcausallm and incorrectly falls back to F32x16 instead of the BF16 path, so auto-detection returns the wrong default lane width.

Useful? React with 👍 / 👎.

AdaWorldAPI pushed a commit that referenced this pull request Apr 20, 2026
Last Phase 0 Wire-surface deliverable from codec-sweep-via-lab-infra-v1.
71/71 cognitive-shader-driver tests pass under --features serve
(+5 new D0.3 tests).

DTOs (~250 LOC in wire.rs):
  WireMeasure enum:
    ReconstructionErrorHeldOut / ReconstructionIccHeldOut /
    TokenAgreementTop1 / TokenAgreementTop5 / PerLayerMse
    (serde: lowercase snake_case)

  WireSweepGrid:
    subspaces / centroids / residual_depths / rotations /
    distances / lane_widths — each a Vec<T> with sensible defaults
    (defaults produce cardinality 1 for minimal payloads)
    + residual_centroids / calibration_rows / measurement_rows / seed
    Methods:
      - cardinality() -> usize — product of axis lengths
      - enumerate() -> Vec<WireCodecParams> — full Cartesian product

  WireSweepRequest:
    tensor_path / grid / measure (default: ICC + top-1) /
    log_to_lance (optional Lance fragment path) / label

  WireSweepResult (one per grid point):
    grid_index / candidate / kernel_hash (CodecParams::kernel_signature) /
    calibrate (Option<WireCalibrateResponse>) /
    token_agreement (Option<WireTokenAgreementResult>) /
    stub flag (mirrors WireTokenAgreementResult.stub)

  WireSweepResponse (for non-streaming batch clients):
    label / cardinality / results / elapsed_ms / lance_fragment_path

Streaming handler (SSE) + Lance writer deferred to Phase 3 D3.1.
Phase 0 ships the SURFACE; Phase 3 lands the execution.

Tests (5 new):
  - sweep_grid_cardinality_is_product_of_axes (1×3×3×2×1×2 = 36)
  - sweep_grid_enumerate_produces_all_unique_signatures
    (4 distinct kernel signatures from 4 distinct IR-shaping tuples)
  - sweep_grid_defaults_produce_single_candidate
    (empty JSON {} → cardinality 1, single default WireCodecParams)
  - sweep_request_round_trips_json (full payload with all fields)
  - sweep_measure_serializes_snake_case (serde enum format)

Board hygiene (CLAUDE.md Mandatory rule):
  STATUS_BOARD.md:
    D0.3 Queued → In PR
    D0.4 Queued → Ready (surface freeze fires on merge)

  EPIPHANIES.md PREPEND:
    "D0.3 sweep grid IS the JIT cache warmer" —
    the grid and the cache signature are the same object viewed
    from two sides. Each unique (subspaces, centroids,
    residual_depth, rotation_kind, distance, lane_width) tuple maps
    to exactly one kernel_signature(). First traversal compiles N
    kernels; every subsequent sweep with overlapping tuples hits
    cache at ~0 ms. 54-candidate Appendix A §30 sweep: ~800 ms
    one-time compile, free after.

Phase 0 state after this PR (all 7 D0.x deliverables):
  ✅ D0.1 WireCalibrate + WireTensorView (PR #227)
  ✅ D0.2 WireTokenAgreement stub (PR #231)
  ✅ D0.3 WireSweep DTOs + grid (this PR)
  ⏳ D0.4 surface freeze (fires on merge)
  ✅ D0.5 auto_detect (PR #231)
  ✅ D0.6 CodecParamsBuilder (PR #225)
  ✅ D0.7 precision-ladder validation (PR #225)

Rules honored (every Wire DTO in this PR):
  Rule D — JSON/YAML/REST only, never in-Rust construction at ingress
  Rule E — Wire surface IS SIMD surface (lane_widths axis explicit,
           kernel_hash returned per result)
  Rule F — serde mirrors at ingress only; enumerate() returns plain
           Rust objects that never re-serialize until egress

After this PR merges: D0.4 surface freeze → Phase 1 (JIT kernels) begins.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants