D2.1 token-agreement harness scaffold (I11 cert gate infra, 117/117 tests) by AdaWorldAPI · Pull Request #236 · AdaWorldAPI/lance-graph

AdaWorldAPI · 2026-04-21T00:03:31Z

Summary

First Phase 2 deliverable: the token-agreement harness scaffold. The I11 cert gate infrastructure lands with a machine-checkable `stub: true` flag that acts as a wall against the #219 → #220 failure mode (synthetic numbers read as real measurements).

117/117 `cognitive-shader-driver` tests pass with `--features serve` (+13 new).

What lands

crates/cognitive-shader-driver/src/token_agreement.rs — ~320 LOC:

`ReferenceModel`

pub struct ReferenceModel { path, path_hash, stub_token_count }

impl ReferenceModel {
    pub fn load(path: &Path) -> Result<Self, TokenAgreementError>;   // D2.1 stub
    pub fn stub(tag: u64, n_tokens: u32) -> Self;                     // testing fixture
}

In D2.1, `load()` only validates that the path exists and hashes the path's display form. D2.2 replaces it with real safetensors parsing, a tokenizer, and a runtime decoder, driven by `auto_detect::detect()` (D0.5).
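The stub behaviour described above can be sketched as follows. This is a minimal illustration, not the code in `token_agreement.rs`: the field types, the choice of `DefaultHasher`, the `hash_display` helper, and the `stub://` placeholder path are all assumptions, since the PR text lists only field and method names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::{Path, PathBuf};

// Field types are assumptions; the PR names the fields but not their types.
#[derive(Debug)]
pub struct ReferenceModel {
    pub path: PathBuf,
    pub path_hash: u64,
    pub stub_token_count: u32,
}

// Only the variant needed for this sketch.
#[derive(Debug, PartialEq)]
pub enum TokenAgreementError {
    ModelPathMissing { path: PathBuf },
}

// Hypothetical helper: hash the path's display string.
// The hasher used by the real crate is not stated in the PR.
pub fn hash_display(path: &Path) -> u64 {
    let mut hasher = DefaultHasher::new();
    path.display().to_string().hash(&mut hasher);
    hasher.finish()
}

impl ReferenceModel {
    /// D2.1-style stub load: check existence, hash the path, parse nothing.
    pub fn load(path: &Path) -> Result<Self, TokenAgreementError> {
        if !path.exists() {
            return Err(TokenAgreementError::ModelPathMissing {
                path: path.to_path_buf(),
            });
        }
        Ok(Self {
            path: path.to_path_buf(),
            path_hash: hash_display(path),
            stub_token_count: 0,
        })
    }

    /// Testing fixture: builds a model without touching the filesystem.
    pub fn stub(tag: u64, n_tokens: u32) -> Self {
        Self {
            // Placeholder path; purely illustrative.
            path: PathBuf::from(format!("stub://{tag}")),
            path_hash: tag,
            stub_token_count: n_tokens,
        }
    }
}
```

Keeping `stub()` free of filesystem access means comparator and harness tests can run without any model files present.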

`TopKAgreement` comparator

pub struct TopKAgreement { top1_matches, top5_matches, total_positions, divergence_positions }

impl TopKAgreement {
    pub fn compare(reference_topk: &[Vec<u32>], candidate_topk: &[Vec<u32>]) -> Result<Self>;
    pub fn top1_rate() / top5_rate() -> f32;
    pub fn meets_cert_gate() -> bool;    // top1 ≥ 0.99 AND top5 ≥ 0.999
    pub fn aggregate(per_prompt: &[Self]) -> Self;
}

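Comparison is position-by-position: a top-1 match means `r[0] == c[0]`, and a top-5 match means `r[0]` appears in `c[..5]`. Divergence positions are recorded for failure-mode analysis ("late-sequence drift" vs "random errors everywhere"). Aggregation sums the counters and concatenates per-prompt divergences with offsets so failures stay localisable.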
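A minimal sketch of the comparison and aggregation logic described here, under stated assumptions: the field types, the `usize` payload of `TokenCountMismatch`, and the choice to record a divergence on every top-1 miss are guesses, not taken from the crate.

```rust
// Field types are assumptions; the PR lists only the field names.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct TopKAgreement {
    pub top1_matches: u32,
    pub top5_matches: u32,
    pub total_positions: u32,
    pub divergence_positions: Vec<u32>,
}

// Only the variant needed for this sketch.
#[derive(Debug, PartialEq)]
pub enum TokenAgreementError {
    TokenCountMismatch { reference: usize, candidate: usize },
}

impl TopKAgreement {
    /// Each inner Vec holds one position's ranked token ids, best first.
    pub fn compare(
        reference_topk: &[Vec<u32>],
        candidate_topk: &[Vec<u32>],
    ) -> Result<Self, TokenAgreementError> {
        if reference_topk.len() != candidate_topk.len() {
            return Err(TokenAgreementError::TokenCountMismatch {
                reference: reference_topk.len(),
                candidate: candidate_topk.len(),
            });
        }
        let mut out = Self::default();
        for (pos, (r, c)) in reference_topk.iter().zip(candidate_topk).enumerate() {
            out.total_positions += 1;
            let ref_top1 = r.first();
            // top1: r[0] == c[0]; top5: r[0] appears in c[..5].
            let top1 = ref_top1.is_some() && ref_top1 == c.first();
            let top5 = ref_top1.map_or(false, |t| c.iter().take(5).any(|x| x == t));
            if top1 {
                out.top1_matches += 1;
            } else {
                // Assumption: a divergence is any top-1 miss.
                out.divergence_positions.push(pos as u32);
            }
            if top5 {
                out.top5_matches += 1;
            }
        }
        Ok(out)
    }

    /// Sums counters; shifts each prompt's divergences by the number of
    /// positions that came before it.
    pub fn aggregate(per_prompt: &[Self]) -> Self {
        let mut out = Self::default();
        for p in per_prompt {
            let offset = out.total_positions;
            out.divergence_positions
                .extend(p.divergence_positions.iter().map(|d| d + offset));
            out.top1_matches += p.top1_matches;
            out.top5_matches += p.top5_matches;
            out.total_positions += p.total_positions;
        }
        out
    }
}
```

With this offset scheme, a divergence at position 4 of a second prompt lands at aggregate position 14 when the first prompt contributed 10 positions, matching the case named in the test list.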

`TokenAgreementHarness`

pub struct TokenAgreementHarness { reference, baseline, candidate, n_tokens }

impl TokenAgreementHarness {
    pub fn measure_stub() -> Result<WireTokenAgreementResult>;        // D2.1: stub:true
    pub fn measure_full() -> Result<WireTokenAgreementResult>;        // D2.2: NotImplementedYet
}

`measure_stub()` returns `stub: true`, `backend: "stub"`, `top1_rate: 0.0`, `top5_rate: 0.0`. The stub flag is machine-checkable per D0.2's anti-#219 pattern: clients assert `!result.stub`, so mistaking stub output for a real measurement fails loudly.
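On the client side, the check might look like the sketch below. The struct is a hypothetical mirror of `WireTokenAgreementResult` containing only the fields named above (the real DTO also carries latencies, per the test list), and `stub_result` and `require_real_measurement` are illustrative names, not part of the PR.

```rust
// Hypothetical mirror of the wire result; only fields named in the PR text.
#[derive(Debug, Clone, PartialEq)]
pub struct WireTokenAgreementResult {
    pub stub: bool,
    pub backend: String,
    pub top1_rate: f32,
    pub top5_rate: f32,
}

/// The values measure_stub() is described as returning.
pub fn stub_result() -> WireTokenAgreementResult {
    WireTokenAgreementResult {
        stub: true,
        backend: "stub".to_string(),
        top1_rate: 0.0,
        top5_rate: 0.0,
    }
}

/// Client-side guard: refuse to treat stub output as a measurement.
pub fn require_real_measurement(
    result: &WireTokenAgreementResult,
) -> Result<&WireTokenAgreementResult, String> {
    if result.stub {
        Err(format!(
            "refusing stub result from backend {:?}; not a real measurement",
            result.backend
        ))
    } else {
        Ok(result)
    }
}
```

Returning an error rather than silently passing zero rates downstream is the point of the flag: a dashboard or report that forgets the check would otherwise show 0.0 as if it had been measured.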

Typed errors

pub enum TokenAgreementError {
    ModelPathMissing { path },
    EmptyPromptSet,
    TokenCountMismatch { reference, candidate },
    NotImplementedYet { what },    // points at D2.2 scope
}
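As an illustration of how callers might consume these typed errors, here is a sketch with assumed payload types (`PathBuf`, `usize`, `&'static str`) and a hand-written `Display` impl. The message wording and the `is_pending_d22` helper are hypothetical; the PR does not say how the real enum formats itself.

```rust
use std::fmt;
use std::path::PathBuf;

// Variant names come from the PR; payload types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum TokenAgreementError {
    ModelPathMissing { path: PathBuf },
    EmptyPromptSet,
    TokenCountMismatch { reference: usize, candidate: usize },
    NotImplementedYet { what: &'static str },
}

impl fmt::Display for TokenAgreementError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::ModelPathMissing { path } => {
                write!(f, "model path missing: {}", path.display())
            }
            Self::EmptyPromptSet => write!(f, "empty prompt set"),
            Self::TokenCountMismatch { reference, candidate } => write!(
                f,
                "token count mismatch: reference={reference}, candidate={candidate}"
            ),
            Self::NotImplementedYet { what } => write!(f, "not implemented yet: {what}"),
        }
    }
}

impl std::error::Error for TokenAgreementError {}

/// Hypothetical triage helper: only NotImplementedYet means "wait for D2.2";
/// every other variant is a caller-side problem.
pub fn is_pending_d22(err: &TokenAgreementError) -> bool {
    matches!(err, TokenAgreementError::NotImplementedYet { .. })
}
```

Matching on variants instead of parsing message strings lets a caller separate "feature not shipped yet" from "bad input" without depending on wording.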

Tests (13 new)

Critical coverage:

- `topk_compare_identical_streams_is_perfect`: full cert gate pass
- `topk_top5_matches_when_top1_misses_but_in_top5`: top-5 logic verified on `ref[0] = 7` appearing at position 3 in the candidate top-5
- `topk_aggregate_sums_counters_and_offsets_divergence`: prompt 2's divergence at position 4 becomes aggregate position 14 after prompt 1's 10 positions
- `cert_gate_passes_at_exact_thresholds`: 990/1000 = 0.99 AND 999/1000 = 0.999 (exact boundary pass)
- `cert_gate_fails_when_top1_below_threshold_even_if_top5_passes`: AND-gate semantics
- `cert_gate_fails_when_top5_below_threshold_even_if_top1_passes`: AND-gate semantics
- `harness_measure_stub_returns_machine_checkable_stub_flag`: enforces `stub == true`, `backend == "stub"`, zero rates + latencies
- `harness_measure_full_returns_not_implemented_pointing_at_d22`: D2.2 scope pointer preserved
- `harness_measure_stub_rejects_zero_n_tokens`: `EmptyPromptSet` typed error
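The boundary and AND-gate cases in this list can be sketched as below. This is a counters-only view with assumed `u32` fields; the `&self` receivers, the `TOP1_GATE` / `TOP5_GATE` constants, the `rate` helper, and the choice to report 0.0 for an empty run are assumptions rather than the crate's actual definitions.

```rust
// Counters-only view of TopKAgreement; field types are assumptions.
pub struct TopKAgreement {
    pub top1_matches: u32,
    pub top5_matches: u32,
    pub total_positions: u32,
}

// Assumption: an empty run reports 0.0 rather than NaN.
fn rate(matches: u32, total: u32) -> f32 {
    if total == 0 {
        0.0
    } else {
        matches as f32 / total as f32
    }
}

impl TopKAgreement {
    // Thresholds stated in the PR: top1 >= 0.99 AND top5 >= 0.999.
    pub const TOP1_GATE: f32 = 0.99;
    pub const TOP5_GATE: f32 = 0.999;

    pub fn top1_rate(&self) -> f32 {
        rate(self.top1_matches, self.total_positions)
    }

    pub fn top5_rate(&self) -> f32 {
        rate(self.top5_matches, self.total_positions)
    }

    /// Both thresholds must hold; passing one does not excuse the other.
    pub fn meets_cert_gate(&self) -> bool {
        self.top1_rate() >= Self::TOP1_GATE && self.top5_rate() >= Self::TOP5_GATE
    }
}
```

The exact-boundary case works in `f32` because 990 and 1000 are exactly representable and IEEE 754 division is correctly rounded, so `990.0 / 1000.0` yields the same value as the literal `0.99`. The same holds for 999/1000 and `0.999`.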

Phase state after merge

| Phase | Status |
| --- | --- |
| Phase 0 (Wire surface) | ✅ Complete (D0.1–D0.7 all shipped) |
| Phase 1 scaffold | ✅ D1.1 / D1.2 / D1.3 shipped |
| Phase 1 D1.1b (Cranelift wiring) | ⏳ Queued |
| Phase 2 | ⏳ D2.1 this PR · D2.2 + D2.3 queued |
| Phase 3-5 | ⏳ Queued |

Rules honored

- Rule D: measurement set configured via the `WireTokenAgreement` DTO (D0.2 surface)
- Rule E: `TopKAgreement` exposes methods (`top1_rate`, `meets_cert_gate`, `aggregate`), not raw field access
- Rule F: no serialisation between stages; per-prompt `Vec<Vec<u32>>` token streams are owned Rust values

Board hygiene

STATUS_BOARD.md: D2.1 Queued → In PR

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

First Phase 2 deliverable: scaffold of the I11 cert gate harness. The PR #219 → #220 lesson landed as a typed-rejection wall: the stub result carries stub:true + backend:"stub" so no client can confuse Phase 0 stub output for a real measurement.

crates/cognitive-shader-driver/src/token_agreement.rs (~320 LOC):

ReferenceModel { path, path_hash, stub_token_count }
- ::load(&Path) -> Result<Self, TokenAgreementError>. D2.1 stub: validates path exists, hashes display; does NOT parse safetensors yet. D2.2 replaces with real loader driven by auto_detect::detect() → ModelFingerprint.
- ::stub(tag, n_tokens): builds stub model without touching fs

TokenAgreementError:
- ModelPathMissing { path }
- EmptyPromptSet
- TokenCountMismatch { reference, candidate }
- NotImplementedYet { what } ← measure_full() until D2.2

TopKAgreement { top1_matches, top5_matches, total_positions, divergence_positions: Vec<u32> }
- ::compare(ref: &[Vec<u32>], cand: &[Vec<u32>]) -> Result<Self>. Position-by-position: top1 = r[0] == c[0]; top5 = r[0] in c[..5]. Records divergence positions for failure-mode analysis (late-sequence drift vs random errors).
- ::top1_rate() / top5_rate() -> f32
- ::meets_cert_gate() -> bool (top1 ≥ 0.99 AND top5 ≥ 0.999)
- ::aggregate(per_prompt): sums counters; concatenates divergence with per-prompt offset so failures stay localised

TokenAgreementHarness:
- ::new(reference, baseline, candidate, n_tokens)
- ::measure_stub() -> WireTokenAgreementResult { stub:true, .. }
- ::measure_full() -> NotImplementedYet (D2.2 scope)

Tests (13 new):
- reference_model_stub_builds_without_filesystem
- reference_model_load_missing_path_yields_typed_error
- topk_compare_identical_streams_is_perfect (full cert gate pass)
- topk_compare_all_different_fails_cert_gate
- topk_top5_matches_when_top1_misses_but_in_top5 (ref top-1 = 7; cand has 7 at position 3 in top-5 → top5 counts)
- topk_mismatched_stream_lengths_yield_typed_error
- topk_aggregate_sums_counters_and_offsets_divergence (prompt 2's divergence at pos 4 → aggregate pos 14 after prompt 1's 10)
- cert_gate_passes_at_exact_thresholds (990/1000 = 0.99, 999/1000 = 0.999; both boundaries hit)
- cert_gate_fails_when_top1_below_threshold_even_if_top5_passes
- cert_gate_fails_when_top5_below_threshold_even_if_top1_passes
- harness_measure_stub_returns_machine_checkable_stub_flag (stub:true enforced; backend="stub"; all rates 0.0; zero latencies)
- harness_measure_full_returns_not_implemented_pointing_at_d22
- harness_measure_stub_rejects_zero_n_tokens

Board hygiene (CLAUDE.md Mandatory rule): STATUS_BOARD.md D2.1 Queued → In PR

Phase state:
- Phase 0 ✅ complete (D0.1-D0.7 all shipped)
- Phase 1 scaffold ✅ (D1.1, D1.2, D1.3 shipped; D1.1b queued)
- Phase 2 ⏳ D2.1 (this PR), D2.2 + D2.3 queued

Rules honored:
- Rule D: Measurement set comes from Wire DTOs (D0.2 WireTokenAgreement)
- Rule E: TopKAgreement exposes object-methods (top1_rate, meets_cert_gate)
- Rule F: No serialization between stages; per-prompt Vec<Vec<u32>> token streams are plain Rust owned; the serde happens at D2.3 handler entry / exit only

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

AdaWorldAPI merged commit 3ee739a into main Apr 21, 2026
0 of 6 checks passed

AdaWorldAPI merged 1 commit into main from claude/teleport-session-setup-wMZfb
