
feat: add Belichtungsmesser HDR popcount-stacking early-exit cascade #8

Merged
AdaWorldAPI merged 7 commits into main from claude/setup-adaworld-repos-4kPEX
Mar 14, 2026

@AdaWorldAPI (Owner)

Self-calibrating integer-only Hamming distance cascade that eliminates
94%+ of candidates using sampled bit comparisons (1/16 → 1/4 → full).

Key components:

  • isqrt: integer Newton's method, no float
  • Band classification: Foveal/Near/Good/Weak/Reject sigma bands
  • Cascade query: sampling-aware thresholds (μ-4σ for 1/16, μ-2σ for 1/4)
  • Welford's online shift detection with integer arithmetic
  • 7 passing tests with timing/ns measurements
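
As an illustration of the float-free square root in the first bullet, here is a minimal sketch of an integer Newton iteration. The name `isqrt` comes from the list above; the signature, the `u64` width, and the starting guess are assumptions and not the PR's actual code.

```rust
/// Integer square root via Newton's method: returns floor(sqrt(n)) without floats.
/// Hypothetical signature; the PR's own `isqrt` may differ.
pub fn isqrt(n: u64) -> u64 {
    if n < 2 {
        return n;
    }
    // Start from a power of two that is >= sqrt(n) so the iterates only decrease.
    let bits = 64 - n.leading_zeros(); // number of significant bits in n
    let mut x = 1u64 << ((bits + 1) / 2);
    loop {
        let y = (x + n / x) / 2; // Newton step for f(x) = x^2 - n
        if y >= x {
            return x; // no further decrease: x is floor(sqrt(n))
        }
        x = y;
    }
}
```

For 16384-bit vectors the binomial variance of a random pair's distance is 16384/4 = 4096, so `isqrt(4096)` yields σ = 64, the theoretical figure quoted later in this thread.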

CI output (16384-bit vectors, 10K random candidates):
Stage 1: 83% rejected, Stage 2: 94% combined rejection
Brute force: 1784 ns/candidate, Cascade: 455 ns/candidate → 3.9x speedup
Work savings: 83% fewer word-ops

https://claude.ai/code/session_01Mcj8GxEtzmVba6RmuT7AjD

claude added 3 commits March 14, 2026 08:37
…terministic chain traversal

Three correctness fixes flagged in PR review:

1. HammingMin/SimilarityMax semirings now produce Vector(XOR) instead of
   Float(distance). The distance is a separate u32 computed by the caller
   via popcount. This eliminates the mxv silent-drop bug — all semiring
   outputs are now Vector and flow through mxv/mxm naturally.

2. SSSP rewritten as proper Bellman-Ford with cumulative u32 path costs
   tracked alongside XOR-composed path vectors. Edge weight = popcount of
   edge BitVec. Costs stored in GrBVector scalar side-channel. The old
   code compared popcount-to-zero (bit density) which is not path cost.

3. Chain traversal tie-breaking in SpoStore::walk_chain_forward is now
   deterministic: when two candidates have equal Hamming distance, the
   smallest key wins (instead of depending on HashMap iteration order).
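
The tie-breaking rule in item 3 can be sketched as follows. This is not the code of `SpoStore::walk_chain_forward`; the function `nearest_key`, the `u64` key type, and the `Vec<u64>` bit storage are hypothetical stand-ins chosen for illustration.

```rust
use std::collections::HashMap;

/// Hamming distance between two bit vectors stored as 64-bit words.
fn hamming(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// Returns the key of the nearest candidate, or None if the map is empty.
/// Ties on distance go to the smallest key, independent of HashMap iteration order.
pub fn nearest_key(query: &[u64], candidates: &HashMap<u64, Vec<u64>>) -> Option<u64> {
    candidates
        .iter()
        .map(|(key, bits)| (hamming(query, bits), *key))
        .min() // tuples compare by distance first, then by key
        .map(|(_, key)| key)
}
```

Comparing `(distance, key)` tuples makes the result a pure function of the data, so repeated runs return the same chain.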

Additional: GrBVector gains a scalar side-channel (set_scalar/get_scalar)
for algorithms that need to annotate vector entries with numeric metadata.
MonoidOp::MinPopcount added for min-Hamming-weight accumulation.
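
A minimal sketch of the Bellman-Ford idea from item 2, assuming a plain edge list in place of the GrBMatrix/GrBVector types. The `Edge` struct and `sssp` signature are hypothetical, and the XOR-composed path vectors that the commit tracks alongside the costs are left out for brevity.

```rust
/// One directed edge whose payload is a bit vector; its weight is the popcount.
pub struct Edge {
    pub from: usize,
    pub to: usize,
    pub bits: Vec<u64>,
}

fn popcount(bits: &[u64]) -> u32 {
    bits.iter().map(|w| w.count_ones()).sum()
}

/// Bellman-Ford with cumulative u32 path costs.
/// Returns None for unreachable nodes; panics if `source >= n`.
pub fn sssp(n: usize, edges: &[Edge], source: usize) -> Vec<Option<u32>> {
    let mut cost: Vec<Option<u32>> = vec![None; n];
    cost[source] = Some(0);
    // At most n-1 relaxation rounds; stop early once a round changes nothing.
    for _ in 1..n {
        let mut changed = false;
        for e in edges {
            if let Some(c) = cost[e.from] {
                let cand = c.saturating_add(popcount(&e.bits));
                if cost[e.to].map_or(true, |old| cand < old) {
                    cost[e.to] = Some(cand);
                    changed = true;
                }
            }
        }
        if !changed {
            break;
        }
    }
    cost
}
```

The cost here is the sum of edge popcounts along the path, which is the quantity item 2 distinguishes from the popcount of the composed path vector.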

All 430 tests pass. Clippy clean.

https://claude.ai/code/session_01Mcj8GxEtzmVba6RmuT7AjD

Three benchmark tests that prove the core claims with numbers:

1. float_vs_hamming_sssp_equivalence — 100% pairwise ranking agreement
   between float Bellman-Ford and Hamming SSSP on a 1000-node random
   graph (490K comparisons). Prints speedup ratio.

2. belichtungsmesser_rejection_rate — 3-stage Hamming sampling cascade
   rejects 99.7% at stage 1 (1/16 sample), saves 93.5% compute vs
   full scan. 20 planted near-vectors all survive to stage 3.

3. float_cosine_vs_bf16_hamming_ranking — SimHash encoding preserves
   8/10 top-k results vs float cosine similarity on 1000 128-dim
   vectors (16384-bit SimHash, well above the 7/10 threshold).
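
For context on test 3, a SimHash encoder can be sketched as sign bits of pseudo-random ±1 projections. Everything below is an assumption made for illustration: the function names, the splitmix64 generator used to draw the signs, and the seed handling are not taken from the PR.

```rust
/// splitmix64: a small deterministic generator, used here to draw projection signs.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E3779B97F4A7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D049BB133111EB);
    z ^ (z >> 31)
}

/// Encodes `v` as `nbits` sign bits of pseudo-random +1/-1 projections.
/// Vectors of the same length hashed with the same seed share their hyperplanes.
pub fn simhash(v: &[f32], nbits: usize, seed: u64) -> Vec<u64> {
    let mut out = vec![0u64; (nbits + 63) / 64];
    let mut state = seed;
    for bit in 0..nbits {
        let mut dot = 0.0f32;
        let mut r = 0u64;
        for (i, x) in v.iter().enumerate() {
            if i % 64 == 0 {
                r = splitmix64(&mut state); // fresh block of 64 sign bits
            }
            if (r >> (i % 64)) & 1 == 1 { dot += *x } else { dot -= *x }
        }
        if dot > 0.0 {
            out[bit / 64] |= 1u64 << (bit % 64);
        }
    }
    out
}

/// Hamming distance between two hashes.
pub fn hamming(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}
```

Vectors separated by a small angle disagree on few sign bits, which is why ranking by Hamming distance on the hashes can approximate ranking by cosine similarity.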

These run in CI on every commit. The numbers do the selling.

https://claude.ai/code/session_01Mcj8GxEtzmVba6RmuT7AjD

chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: caba4eacd9


Comment on lines +341 to +342
    if foveal.len() >= top_k {
        break;

P1: Remove order-dependent early break from cascade_query

Breaking as soon as foveal.len() >= top_k makes the returned top-k depend on input iteration order instead of global distance, because later candidates are never examined even if they are closer. In any caller that passes unsorted candidates (the API does not require pre-sorting), this can return incorrect nearest-neighbor results and unstable rankings across runs.
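
One order-independent alternative to the early break is a bounded max-heap that examines every candidate and keeps only the best `top_k`. The sketch below is a generic illustration rather than a patch for `cascade_query`; the function `top_k` and its `(distance, index)` output are hypothetical.

```rust
use std::collections::BinaryHeap;

/// Returns the k candidates with the smallest distance as (distance, index),
/// sorted ascending. Every candidate is examined, so input order does not
/// change which distances are selected.
pub fn top_k(distances: &[u32], k: usize) -> Vec<(u32, usize)> {
    let mut heap: BinaryHeap<(u32, usize)> = BinaryHeap::new();
    for (idx, &d) in distances.iter().enumerate() {
        heap.push((d, idx));
        if heap.len() > k {
            heap.pop(); // max-heap: drops the current worst entry
        }
    }
    heap.into_sorted_vec()
}
```

The heap never holds more than `k + 1` entries, so memory stays bounded while the full candidate list is still scanned.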


Comment on lines +96 to 99
    HdrSemiring::HammingMin | HdrSemiring::SimilarityMax => match (a, b) {
        (HdrScalar::Vector(va), HdrScalar::Vector(vb)) => {
            HdrScalar::Float(va.hamming_distance(vb) as f32)
        }
        _ => HdrScalar::Empty,
    },
    HdrSemiring::SimilarityMax => match (a, b) {
        (HdrScalar::Vector(va), HdrScalar::Vector(vb)) => {
            let dist = va.hamming_distance(vb);
            HdrScalar::Float(hamming_to_similarity(dist))
            HdrScalar::Vector(va.xor(vb))
        }

P1: Preserve exact matches when switching to XOR-based SimilarityMax

Changing HammingMin/SimilarityMax multiplication to va.xor(vb) makes perfect matches evaluate to the zero vector, but GrBMatrix::mxv/mxm currently discard zero vectors (if !v.is_zero()). As a result, exact matches are silently dropped from outputs when these semirings are used, which can remove the best candidate entirely from similarity and path computations.
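
The failure mode can be reproduced in miniature. The sketch below uses a hypothetical `Scalar` enum in place of `HdrScalar` and shows one possible remedy, which is to filter on structural absence (`Empty`) rather than on the value being all zeros. It is not the fix that was applied in the repository.

```rust
/// Hypothetical stand-in for HdrScalar with only the two relevant variants.
#[derive(Debug, Clone, PartialEq)]
pub enum Scalar {
    Empty,
    Vector(Vec<u64>),
}

/// XOR "multiply": an exact match yields an all-zero vector, which is still a value.
pub fn xor_mul(a: &Scalar, b: &Scalar) -> Scalar {
    match (a, b) {
        (Scalar::Vector(x), Scalar::Vector(y)) if x.len() == y.len() => {
            Scalar::Vector(x.iter().zip(y).map(|(p, q)| p ^ q).collect())
        }
        _ => Scalar::Empty,
    }
}

/// Keeps every structurally present result; only `Empty` is dropped,
/// so a zero vector (distance 0) survives.
pub fn keep_present(results: Vec<Scalar>) -> Vec<Scalar> {
    results.into_iter().filter(|s| *s != Scalar::Empty).collect()
}
```

With an `is_zero`-style filter the exact match would vanish from the output; filtering on `Empty` keeps it as the best candidate.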


AdaWorldAPI force-pushed the claude/setup-adaworld-repos-4kPEX branch from caba4ea to 2f34f92 on March 14, 2026 08:43
claude added 4 commits March 14, 2026 08:51
…sted)

The cascade thresholds must match the confidence level that the sample size
(Stichprobe) can actually support:

  Stage 1 (1/16 sample, 1024 bits): bands[2] = μ-σ  → 1σ confidence
  Stage 2 (1/4 sample, 4096 bits):  bands[1] = μ-2σ → 2σ confidence
  Stage 3 (full, 16384 bits):       exact classification into all bands

The previous μ-4σ threshold with a 1/16 sample claimed a confidence
level the sample size cannot deliver: 4σ requires a much larger
sample. With only 16 words of data, the top-k survivors were
random candidates that got lucky on sampling noise, not real matches.

Removed cascade_s1/cascade_s2 fields. Cascade now uses bands[] directly,
matching the design doc exactly.
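
The three-stage structure described in this commit can be sketched as below. This is a simplified illustration under stated assumptions: the sample is taken as a prefix of the words (the PR may sample differently), the thresholds `t1` and `t2` are passed in by the caller instead of being read from `bands[]`, and the names `Verdict`, `hamming_prefix`, and `cascade` are hypothetical.

```rust
/// Outcome of the three-stage check for one candidate.
#[derive(Debug, PartialEq)]
pub enum Verdict {
    RejectedStage1,
    RejectedStage2,
    /// Survived both sampled stages; carries the exact full-width distance.
    Full(u32),
}

/// Hamming distance over the first `words` 64-bit words only.
fn hamming_prefix(a: &[u64], b: &[u64], words: usize) -> u32 {
    a.iter().zip(b).take(words).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// Stage 1 looks at 1/16 of the words, stage 2 at 1/4, stage 3 at all of them.
/// A candidate is rejected as soon as a sampled distance exceeds its threshold.
pub fn cascade(query: &[u64], cand: &[u64], t1: u32, t2: u32) -> Verdict {
    let n = query.len().min(cand.len());
    if hamming_prefix(query, cand, n / 16) > t1 {
        return Verdict::RejectedStage1;
    }
    if hamming_prefix(query, cand, n / 4) > t2 {
        return Verdict::RejectedStage2;
    }
    Verdict::Full(hamming_prefix(query, cand, n))
}
```

As a worked example under a binomial model for random pairs (an assumption, not the PR's calibrated values): a 1024-bit sample has μ = 512 and σ = 16, so μ-σ gives `t1 = 496`; a 4096-bit sample has μ = 2048 and σ = 32, so μ-2σ gives `t2 = 1984`.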

https://claude.ai/code/session_01Mcj8GxEtzmVba6RmuT7AjD

The cascade now precomputes thresholds at [1σ, 1.5σ, 2σ, 2.5σ, 3σ]
from calibrated (warmup) σ. Stage 1 and stage 2 select from this
table via stage1_level/stage2_level, allowing dynamic tightening
as σ stabilises from observed data.

cascade_at(quarter_sigmas) provides arbitrary quarter-sigma
granularity (1.75σ, 2.25σ, 2.75σ) for fine-grained adjustment.

The σ confidence must match what the sample size supports:
  1/16 sample → 1σ (stage1_level=0)
  1/4 sample  → 2σ (stage2_level=2)
  full        → exact classification

After warmup (calibrate), thresholds reflect observed σ.
After shift detection (recalibrate), cascade table updates
while stage level selections are preserved.
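
To make the warmup and shift-detection steps concrete, here is a minimal sketch of Welford's update carried out without floats. The 16.16 fixed-point mean, the struct name `IntWelford`, and the `shifted` rule (running mean further than a tolerance from the calibrated μ) are all assumptions; the PR's integer scheme and its shift criterion are not shown in this thread.

```rust
const FRAC_BITS: u32 = 16;

/// Running mean and variance via Welford's update, using fixed point instead of floats.
pub struct IntWelford {
    n: i128,
    mean_fp: i128, // mean scaled by 2^16
    m2: i128,      // sum of squared deviations, unscaled
}

impl IntWelford {
    pub fn new() -> Self {
        Self { n: 0, mean_fp: 0, m2: 0 }
    }

    pub fn push(&mut self, x: u32) {
        let x_fp = (x as i128) << FRAC_BITS;
        self.n += 1;
        let delta = x_fp - self.mean_fp; // deviation from the old mean
        self.mean_fp += delta / self.n;
        let delta2 = x_fp - self.mean_fp; // deviation from the new mean
        self.m2 += (delta * delta2) >> (2 * FRAC_BITS);
    }

    /// Mean rounded to the nearest integer.
    pub fn mean(&self) -> u64 {
        ((self.mean_fp + (1 << (FRAC_BITS - 1))) >> FRAC_BITS) as u64
    }

    /// Population variance (m2 / n), truncated; zero before any observation.
    pub fn variance(&self) -> u64 {
        if self.n == 0 { 0 } else { (self.m2 / self.n) as u64 }
    }

    /// Hypothetical shift rule: the running mean left the tolerance band around μ.
    pub fn shifted(&self, calibrated_mu: u64, tolerance: u64) -> bool {
        self.mean().abs_diff(calibrated_mu) > tolerance
    }
}
```

Fixed-point truncation makes the results approximate, so σ taken as the integer square root of `variance()` can be off by a small amount.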

8 tests (added test_cascade_warmup_and_levels).

https://claude.ai/code/session_01Mcj8GxEtzmVba6RmuT7AjD

Cascade table now has 8 entries at quarter-sigma intervals:
  [μ-1σ, μ-1.5σ, μ-1.75σ, μ-2σ, μ-2.25σ, μ-2.5σ, μ-2.75σ, μ-3σ]
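
The eight-entry table can be computed with integer arithmetic only. In the sketch below the name `cascade_at` mirrors the function mentioned in the previous commit, but its signature, the `LEVELS` constant, and `cascade_table` are assumptions made for illustration.

```rust
/// Quarter-sigma multiples for [1σ, 1.5σ, 1.75σ, 2σ, 2.25σ, 2.5σ, 2.75σ, 3σ].
pub const LEVELS: [u32; 8] = [4, 6, 7, 8, 9, 10, 11, 12];

/// Threshold at μ - (quarter_sigmas / 4)·σ, integer-only, saturating at zero.
pub fn cascade_at(mu: u32, sigma: u32, quarter_sigmas: u32) -> u32 {
    mu.saturating_sub(sigma.saturating_mul(quarter_sigmas) / 4)
}

/// Precomputes all eight thresholds from a calibrated μ and σ.
pub fn cascade_table(mu: u32, sigma: u32) -> [u32; 8] {
    LEVELS.map(|q| cascade_at(mu, sigma, q))
}
```

With the theoretical full-width values μ = 8192 and σ = 64, the table is [8128, 8096, 8080, 8064, 8048, 8032, 8016, 8000].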

New test_warmup_2k_then_shift_10k:
- Phase 1: Warmup with 2016 pairwise distances, sweep all 8 cascade
  levels showing rejection rate at each (59%→76% with theoretical σ)
- Phase 2: Feed 10000 observations from shifted distribution (μ→7800),
  Welford detects shift, recalibrate, re-sweep showing the warmed-up
  cascade achieving 95.7%→100% rejection across levels

The warmup is what makes the cascade work. Before calibration,
theoretical σ produces mediocre rejection. After warmup, the
confidence intervals are backed by observed data and the cascade
eliminates 95%+ at 1σ alone.

9 tests, all passing.

https://claude.ai/code/session_01Mcj8GxEtzmVba6RmuT7AjD

Print one-sided normal distribution expected rejection rates alongside
actual rates at each cascade level. Makes the sampling confidence
gap visible:

  Pre-warmup (1/16 sample, σ=64): 59-76% actual vs 84-99.9% expected
  Post-shift (1/16 sample, σ=199): 95-100% actual vs 84-99.9% expected

The post-shift over-rejection reveals the normal distribution assumption
breaks when Welford's σ is inflated from mixing two distributions.
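
The "expected" column follows from the one-sided normal tail: a threshold at μ-kσ is expected to reject a fraction Φ(k) of candidates. A minimal sketch of that calculation is given here, assuming the Abramowitz and Stegun 7.1.26 approximation for erf, since the Rust standard library has no erf. The function names are hypothetical, and floats are used only for this reporting step.

```rust
/// erf via Abramowitz & Stegun formula 7.1.26 (absolute error about 1.5e-7).
fn erf(x: f64) -> f64 {
    let sign = if x < 0.0 { -1.0 } else { 1.0 };
    let x = x.abs();
    let t = 1.0 / (1.0 + 0.3275911 * x);
    let poly = t
        * (0.254829592
            + t * (-0.284496736
                + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
    sign * (1.0 - poly * (-x * x).exp())
}

/// Expected fraction of normally distributed distances above μ - kσ, i.e. Φ(k).
pub fn expected_rejection(k: f64) -> f64 {
    0.5 * (1.0 + erf(k / std::f64::consts::SQRT_2))
}
```

Evaluating at k = 1 and k = 3 gives roughly 0.841 and 0.9987, matching the 84-99.9% range quoted above.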

https://claude.ai/code/session_01Mcj8GxEtzmVba6RmuT7AjD
AdaWorldAPI merged commit 2d07da4 into main on Mar 14, 2026
AdaWorldAPI added a commit that referenced this pull request May 6, 2026
… MCP scope)

Same-day follow-up to MedCareV2#7 in the LanceProbe arc (#4 #5 → R2-R6 → #7 #8).
Cannot fetch the diff because MedCareV2 is outside the MCP allowlist; a placeholder
entry preserves the trail. Two same-day MedCareV2 PRs are kept as separate
entries (do not collapse) per the appended note in the file.

If/when the diff is paste-shared or allowlist extended, promote Confidence from
"Cannot verify" to FINDING with concrete delta.
AdaWorldAPI added a commit that referenced this pull request May 6, 2026
…ITICAL fixes required)

Meta-1 review surfaces 10 findings; 2 CRITICAL fixes block Round 2 opening:

CRITICAL #1: Doctor.Anamnese Full predicate-write violates BMV-Ä §57 append-only
  → fix: empty writable_predicates, keep only "append" action
CRITICAL #2: Receptionist clinical-blind fails safety (no Identity-read for
  allergy/triage lookup before scheduling)
  → fix: merge Patient permission to Detail-depth + 3 demographic writes,
    add Identity-read on Diagnosis + LabResult

HIGH #3-#4 (defer to Round 3 gate.rs): Diagnosis finalize/retract Escalate +
  Patient anonymize/merge/delete Escalate (GDPR Art.17 + §35 BDSG)
MEDIUM #5-#8 (backlog): Missing entities (Termin, Recall, ePA) + audit trail hook
LOW #9-#10 (backlog): PKV/GKV modulation + dynamic reason strings

Round 2 implications surfaced for W5/W8.
Round 3 implications surfaced for W9/W12 (Escalate wrapping + §73 SGB V test).

Concrete diff for W3-revision-2 included at end of file.
Next commit: W3-revision-2 applies the two CRITICAL fixes.
AdaWorldAPI pushed a commit that referenced this pull request May 6, 2026
…r VSA-scope correction

pattern.md (NEW, 578 lines) — usability patterns for traversing the
workspace's SoA/DTO surface as a graph (nodes = type defs, edges =
producer-consumer + duplicate + seam, subgraphs = clusters). 15 named
patterns covering canonical lookup, maturity scoring, Click-P-1 lens,
register-laziness check, dual-tier writes, ingestion-commit, lineage-
as-column, append-only governance, consult-first ordering, cross-
session blackboard via ledger row IDs, source-vs-claim divergence,
cluster-fix discipline, debug-as-API debt, scope-lock, and seam-naming.
Plus 7 critical findings and append-only update protocol.

ARCHITECTURE_ENTROPY_LEDGER.md (APPEND-only correction block) —

VSA scope correction per CLAUDE.md I-VSA-IDENTITIES iron rule:
- VSA-1 description tightened: Vsa16kF32 is for Markov chain over
  identity fingerprints exclusively. Provenance / JWT / RBAC / IDs
  are register territory, not VSA carriers.
- PERMUTE-1 description tightened: vsa_permute is unitary as an
  operation but the braiding usage is NOT lossless; cross-talk
  shrinks unbinding margin with N. Bound: N <= sqrt(d)/4.

New rows:
- EWA-SANDWICH-1 (PR #289 was missing from initial snapshot):
  Stage 3 / Smart / entropy 2. Scope: SPD-bounded propagation of
  cognitive Vsa16kF32 across Markov rho^d cycles. NOT a lineage
  error model (corrected from initial framing).
- SUBJECT-DTO-1: aspirational typed Subject struct with
  AuthSource enum (typed JwtClaims, not VSA). Implied by
  MedCareV2 #7+#8 wire shape.
- MOCK-DRIVER-1: q2 PR #35 Phase-3 placeholder, Stage 2.

Cross-repo resolution events:
- THINK-1 partial: q2 PR #35 dropped thinking-engine +
  cognitive-shader-driver deps from cockpit-server, migrated to
  canonical contract::cognitive_shader::*. Wire compression
  256x on cycle_fingerprint, 128x on color_acc.
- TRUTH-1 partial: q2 PR #35 cockpit-server bridges to
  lance-graph-planner::nars::truth::TruthValue::deduction.
- POLICY-1 + MEMBRANE-GATE-1: priority bump — MedCareV2 #8 now
  blocking on impl MembraneGate for Arc<rbac::Policy>.

Section G — ingestion-vs-traversal axis added: Cypher-parser path
(Option 1, ships now via PARSER-1 resolution) and splat-deposit path
(Option 2, gated by SPLAT-1) both converge on E1 typed Action API.

Retractions: E4 (VSA-bundled provenance) and E8 (geometrically-
bounded provenance via Vsa16kF32+EWA) — register laziness; do NOT
get appended to EPIPHANIES.md. E1, E2, E3, E5, E6, E7, E9 stand.

https://claude.ai/code/session_012AUf5NFgeAAQa5aQAKwSgx
AdaWorldAPI added a commit that referenced this pull request May 13, 2026
…rect misdiagnosed hpc-extras issue

S9-W1 (FIX-1 trybuild compile-fail probe):
  * tests/zone_serialize_check_compile_fail.rs — replaced assert!(true) smoke
    with subprocess-based probe asserting "D-CASCADE-V1-1 zone_serialize_check:"
    abort signature. 112 LOC + 4 new fixture files.
  * trybuild can't intercept cargo::error= from build-script exit(1); subprocess
    form is equivalent rigour per the fallback path.

S9-W2 (D-PARITY-V2-10 classification doc comments):
  * classification: comments added to external_intent.rs, ontology_dto.rs.
  * cargo run -p dto-class-check exits 0 (was N/N FAIL).

S9-W3 (#355 follow-up #5 lance_cache Arrow schema bump):
  * lance_cache.rs +209 LOC: 12 new columns persisting the MappingRow
    cascade fields per D-CASCADE-V1-7 (cam_pq_code FixedSizeBinary(6),
    base17_head FixedSizeBinary(8), palette_key UInt32, scent UInt8,
    qualia FixedSizeList(Float32, 18), codec_meta UInt32, codec_edge UInt64,
    thinking_style Utf8 nullable, attribute_sources_enc Utf8 with US/RS
    encoding, plus 3 type-ref strings).
  * 2 round-trip tests verify byte-identical equality across write/read.
  * Backward-compat: lossy-allow (old v1 cache files default missing columns
    to MappingRow::default() values).
  * Collateral fix: pre-existing into_inner_unwrap_iter pattern in flush() +
    set_last_root_checksum() replaced with vec![Ok(batch)].into_iter().

S9-W4 (sprint-7 W7 follow-on):
  * namespace_registry.rs 20-line doc comment + 2 regression tests
    confirming SMB.bson is a family-table-layer distinction, NOT a
    registry-namespace-layer entry. Per OQ-4: enumerate("SMB.bson") returns
    empty; seed_defaults() does not seed SMB.bson.

NDARRAY HPC-EXTRAS CORRECTION (per MedCare-rs#150):
  * The "ndarray:master hpc-extras blocker" was MISDIAGNOSED in sprint-7.
    ndarray master DOES ship hpc-extras as a default feature. Real root cause:
    lance-graph consumer Cargo.toml entries setting default-features = false
    without re-enabling hpc-extras.
  * Crates patched: crates/lance-graph-planner/Cargo.toml,
    crates/bgz-tensor/Cargo.toml
  * ISSUES.md entry will be updated post-merge (permission-ask gate).

Tier B (#355 FIX-4 codebook_index bit-collision, FIX-5 trust_below_floor
wiring, per-row BindSpace.context_ids for driver.rs:311) deferred to
sprint-10 — couples to a BindSpace SoA layout change; planning spike required.

Tier C (#355 follow-ups #7/#8/#9 — BioPortal validation, 80 MedCare-rs
tables, 25 MySQL transcode stubs) is cross-repo (OGIT / MedCare-rs side),
not lance-graph's hill.