docs: ledger efficiency + phases + HDC substrate corrections + cognitive refactor scope #214
Merged
Append FINDING to EPIPHANIES.md. PR #213 + ndarray PR #110 demonstrated the dumb-bookkeeper pattern: ~90 seconds, Haiku, enumerate+match+append. The result is a grep-addressable index of every shipped artifact, keyed by the prompt-file brief that birthed it. For every future "what did we ship about X" query, the ledger replaces a full-codebase grep with a single line: ~25 tokens vs ~25M tokens, a ratio of 10⁶ (six orders of magnitude cheaper). https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
Add Phase 6 (grammar PR #208-210) and Phase 7 (governance PR #211-213) to integration_phases.md. Phase 8 queued (elegant-herding-rocket v1 D2+).
Also classify the 45 "none" rows from PROMPTS_VS_PRS.md into three groups the Haiku couldn't distinguish:
- Live: open prompt aligned to a named live phase (work to pick up).
- Implicitly resolved / superseded: shipped under an overlapping PR title (Haiku's literal filename match missed the semantic overlap). Queues a Tier-2 Opus meta-pass to annotate the ledger with `superseded by #N` without re-reading code.
- Genuinely stale: no active adjacency; archive-candidate.
The ledger still saves ~10⁶× over code grep even accounting for Pass 2's semantic-refinement overhead, because Pass 2 reads ~10 KB of ledger text, not megabytes of Rust.
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
…ambient)
Third finding appended to EPIPHANIES.md, orthogonal to the cold-start tax (20-30 turns) and the find-code discount (10⁶×). This is the AMBIENT channel: 30-50% of every session's token budget burns on rediscovering what code paths exist, what was tried, and why code is shaped the way it is. The prompt↔PR ledger collapses all three channels to two text-file reads:
- Cold-start: 20-30 turns → 3-5 turns (~6×)
- Find-code: ~25M tokens → ~25 tokens (10⁶×)
- Ambient: 30-50% → 0% (2×-eternal)
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
Per user (2026-04-19): Container<[u64; 256]> is already 16,384 bits in LanceDB storage, so padding 157-word (or even 160-word) fingerprints into 256 words wastes about 39% (resp. 37.5%) of every row. Unifying FP_WORDS with the Container primitive means:
- zero padding, zero remainder loops at any SIMD level
- cache-line perfect (2 KB / 64 B = 32 cache lines, clean at every level)
- +62% VSA capacity for free (Plate's bound ~1,500 → ~2,400 items)
- no rebake of stored fingerprints (Container was 256 already)
No LanceDB patch needed: FixedSizeList<UInt8, 2048> is native. Supersedes the tech-debt entry targeting FP_WORDS = 160.
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
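The row arithmetic in this entry can be checked with a few constants. This is a minimal sketch, not the workspace's code: the constant and function names are illustrative, and the 64-byte cache line is an assumption taken from the "2 KB / 64 B" figure above.

```rust
// Illustrative names only; the real FP_WORDS / Container definitions live in the workspace.
const CONTAINER_WORDS: usize = 256; // u64 words per Container, as stated in the entry
const CACHE_LINE_BYTES: usize = 64; // assumed cache-line size

/// Bits held by `words` u64 words.
const fn bits(words: usize) -> usize {
    words * 64
}

/// Bytes occupied by `words` u64 words.
const fn bytes(words: usize) -> usize {
    words * 8
}

/// Percentage of a 256-word row left unused when only `used_words` carry data.
fn unused_pct(used_words: usize) -> f64 {
    100.0 * (CONTAINER_WORDS - used_words) as f64 / CONTAINER_WORDS as f64
}

fn main() {
    println!("bits per row:        {}", bits(CONTAINER_WORDS));
    println!("bytes per row:       {}", bytes(CONTAINER_WORDS));
    println!("cache lines per row: {}", bytes(CONTAINER_WORDS) / CACHE_LINE_BYTES);
    println!("unused at 157 words: {:.1}%", unused_pct(157));
    println!("unused at 160 words: {:.1}%", unused_pct(160));
}
```

Because 256 words divide evenly into 64-byte lines and into common SIMD widths, no remainder loop is needed at any level, which is the point the bullet list makes.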
Concrete three-pass recipe added to the cca2a skill:
Pass 1 — Haiku bookkeeper (~90 s, mechanical): enumerate prompt files,
match against git log, append one ledger line per pair.
Pass 2 — Opus meta-synthesizer (read-only inputs): annotate the "none"
rows from Pass 1 with superseded/open/stale classification.
Pass 3 — Main thread consumer (sub-second per query): grep the ledger
for every "what's open / shipped / about X" question.
Closes three token-waste channels simultaneously:
- cold-start (20-30 turns → 3-5 turns)
- find-code (~25M tokens → ~25 tokens, 10⁶×)
- ambient arc knowledge (30-50% → 0%)
First deployments:
- PR #213 (lance-graph, 41 prompts mapped, 90 s)
- PR #110 (ndarray, 25 prompts mapped, 90 s)
Linked from SKILL.md under "What to read when".
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
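Pass 3 of the recipe above amounts to a substring search over the ledger text. The sketch below assumes a hypothetical one-line-per-pair ledger format (`prompt file | PR | status`); the actual ledger layout is not specified in this PR, and `grep_ledger` is an illustrative helper, not part of the cca2a skill.

```rust
/// Return every ledger line that mentions `query`, case-insensitively.
fn grep_ledger<'a>(ledger: &'a str, query: &str) -> Vec<&'a str> {
    let needle = query.to_lowercase();
    ledger
        .lines()
        .filter(|line| line.to_lowercase().contains(&needle))
        .collect()
}

fn main() {
    // Hypothetical ledger contents, for illustration only.
    let ledger = "\
prompts/example-a.md | PR #1 | shipped
prompts/example-b.md | none  | open
prompts/example-c.md | none  | stale
";
    // "What is still open?" becomes one pass over a few kilobytes of text.
    for line in grep_ledger(ledger, "open") {
        println!("{line}");
    }
}
```

The design choice is that the consumer never touches source code: all three question types (open, shipped, about X) reduce to the same filter with a different query string.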
CORRECTION to prior IDEA "FP_WORDS = 256" and TECH_DEBT entry for the Vsa10k* → Vsa16k* rename sweep. Two distinct substrates that must NOT be collapsed:
1. Hamming binary fingerprint: Container<[u64; 256]> = 16,384 bits = 2 KB. Popcount metric. NOT VSA.
2. VSA (= HDC, Hyperdimensional Computing) substrate: 16,384 DIMENSIONS × float. bind / bundle / permute. Canonical per Plate / Kanerva / arxiv 2111.06077. Never binary, never 10k.
Size per VSA fingerprint:
- Vsa16kF32 (lossless): 64 KB
- Vsa16kBF16: 32 KB
- Vsa16k u8 × 5-lane: 80 KB
- Vsa16k BF16 × 5-lane: 160 KB
Ban "10,000-D binary VSA" framing workspace-wide. When writing about binary fingerprints say "16,384-bit Hamming fingerprint" / "Container", never "VSA". When writing about the HDC substrate say "16,384-D float VSA", never "binary", never "10k".
Follow-up PR will rename CrystalFingerprint::Vsa10kF32 → Vsa16kF32, re-address role-key slices from [0..10000) → [0..16384), and sweep ~28 files for legacy 10k mentions.
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
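To make the first substrate concrete, here is a minimal sketch of the popcount metric over a 16,384-bit fingerprint. The `Fingerprint` alias is a stand-in for `Container<[u64; 256]>`, whose real definition is not shown in this PR; `hamming` is an illustrative name.

```rust
const FP_WORDS: usize = 256;

/// Stand-in for the workspace's Container<[u64; 256]> (assumed layout: 256 packed u64 words).
type Fingerprint = [u64; FP_WORDS];

/// Hamming distance: XOR each word pair, then count the set bits.
fn hamming(a: &Fingerprint, b: &Fingerprint) -> u32 {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| (x ^ y).count_ones())
        .sum()
}

fn main() {
    let a: Fingerprint = [0u64; FP_WORDS];
    let mut b = a;
    b[3] ^= 1 << 17; // flip a single bit
    println!("size in bytes: {}", std::mem::size_of::<Fingerprint>());
    println!("distance:      {}", hamming(&a, &b));
}
```

Nothing here binds, bundles, or permutes; the only operation is XOR plus popcount, which is why the entry insists this substrate not be called VSA.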
Per user clarification (2026-04-19): REFINEMENT to prior IDEA CORRECTION-OF. The "no 10000-D VSA" ban is NOT workspace-wide. Three scopes legitimately preserve 10k until the coordinated rename PR:
1. Grammar prototype (role_keys + ContextChain, shipped at 10k in #210)
2. Quantum prototype (Vsa10kF32 holographic residual)
3. Ladybug-rs / bighorn imports (PRs #200-#203 cognitive stack)
Elsewhere: strip 10k mentions. Files in-scope vs out-of-scope are enumerated in the IDEAS entry.
TECH_DEBT for the ladybug memory pathology:
- Observed 700-1,100 MB runtime after #200-#203 imports at 10k
- 16k rename WORSENS per-row cost 40 KB → 64 KB at f32
- Fix requires LanceDB mmap zero-copy + working-set cache policy, not wider substrate alone
- Gate the 16k rename on peak-RAM measurement against Animal Farm D10
- Sparse-encoding candidate (Structured5x5 cells only) for common case
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
REFINEMENT-2 correcting prior size math. Key corrections:
- HDC superposition runs at FP16 / BF16, not f32. f32 naming (Vsa10kF32)
reflects legacy compute precision, not storage; half-precision is
sufficient for bundle accumulation.
- Per-row size table:
1024-D Jina v3: 2 KB
1536-D OpenAI small: 3 KB
3072-D Upstash max: 6 KB
10000-D HDC FP16: 20 KB (current target)
16384-D HDC FP16: 32 KB (rename target)
16384-D × 5-lane u8: 80 KB
16384-D × 5-lane BF16: 160 KB
- The ladybug 700-1100 MB blowup was at f32 40 KB/row. Had it been
FP16 (20 KB/row) memory would have been 350-550 MB. 16k × FP16
(32 KB/row) at the same population = 560-880 MB — cheaper than the
current f32 state.
- The 16k rename is memory-positive ONLY if paired with f32 → FP16
migration. Without it, 16k × f32 (64 KB/row) inflates the problem.
- Architectural constraint: commercial vector DBs (Upstash, Pinecone,
Qdrant) cap at ≤ 3072 dims. HDC at 16384-D can only live in LanceDB
FixedSizeList<BFloat16, 16384>. That's why lance-graph is The Spine.
Rename scope: Vsa10kF32 → Vsa16kBF16 (not Vsa16kF32); role-key slices
[0..10000) → [0..16384); storage contract = FixedSizeList<BFloat16,
16384>; f32 compute preserved internally only for numerically-sensitive
hot paths.
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
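The per-row table and the population rescaling in this entry follow from two small formulas. The sketch below uses illustrative function names and the nominal KB/row figures quoted in the entry; it is a check of the arithmetic, not a measurement.

```rust
/// Bytes for one row: dimensions × bytes per element × lanes.
fn row_bytes(dims: usize, bytes_per_elem: usize, lanes: usize) -> usize {
    dims * bytes_per_elem * lanes
}

/// Rescale an observed footprint (MB) from one nominal row size (KB) to another,
/// holding the row population fixed.
fn rescale_mb(observed_mb: f64, old_row_kb: f64, new_row_kb: f64) -> f64 {
    observed_mb * new_row_kb / old_row_kb
}

fn main() {
    // FP16 / BF16 rows (2 bytes per element).
    for dims in [1024usize, 1536, 3072, 10_000, 16_384] {
        println!("{dims:>6}-D half precision: {} bytes", row_bytes(dims, 2, 1));
    }
    println!("16384-D x 5-lane u8:   {} bytes", row_bytes(16_384, 1, 5));
    println!("16384-D x 5-lane BF16: {} bytes", row_bytes(16_384, 2, 5));

    // Observed 700-1100 MB at f32 10k (40 KB/row), rescaled to 16k FP16 (32 KB/row).
    println!(
        "16k FP16 estimate: {}-{} MB",
        rescale_mb(700.0, 40.0, 32.0),
        rescale_mb(1100.0, 40.0, 32.0)
    );
}
```

The same `rescale_mb` call with a 64 KB/row target shows why the rename without the precision migration inflates memory rather than reducing it.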
CORRECTION-OF the 2026-04-19 "Ladybug 700-1100 MB memory blowup" entry. Per user: there is no 10,000 × 10,000 matrix we actually want. The blowup was a glitch: a dense 10k × 10k structure imported from outdated ladybug-rs / bighorn (PRs #200-#203) that ended up in the binary by accident.
Math:
- 10,000 × 10,000 × f32 = 400 MB (single allocation)
- Plus cognitive-stack state → 700-1,100 MB total observed
Fix: identify and DELETE the glitch allocation. Not a migration.
Candidates: token-token distance matrix, co-occurrence matrix, dense attention matrix, K=10000 CLAM centroid table.
High-probability locations: cognitive crate, CognitiveShader, BindSpace, CollapseGate, adaptive codecs imported from ladybug-rs without trimming.
This invalidates:
- "16k rename makes memory worse": the per-row math was sound but irrelevant to this specific blowup.
- Mmap zero-copy requirement: still good hygiene, not the fix here.
- Sparse encoding dependency: still architecturally useful, unrelated to the glitch.
16k rename + f32 → BF16 migration proceed independently of this P0 deletion.
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
…late
Lock the categorical distinction:
- 16K-D wire VECTOR (intentional): 1 × 16,384 ≈ 1.6 × 10⁴ cells, 32 KB BF16. One fingerprint, lossless round-trip, LanceDB native.
- 10K × 10K glitch MATRIX (unintentional): 10,000 × 10,000 = 10⁸ cells, 200 MB BF16 / 400 MB f32. Zero purpose, imported debris from outdated ladybug-rs / bighorn.
Roughly four orders of magnitude apart. They share a numeric coincidence only.
Future docs describing 10k-D HDC must say VECTOR explicitly: plain "10,000-D HDC" is ambiguous and preserves the category error. The rename (Vsa10kF32 → Vsa16kBF16) is about the vector; the matrix deletion is a separate P0 cleanup.
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
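The vector-versus-matrix distinction is easiest to see as cell counts. This sketch uses illustrative helper names and only the shapes and element widths quoted in the entries above.

```rust
/// Number of cells in a dense rows × cols structure.
fn cells(rows: u64, cols: u64) -> u64 {
    rows * cols
}

/// Total bytes for a dense structure with `bytes_per_cell`-wide elements.
fn dense_bytes(rows: u64, cols: u64, bytes_per_cell: u64) -> u64 {
    cells(rows, cols) * bytes_per_cell
}

fn main() {
    // The intentional wire vector: one row of 16,384 BF16 elements.
    println!("vector cells: {}", cells(1, 16_384));
    println!("vector bytes: {}", dense_bytes(1, 16_384, 2));

    // The glitch matrix: 10,000 × 10,000 dense cells.
    println!("matrix cells:      {}", cells(10_000, 10_000));
    println!("matrix bytes BF16: {}", dense_bytes(10_000, 10_000, 2));
    println!("matrix bytes f32:  {}", dense_bytes(10_000, 10_000, 4));
}
```

The similar-looking "10k" and "16k" labels hide a quadratic difference: the vector grows linearly with dimension, the matrix with its square.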
Typical server L3 = 32-96 MB. Hot-path structures exceeding this take ~8× DRAM latency penalty per access, regardless of storage capacity. Codec stack implication: dense square matrices capped at sqrt(L3 / cell_size). At 32 MB budget with BF16 cells that's ~4000 × 4000. 10k × 10k at f32 (400 MB) is 12× over and architecturally disqualified. This is the deep reason the full codec chain exists (planes → ZeckBF17 → Base17 → Palette → CAM-PQ → Scent): each layer picks a density that keeps the hot table L3-resident. 1-D vectors are cheap — a 16K-D BF16 row is 32 KB, thousands fit L3-resident. The L3 bound binds on 2-D dense, not 1-D. That's why the rename (Vsa10kF32 → Vsa16kBF16) is safe; the 10k × 10k matrix is not. https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
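The sqrt(L3 / cell_size) cap can be sketched as follows. The function names are illustrative, and the 32 MB budget is taken as 32,000,000 bytes to match the "~4000 × 4000" figure in the entry; a binary 32 MiB budget would give 4096.

```rust
/// Largest n such that a dense n × n matrix of `cell_bytes`-wide cells fits in `l3_bytes`.
fn max_square_dim(l3_bytes: u64, cell_bytes: u64) -> u64 {
    ((l3_bytes / cell_bytes) as f64).sqrt().floor() as u64
}

/// How many times a structure of `bytes` overruns the L3 budget.
fn overrun_factor(bytes: u64, l3_bytes: u64) -> f64 {
    bytes as f64 / l3_bytes as f64
}

fn main() {
    let l3 = 32_000_000u64; // assumed 32 MB L3 budget (decimal megabytes)

    println!("max BF16 square: {0} x {0}", max_square_dim(l3, 2));
    println!("max f32 square:  {0} x {0}", max_square_dim(l3, 4));

    // 10k × 10k at f32 = 400 MB.
    let glitch = 10_000u64 * 10_000 * 4;
    println!("10k x 10k f32 overrun: {}x", overrun_factor(glitch, l3));

    // A single 16K-D BF16 row is a tiny fraction of the budget.
    let row = 16_384u64 * 2;
    println!("one 16K-D BF16 row fits: {}", row <= l3);
}
```

The bound only bites on 2-D dense structures because their size grows with n²; a single row grows with n and stays far below the budget.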
The "vector vs matrix" + "L3 working-set cap" entries from earlier this session aren't new findings — both are long-established invariants of the workspace (codec chain design, L3-budget sizing). Appending a supersede entry that downgrades both to SUPERSEDED so the FINDING log stays useful. The 10k × 10k matrix is legacy-hygiene debt from the stone-age ladybug-rs / bighorn import. Nobody re-validated it against workspace invariants because the imports were expected to be rewritten or deleted; touching the code was migration desperation, not design. Delete-on-touch, not a principle waiting to be learned. https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
Per user (2026-04-19): ladybug-rs is archived, not maintained. Migration target is ada-rs + lance-graph going forward. Ladybug becomes a read-only reference quarry for patterns (Fingerprint, BindSpace, CognitiveShader, adaptive codecs, CollapseGate).
Consequences for prior ledger entries:
- Glitch-matrix deletion downgrades P0 → P2 (goes away with archival unless it's currently linked into built binaries).
- Vsa10k → Vsa16k rename scope tightens to ada-rs + lance-graph; no ladybug touches.
- Refactor-resistance audit for ladybug imports is obsolete: harvest patterns clean, don't port code.
- CLAUDE.md "ladybug-rs = The Brain" line is stale; ada-rs takes that slot. Architecture diagram update in follow-up PR.
Five-repo stack becomes:
ndarray = The Foundation
lance-graph = The Spine
ada-rs = The Brain (was ladybug-rs)
crewai-rust = The Agents
n8n-rs = The Orchestrator
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
…xcise
Not a wholesale deletion target — it's the complementary cognitive
layer above SPO store + contract primitives. Targeted refactor:
Keep: grammar/Triangle, spo/Crystal-layer, spectroscopy (unique)
Merge: search/temporal → planner::strategy; cypher_bridge audit
Inspect: fabric/, world/, container_bs/ → DTO-move or excise
Excise: learning/ stub, wip-gated non-compiling modules, core_full/
catch-all decomposition
~1 week refactor, zero functional change, tightens contract adoption
per CLAUDE.md §Current Status In-Progress rule.
Plan D3 depends on GrammarTriangle staying in lance-graph-cognitive,
so no rename or cross-crate move for that module.
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
Strip ada-rs mentions from the lance-graph-cognitive refactor entry. The cognitive DTO contract surface (state classification pillars + shader-driver endpoints) already shipped in PR #206 under Pumpkin NPC framing. The refactor is dedup/merge/excise against that existing contract, not new trait work. https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh