Format Best Practices — VSA, Binary, Quantized

Purpose: Ground every format-choice decision in the relevant science. For each workload, answer: which carrier + which algebra, and what's the SNR / capacity / cache / precision tradeoff?

Scope: VSA variants (F32, BF16, F16, I8), Binary (bitpacked Hamming), CAM-PQ (product quantization). Per-workload guidance.

Companion to: CHANGELOG.md (what changed when), .claude/knowledge/ vsa-switchboard-architecture.md (architectural layers), CLAUDE.md § I-VSA-IDENTITIES (iron rule: VSA operates on identities).

Created: 2026-04-21 (session cleanup, post-Frankenstein correction)

§ 0 — The question the doc answers

For every new workload that might want to use VSA, Binary, or CAM-PQ, answer these in order before writing code:

Is this an identity lookup or a content comparison? — answered per the I-VSA-IDENTITIES iron rule. Identity = VSA legitimate. Content = register-loss risk, use a different tool.
What is N, the bundle size? — capacity bound per § 2.
What's the bit-level dependence ρ? — Jirak-corrected effective capacity per § 3.
What precision does the DOWNSTREAM use need? — per § 4.
What's the memory budget? — cache/L3 analysis per § 5.

Outcome: one format per workload, chosen for reasons, not by default.

§ 1 — The science, briefly

Johnson-Lindenstrauss (capacity upper bound)

For d random ±1 vectors in d-dim space, N items can be distinguished via cosine similarity with error ε if:

d ≥ (8 / ε²) · log N

For d = 16,384 and ε = 0.1:

N ≤ e^(16384 · 0.01 / 8) ≈ e^20 ≈ 10⁸ (way more than any workload needs)

The PRACTICAL ceiling is not JL. It's the cleanup codebook + signal-to- noise after unbind.

Signal-to-noise after unbind

Bundle N role-bound bipolar vectors: each dim of the result is a sum of N ±1 random variables. Under IID:

Mean per dim: 0
Std dev per dim: √N
Signal for matching role after unbind: 1 (or N·(1/N) = 1 after norm)
Noise (other N−1 contributions): std √(N−1) ≈ √N

SNR = 1 / √N (signal / noise after unbind, cosine-normalized).

For recovery against a codebook of known fillers:

cos(unbound, correct_codebook_entry) ≈ 1 − O(1/√N)
cos(unbound, wrong_codebook_entry) ≈ 0 ± O(1/√d)

Cleanup succeeds when the gap between correct and wrong cosine exceeds the f-precision floor. This is the workable regime.

Jirak 2016 — Berry-Esseen under weak dependence

Jirak's role right now is the ACTIVE DECISION FRAMEWORK for CAM-PQ vs Vsa10k choices, not a deferred calibration probe. The ρ values below quantify WHY CAM-PQ bitpacked codes are poor candidates for VSA bundling vs why role-key-generated bits retain full capacity for Vsa10kF32 bundling. Future Jirak-derived threshold calibration (Probe B in § 7) is the RECEIPT-stamping phase; the rule-of-thumb usage here is already applicable.

Classical Berry-Esseen assumes IID. Our bits aren't IID — they have weak dependence from:

CAM-PQ codebook quantization (multiple bits per centroid, coupled)
Overlapping role-key slices (shared slice ranges)
XOR bundle accumulation (induced dependence)
Natural embedding structure (correlated projections)

Jirak 2016 (arxiv 1606.01617, Annals of Probability 44(3) 2024–2063) gives convergence-to-normal rates under α-mixing weak dependence:

n^(p/2-1) for p ∈ (2, 3]
n^(-1/2) in L^q for p ≥ 4

Translation: effective capacity drops by a factor depending on ρ:

effective_n ≈ n · (1 − 2ρ)   (small-ρ approximation)

Typical ρ observed:

Role key bits from SplitMix64: ρ ≈ 0.01 (near IID)
Binary16K from sign-binarize of embeddings: ρ ≈ 0.1–0.2
Binary16K after CAM-PQ quantization: ρ ≈ 0.3–0.5

At 16,384 dims:

ρ = 0.1: effective d ≈ 13,100
ρ = 0.3: effective d ≈ 11,500
ρ = 0.5: effective d ≈ 8,200

SNR formula gets corrected: SNR = 1 / √(N · (1 − 2ρ)^(-1)) ≈ (1 / √N) · √(1 − 2ρ).

For ρ = 0.3, SNR shrinks by ~15%. Hypothesis ranking tolerance must account for this.

The iron rule this justifies (I-NOISE-FLOOR-JIRAK): classical IID Berry-Esseen is wrong for this system. Hand-tuned thresholds (UNBUNDLE_HARDNESS = 0.8, ABDUCTION = 0.88) should be replaced by Jirak-derived bounds when the calibration probe runs.

§ 2 — Capacity regimes (Jirak-corrected)

Given d = 16,384, the safe bundle size N depends on ρ and the precision floor of the downstream consumer:

ρ (bit dependence)	Max N for f32 downstream	Max N for BF16 downstream	Max N for i8 downstream
0.01 (IID random keys)	≤ 1024	≤ 256	≤ 64
0.1 (embedding-derived)	≤ 820	≤ 205	≤ 51
0.3 (CAM-PQ contaminated)	≤ 585	≤ 146	≤ 36
0.5 (heavy quantization)	≤ 410	≤ 102	≤ 25

Max N = (precision_floor × effective_d) / 4, using the SNR bound plus a safety factor of 4 for codebook cleanup tolerance.

The √d/4 ≈ 32 heuristic the switchboard doc uses is the conservative "any-reasonable-ρ safe" bound. It's almost always far below the actual capacity ceiling — use it as a default, measure if you need more headroom.

§ 3 — Per-format precision ceilings

For cosine similarity after unbind, the f-precision determines the smallest distinguishable cosine difference:

Format	Mantissa bits	Precision (decimal)	Smallest cosine Δ
f32	23	7.2	~10⁻⁶
f16 (IEEE half)	10	3.3	~5·10⁻⁴
bfloat16	7	2.3	~5·10⁻³
i8 (±127)	7	2.1	~10⁻²

Implication for epiphany margin (ΔF < 0.05):

f32: plenty of headroom (10⁻⁶ precision vs 5·10⁻² margin) ✓
f16: still fine (5·10⁻⁴ precision vs 5·10⁻² margin) ✓
BF16: borderline (5·10⁻³ precision vs 5·10⁻² margin) ⚠ — epiphany detection may miss the 2nd-best hypothesis when margin < 0.01. Acceptable if epiphany margin is relaxed to 0.05+.
i8: fails (10⁻² precision vs 5·10⁻² margin) ✗ — cannot reliably distinguish near-tie hypotheses.

Implication for recovery_margin in [0, 1]:

All formats can represent [0, 1] with room to spare.
The concern is NOT range; it's precision when values are close to each other (for ranking).

The counterintuitive finding: BF16 (wider range, less precision) is WORSE than F16 for VSA cosine ranking. Range advantage is wasted because bundle magnitudes stay bounded; precision advantage of F16 helps near-tie detection.

BF16 is the right choice when:

AMX hardware is available (native BF16 tensor ops)
Interop with neural lens (Jina v5 BF16, GGUF shards already BF16)
Epiphany margin is relaxed (margin > 0.01)

F16 is the right choice when:

Apple M-series or ARMv8.2+ (F16 hardware-native)
AMX not available
Near-tie epiphany detection matters
No BF16 interop requirement

§ 4 — Cache and memory analysis

Per-vector memory + cache fit analysis. Assumes representative x86 cache sizes (cores vary):

Format	Bytes/vector	L1 (32–64 KB)	L2 (256 KB–1 MB)	L3 (8–32 MB)
Vsa16kF32	65,536 (64 KB)	spill	fits single	fits codebook ≤ 500
Vsa16kBF16	32,768 (32 KB)	tight (single)	fits Markov window	fits codebook ≤ 1024
Vsa16kF16	32,768 (32 KB)	tight (single)	fits Markov window	fits codebook ≤ 1024
Vsa16kI8	16,384 (16 KB)	fits half	fits Markov window + global	fits codebook ≤ 2048
Binary16K	2,048 (2 KB)	fits 16	fits 128–512	fits 4K–16K

Working set for Animal Farm benchmark:

~40,000 sentences × 64 KB f32 trajectory = 2.5 GB total (if every trajectory persisted as f32 — impractical)
Hot path: 11-vector Markov window + global context + ~12 hypotheses = ~900 KB. Fits L3 comfortably.
Per-commit: one trajectory (64 KB) + graph write. Hot path.

Working set for 10K-persona callcenter bank:

10,000 personas × 64 KB f32 = 640 MB. L3 spill, main memory.
10,000 × 32 KB BF16/F16 = 320 MB. Still main memory.
10,000 × 16 KB I8 = 160 MB. Still main memory.
10,000 × 2 KB Binary = 20 MB. Fits L3.
Inference-time: decompress winning persona's I8 → F32 for binding; keep the rest in I8/Binary for comparison.

Working set for OSINT-scale document fingerprints:

10 million docs × 64 KB f32 = 640 GB. Main memory only, heavy.
10M × 2 KB Binary = 20 GB. Fits main memory, acceptable.
10M × CAM-PQ 32-byte codes = 320 MB. Fits RAM easily.

Implication: for large-scale persistence, NEVER store Vsa16kF32 directly. Quantize to i8 or sign-binarize to Binary16K or compress via CAM-PQ. Reserve f32 for the hot compute path.

§ 5 — Per-workload best-practice decisions

Each row applies the full framework: register check, N capacity, precision need, cache fit.

Hot-path compute workloads

Workload	Format	N	Reason
Hypothesis ranking in `Trajectory::resolve`	Vsa16kF32	2–16	Epiphany margin 0.05 demands f32 precision; low N fits L3
Markov ±5 trajectory bundle	Vsa16kF32	11	704 KB window fits L3; f32 precision for accurate braiding
Global context (singleton)	Vsa16kF32	N/A	64 KB trivial; updated per commit
Single sentence SPO bundle	Vsa16kF32	3–8	Low N, precision cheap
Per-turn callcenter role bundle	Vsa16kF32	4–10	Real-time, precision matters

Persistence workloads

Workload	Format	Reason
Committed SPO triples (graph edges)	Binary16K + feature index	Popcount-fast, small per edge
Episodic snapshots (per-sentence)	Vsa16kI8	16 KB trivial, precision sufficient for retrieval
Persona bank (< 100 items)	Vsa16kF32	Fits L3; no need to quantize
Persona bank (1K–10K items)	Vsa16kI8 or Vsa16kBF16	Memory pressure; still usable for bind+cosine
Persona bank (10K+, cold storage)	CAM-PQ 32-byte codes	Heavy compression, decompress top-K on query
Thinking-style configs (12 styles)	YAML content + Vsa16kF32 identity	Tiny scale, f32 identity for resonance dispatch
Archetype registry (144 total)	Vsa16kF32 identity + palette content	Tiny scale, full precision

Comparison / search workloads

Workload	Format	Reason
Rigid-designator lookup (Napoleon → graph node)	HashMap key + Binary16K	Register lookup (Test 0), not VSA
Nearest-neighbor in 1M+ document index	CAM-PQ	Designed for this
k-NN for candidate pre-filter	CAM-PQ, then decompress top-K to Vsa16kF32	Cheap narrow search + precise rerank
Similarity ranking among 12 thinking styles	Vsa16kF32 cosine against identity codebook	Small codebook, f32 precision for close margins

Cross-catalogue workloads

Workload	Format	Reason
Grammar + persona + callcenter bundled in one trajectory	Vsa16kF32 with disjoint slice allocation	Multiple domain catalogues coexist cleanly
Cross-session state transfer	Vsa16kBF16 (with AMX) or Vsa16kI8	Quantize at boundary, rehydrate on entry
Shared with neural lens (Jina v5 BF16)	Vsa16kBF16	Native format match, no conversion

Never-do workloads

Workload	Why rejected
VSA bundle over CAM-PQ codes directly	Register loss — centroid indices XOR to unrecoverable state
Cosine similarity over i8 for close-margin epiphany detection	Precision floor too high (10⁻²) for margin 5·10⁻²
Bundle of 1000+ items with shared codebook	Jirak-effective capacity exhausted; use direct graph lookup
"Similarity" on BlasGraph traversal results via VSA algebra	BlasGraph outputs are binary; use Hamming, not cosine
Vsa16kF32 stored per-edge in a 1M-edge graph	64 GB; use Binary16K or bgz17 palette edge (3 bytes)

§ 6 — The three iron rules this follows

From CLAUDE.md § Substrate iron rules:

I-SUBSTRATE-MARKOV — VSA bundling guarantees Chapman-Kolmogorov semigroup property. Don't replace vsa_bundle (add) with XOR bundling for state-transition paths. MergeMode::Xor is a legitimate single-writer merge, NOT a Markov-respecting kernel.
I-NOISE-FLOOR-JIRAK — bits are weakly dependent; use Jirak 2016 Berry-Esseen bounds instead of classical IID for noise-floor calibration. Hand-tuned σ thresholds are acceptable but must be documented as such.
I-VSA-IDENTITIES — VSA operates on IDENTITY fingerprints that point to content. Never on bitpacked/quantized content itself. Four tests: register laziness, N ≤ capacity, role orthogonality, cleanup codebook.

Every format decision should justify itself against these three. If a proposed format choice fails any iron rule, it's wrong regardless of benchmark numbers.

§ 7 — Probe queue (calibration work owed)

The following measurements should be made before claiming the stack is "grounded" rather than "tuned":

Probe A — ρ measurement on Animal Farm corpus

Measure bit-level dependence on actual Binary16K fingerprints derived from COCA 24K vocab + Markov ±5 parse of Animal Farm.

Autocorrelation ρ(k) at bit offset k for k = 1, 2, 4, 8, 16, 64.
Expected: ρ ≈ 0.01–0.1 for deterministic role-key-generated bits.
If higher, trace dependence source (CAM-PQ, embedding structure).

Output: a measured ρ per corpus. Input to Probe B.

Probe B — Jirak-derived threshold function

Implement jirak_threshold(n, rho, confidence) returning the Berry-Esseen bound on cosine similarity for hypothesis ranking:

pub fn jirak_threshold(n: usize, rho: f32, confidence: f32) -> f32 {
    let effective_n = (n as f32) * (1.0 - 2.0 * rho).max(0.01);
    let c_jirak = /* constant from Jirak 2016 */;
    1.0 - c_jirak / effective_n.sqrt()
}

Replace hand-tuned constants:

UNBUNDLE_HARDNESS_THRESHOLD = 0.8 → jirak_threshold(n_facts, rho, 0.95)
ABDUCTION_THRESHOLD = 0.88 → jirak_threshold(n_hypotheses, rho, 0.99)
HOMEOSTASIS_FLOOR = 0.2 → derived from F-landscape curvature

Output: principled thresholds that adapt to corpus ρ.

Probe C — Capacity stress test per format

For each format (F32, BF16, F16, I8) and each N in {10, 100, 1000}, measure the distinguishability of the correct hypothesis against random wrong hypotheses.

Metric: fraction of trials where cosine(unbound, correct) > cosine(unbound, best_wrong) by at least the precision-floor margin.
Expected falloff per § 3 precision ceilings.

Output: empirical confirmation of the § 3 theoretical precision ceilings per workload / format pair.

§ 8 — The three failure modes this prevents

Failure 1 — "VSA is always lossless"

Wrong. VSA is lossless only when:

f32 (or equivalent precision) accumulator,
N ≤ capacity (Jirak-corrected),
Orthogonal role keys,
Cleanup against a known codebook.

Any of these fails → lossy or broken recovery.

Failure 2 — "BF16 is the modern 16-bit default"

Wrong for VSA. BF16 trades precision for range. VSA cosine similarity needs precision, not range. F16 beats BF16 for VSA unless AMX hardware or neural-lens interop dictates BF16.

Failure 3 — "Use VSA, it's more principled than a hash table"

Wrong for register-appropriate workloads. HashMap<&str, Def> + O(1) lookup is the right tool when the item has a name. VSA resonance is the right tool when the item is INFERRED from context. Don't reach for VSA when a register would do — that's Test 0 failing.

References

CHANGELOG.md — format-switch history (canonical "when did format X change and why?")
.claude/knowledge/vsa-switchboard-architecture.md — three-layer architecture + decision matrices
CLAUDE.md § I-VSA-IDENTITIES, § I-NOISE-FLOOR-JIRAK, § I-SUBSTRATE-MARKOV — the three iron rules
Jirak 2016 — "Berry-Esseen theorems under weak dependence", Annals of Probability 44(3) 2024–2063, arxiv 1606.01617
Shaw, Furlong, Anderson, Orchard 2501.05368 — "Developing a Foundation of Vector Symbolic Architectures Using Category Theory"
Kleyko et al. 2106.05268 — "VSA as a Computing Framework for Emerging Hardware"
.claude/board/TECH_DEBT.md — open calibration debts (Jirak thresholds, 157→160 SIMD, Vsa10k→Vsa16k rescale)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Format Best Practices — VSA, Binary, Quantized

§ 0 — The question the doc answers

§ 1 — The science, briefly

Johnson-Lindenstrauss (capacity upper bound)

Signal-to-noise after unbind

Jirak 2016 — Berry-Esseen under weak dependence

§ 2 — Capacity regimes (Jirak-corrected)

§ 3 — Per-format precision ceilings

§ 4 — Cache and memory analysis

§ 5 — Per-workload best-practice decisions

Hot-path compute workloads

Persistence workloads

Comparison / search workloads

Cross-catalogue workloads

Never-do workloads

§ 6 — The three iron rules this follows

§ 7 — Probe queue (calibration work owed)

Probe A — ρ measurement on Animal Farm corpus

Probe B — Jirak-derived threshold function

Probe C — Capacity stress test per format

§ 8 — The three failure modes this prevents

Failure 1 — "VSA is always lossless"

Failure 2 — "BF16 is the modern 16-bit default"

Failure 3 — "Use VSA, it's more principled than a hash table"

References

FilesExpand file tree

FormatBestPractices.md

Latest commit

History

FormatBestPractices.md

File metadata and controls

Format Best Practices — VSA, Binary, Quantized

§ 0 — The question the doc answers

§ 1 — The science, briefly

Johnson-Lindenstrauss (capacity upper bound)

Signal-to-noise after unbind

Jirak 2016 — Berry-Esseen under weak dependence

§ 2 — Capacity regimes (Jirak-corrected)

§ 3 — Per-format precision ceilings

§ 4 — Cache and memory analysis

§ 5 — Per-workload best-practice decisions

Hot-path compute workloads

Persistence workloads

Comparison / search workloads

Cross-catalogue workloads

Never-do workloads

§ 6 — The three iron rules this follows

§ 7 — Probe queue (calibration work owed)

Probe A — ρ measurement on Animal Farm corpus

Probe B — Jirak-derived threshold function

Probe C — Capacity stress test per format

§ 8 — The three failure modes this prevents

Failure 1 — "VSA is always lossless"

Failure 2 — "BF16 is the modern 16-bit default"

Failure 3 — "Use VSA, it's more principled than a hash table"

References