AdaWorldAPI · 2026-04-05T20:42:34Z

No description provided.

…sults

SignedThinkingEngine (signed_engine.rs, 290 lines, 9 tests):
- i8 distance table: positive=excite, negative=inhibit, zero=orthogonal
- Clamp-to-zero after each cycle (inhibited atoms die)
- F32x16 SIMD 4x unrolled (same as unsigned engine)
- Tracks E/I ratio, inhibition counts, sign distribution
- from_unsigned(): u8 HDR -> i8 by subtracting 128
- build_signed_table(): cosine -> i8 directly from centroids

DualEngine (dual_engine.rs, 140 lines, 4 tests):
- Runs unsigned + signed on same input, compares results
- Measures: peak agreement, convergence speed, inhibition count
- DualResult with summary() for quick comparison

Example dual_signed_experiment (Jina v3 HDR 256x256):

Results on Jina v3 lens:
- E/I ratio: 49.8% excitatory / 50.2% inhibitory (near-balanced!)
- Agreement: 88% (7/8 shared peaks across all 4 tests)
- Signed converges slower: 13-14 cycles vs 7-8 unsigned
- Strong inhibition: ~1000 total inhibitions per run
- Signed finds unique peak (130), unsigned finds different one (179)
- -> Moderate agreement = signed sees different topology
- -> Path C (run both) recommended

167 tests pass (was 154, +13 new). Zero failures.

https://claude.ai/code/session_019RzHP8tpJu55ESTxhfUy1A
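The u8 -> i8 re-centering and the clamp-to-zero rule listed for the signed engine can be illustrated with a minimal scalar sketch. It is not the code in signed_engine.rs: the function names, the row-major table layout, and the `w / 127` weight scaling are all assumptions made for illustration, and no SIMD unrolling is attempted.

```rust
/// Re-center an unsigned table around zero, so that 128 maps to 0.
/// Illustrates "u8 HDR -> i8 by subtracting 128"; the name is hypothetical.
pub fn recenter_u8_to_i8(table: &[u8]) -> Vec<i8> {
    table.iter().map(|&v| (v as i16 - 128) as i8).collect()
}

/// One signed update over `n` atoms with a row-major `n * n` i8 table.
/// Positive weights add energy, negative weights subtract it, and any
/// atom whose new energy is negative is clamped to zero.
/// Returns how many atoms were clamped in this cycle.
/// Assumption: weights are scaled by 1/127 before use.
pub fn signed_cycle_sketch(table: &[i8], energy: &mut [f32]) -> usize {
    let n = energy.len();
    assert_eq!(table.len(), n * n, "table must be n * n");
    let mut next = vec![0.0f32; n];
    for i in 0..n {
        let row = &table[i * n..(i + 1) * n];
        next[i] = row
            .iter()
            .zip(energy.iter())
            .map(|(&w, &e)| (w as f32 / 127.0) * e)
            .sum();
    }
    let mut inhibited = 0;
    for (e, &x) in energy.iter_mut().zip(next.iter()) {
        if x < 0.0 {
            inhibited += 1;
        }
        *e = x.max(0.0); // clamp-to-zero: inhibited atoms die
    }
    inhibited
}

/// Fraction of non-zero table entries that are positive (excitatory).
/// Returns 0.0 for a table with no non-zero entries.
pub fn excitatory_fraction(table: &[i8]) -> f32 {
    let pos = table.iter().filter(|&&v| v > 0).count();
    let neg = table.iter().filter(|&&v| v < 0).count();
    if pos + neg == 0 {
        0.0
    } else {
        pos as f32 / (pos + neg) as f32
    }
}
```

With a 2x2 table `[127, -127, 127, 127]` and starting energies `[1.0, 2.0]`, the first atom receives 1.0 - 2.0 and is clamped to zero while the second accumulates 3.0, which is the kind of event the inhibition counter in the commit message would record.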

Results across all 3 baked lenses:

| Lens | Agreement | Inhibitions | Convergence |
| --- | --- | --- | --- |
| Jina v3 (narrow) | 88% | 1,060 | u:7 s:13 |
| BGE-M3 (multilingual) | 62% | 1,212 | u:7 s:16 |
| Jina Reranker (WIDEST) | 50% | 2,264 | u:12 s:20 |

KEY FINDING: Agreement DROPS as cos range widens.
- Jina v3 (cos[-0.067,0.234]): 88%, mostly positive, signed ~ unsigned
- BGE-M3: 62%, moderate range, signed diverges
- Reranker (cos[-0.886,+0.826]): 50%, balanced, HALF the peaks differ

This confirms the session hypothesis: wider cos range = more gate sign information = signed i8 sees different topology than unsigned u8.

The reranker shows 4 signed-unique peaks per test that unsigned misses. These are thoughts that ONLY emerge through inhibition.

Signed also produces lower entropy (more focused) across all lenses. But converges 2x slower (inhibition creates competition).

VERDICT: Path C (run both). Signed is not redundant: it finds different peaks, especially on balanced-range models. SiLU-ONNX may still be needed for unsigned path.

https://claude.ai/code/session_019RzHP8tpJu55ESTxhfUy1A
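The agreement percentages above are ratios of shared peaks to total peaks across test runs (for example 7/8 on Jina v3). A minimal sketch of such a metric follows. The exact definition used by DualEngine is not shown in the commit, so the denominator (the larger of the two peak sets per run) and the function name are assumptions.

```rust
use std::collections::HashSet;

/// Aggregate peak agreement over several runs.
/// Each run is a pair (unsigned_peaks, signed_peaks) of atom indices.
/// Assumption: agreement = shared peaks / max(|unsigned|, |signed|),
/// summed over runs before dividing.
pub fn peak_agreement(runs: &[(Vec<usize>, Vec<usize>)]) -> f64 {
    let mut shared = 0usize;
    let mut total = 0usize;
    for (unsigned, signed) in runs {
        let u: HashSet<usize> = unsigned.iter().copied().collect();
        let s: HashSet<usize> = signed.iter().copied().collect();
        shared += u.intersection(&s).count();
        total += u.len().max(s.len());
    }
    if total == 0 {
        0.0
    } else {
        shared as f64 / total as f64
    }
}
```

Four runs with two peaks each, where only one run differs in a single peak (such as 179 for unsigned against 130 for signed), give 7 shared out of 8, or 0.875, which rounds to the 88% reported for Jina v3. The other peak indices in such an example would be made up for illustration.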

6-phase plan to replace synthetic calibration with real ONNX inference:
- Phase 1: Add rten dep (feature-gated, optional)
- Phase 2: Download v5 ONNX (2.39 GB) + tokenizer (11.4 MB)
- Phase 3: jina_v5_onnx.rs module (rten forward pass → 1024D f32)
- Phase 4: Stream v5 F16 GGUF → CLAM → bake HDR lens
- Phase 5: 5-path calibration matrix (ONNX/raw/gamma-phi/i8/hhbgz)
- Phase 6: Wire v5 as default truth anchor in calibrate_lenses.rs

Key finding from HuggingFace:
- Jina v5 text-small-text-matching (677M, Qwen3-0.6B base)
- ONNX: model.onnx (1.27 MB) + model.onnx_data (2.38 GB)
- GGUF F16: 1.2 GB (streamable, same as v3 pipeline)
- Matryoshka dims: 32-1024 (can trade quality for speed)

This will answer the signed experiment question definitively: Does 50% reranker disagreement mean signed is MORE or LESS accurate? Compare both paths against ONNX ground truth → Spearman ρ decides.

https://claude.ai/code/session_019RzHP8tpJu55ESTxhfUy1A
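Since the plan lets Spearman ρ decide between the signed and unsigned paths, a small reference implementation may help make the comparison concrete. This is a generic sketch (average ranks for ties, then Pearson correlation on the ranks), not code from calibrate_lenses.rs, and the function names are hypothetical.

```rust
/// Average ranks (1-based); tied values share the mean of their positions.
fn average_ranks(xs: &[f64]) -> Vec<f64> {
    let mut idx: Vec<usize> = (0..xs.len()).collect();
    idx.sort_by(|&a, &b| xs[a].partial_cmp(&xs[b]).expect("NaN in input"));
    let mut ranks = vec![0.0; xs.len()];
    let mut i = 0;
    while i < idx.len() {
        let mut j = i;
        while j + 1 < idx.len() && xs[idx[j + 1]] == xs[idx[i]] {
            j += 1;
        }
        let avg = (i + j) as f64 / 2.0 + 1.0;
        for k in i..=j {
            ranks[idx[k]] = avg;
        }
        i = j + 1;
    }
    ranks
}

/// Spearman rank correlation between two score lists.
/// Returns None for mismatched lengths, fewer than two items,
/// or when either list is constant (zero rank variance).
pub fn spearman_rho(a: &[f64], b: &[f64]) -> Option<f64> {
    if a.len() != b.len() || a.len() < 2 {
        return None;
    }
    let (ra, rb) = (average_ranks(a), average_ranks(b));
    let n = a.len() as f64;
    let mean_a = ra.iter().sum::<f64>() / n;
    let mean_b = rb.iter().sum::<f64>() / n;
    let (mut cov, mut var_a, mut var_b) = (0.0, 0.0, 0.0);
    for (x, y) in ra.iter().zip(&rb) {
        cov += (x - mean_a) * (y - mean_b);
        var_a += (x - mean_a).powi(2);
        var_b += (y - mean_b).powi(2);
    }
    if var_a == 0.0 || var_b == 0.0 {
        None
    } else {
        Some(cov / (var_a * var_b).sqrt())
    }
}
```

In the calibration setting, one list would hold the ground-truth similarities for each text pair and the other the engine's scores for the same pairs. The path whose ρ is closer to +1 tracks the ground-truth ordering better.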

…bration

Three new modules (22 tests, ~850 lines):

l4_bridge.rs (4 tests):
- commit_to_l4(): L3 peaks → binarize table rows → XOR bind → L4.learn()
- recognize_thought(): check if L4 has seen this peak pattern before
- bias_from_l4(): generate sensor weights from accumulated experience
- Uses distance table rows as centroid proxies (centered at 0 for sign bits)

composite_engine.rs (5 tests):
- CompositeEngine: multiple lenses → independent engines → superposition
- perturb_lens(): different tokens per model (Jina, BGE, Reranker)
- think_all(): run all lenses, compose peaks, measure pairwise agreement
- CompositeResult: superposed peaks with lens count + agreement matrix

signed_domino.rs (3 tests):
- SignedDominoCascade: i8 table where sign IS the gate decision
- Positive values = excitatory (SUPPORTS/CAUSES)
- Negative values = inhibitory (NATURAL contradictions, no floor needed)
- Inhibition subtracts energy from excitatory atoms during cascade

calibrate_lenses.rs: replaced synthetic pairs with real literary text
- Tier 1: Rumi, Tagore, STS-B paraphrase, technical equivalence
- Tier 2: Rumi↔Tagore thematic, Wittgenstein↔Gödel, Palantir↔Snowden
- Tier 3: consciousness angles, law branches, Newton↔quantum
- Tier 4: Rumi↔TCP/IP, Tagore↔Federal Reserve, CRISPR↔Bach
- Hash tokenization gives ρ=0.25 (BROKEN), which confirms the need for ONNX ground truth

188 tests pass. Zero failures.

https://claude.ai/code/session_019RzHP8tpJu55ESTxhfUy1A
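The "binarize table rows → XOR bind" step in commit_to_l4() can be sketched as follows. This is an illustration under assumptions, not the l4_bridge.rs implementation: the sign rule (bit set when the i8 value is strictly positive), the u64 packing, and both function names are hypothetical.

```rust
/// Pack the sign pattern of an i8 row into u64 words.
/// Assumption: bit i is set when row[i] > 0 (values are centered at 0).
pub fn binarize_row(row: &[i8]) -> Vec<u64> {
    let mut words = vec![0u64; (row.len() + 63) / 64];
    for (i, &v) in row.iter().enumerate() {
        if v > 0 {
            words[i / 64] |= 1u64 << (i % 64);
        }
    }
    words
}

/// XOR-bind several equally sized bit vectors into one.
/// Binding is its own inverse: binding the result with one input again
/// recovers the XOR of the remaining inputs.
pub fn xor_bind(rows: &[Vec<u64>]) -> Vec<u64> {
    let len = rows.first().map_or(0, |r| r.len());
    let mut out = vec![0u64; len];
    for r in rows {
        assert_eq!(r.len(), len, "all rows must have the same width");
        for (o, w) in out.iter_mut().zip(r) {
            *o ^= *w;
        }
    }
    out
}
```

The self-inverse property is what would let a recognizer such as recognize_thought() test whether a bound pattern contains a given peak row, although how L4 actually stores and matches patterns is not described in the commit.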

calibrate_lenses.rs rewritten with real HuggingFace tokenizers:
- Jina v3: XLM-RoBERTa tokenizer via from_pretrained (250K vocab)
- Reranker: Qwen2 tokenizer from local file (151K vocab, shared with Qwopus)
- Graceful fallback to hash when tokenizer unavailable
- Calibration corpus: Rumi, Tagore, Wittgenstein, Gödel, Snowden, STS-B, CRISPR, Bach, neurodegenerative disease, in 4 similarity tiers

[patch.crates-io] tokenizers pointed to local clone (../../../tokenizers/tokenizers)
- Enables SIMD optimization of BPE hot paths via ndarray backend
- Same crate API, local modifications for performance
- Path: /home/user/tokenizers/tokenizers (huggingface/tokenizers v0.22.3-dev)

Result: Reranker tokenizer loads from disk (Qwen2 BPE). Jina needs its own tokenizer.json in data/jina-v3-hdr/ (XLM-RoBERTa vocab). Each lens needs its matching tokenizer: mixing gives wrong codebook lookups.

188 tests pass.

https://claude.ai/code/session_019RzHP8tpJu55ESTxhfUy1A

https://claude.ai/code/session_019RzHP8tpJu55ESTxhfUy1A

Downloaded from FacebookAI/xlm-roberta-large-finetuned-conll03-{english,german}:
- data/jina-v3-hdr/tokenizer.json  8.7 MB (XLM-RoBERTa 250K vocab)
- data/bge-m3-hdr/tokenizer.json  8.7 MB (same tokenizer, BGE-M3 = XLM-RoBERTa)
- data/xlm-roberta-de/tokenizer.json  8.7 MB (German NER variant, same vocab)

Calibration now runs with REAL BPE tokenization:
- Jina v3: LOADED (XLM-RoBERTa 250K)
- Reranker: LOADED (Qwen2 151K, from Qwopus tokenizer on disk)

Results with real BPE: ρ still low (0.07 Jina) because avg pairwise centroid distance ≠ text similarity. Need full MatVec think cycle (perturb → think → commit) for meaningful calibration, not just raw centroid lookup averages. The ONNX ground truth path solves this.

https://claude.ai/code/session_019RzHP8tpJu55ESTxhfUy1A
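The "raw centroid lookup average" that produced ρ=0.07 can be pictured with a short sketch. It assumes each text has already been mapped to a list of centroid indices and that the distance table is a row-major `k * k` array of u8. The function name and signature are hypothetical and are not taken from calibrate_lenses.rs.

```rust
/// Mean table value over all (i, j) pairs with i from text A and j from text B.
/// `table` is row-major with `k` centroids per side; `a` and `b` hold
/// centroid indices. Returns None when either text is empty.
pub fn avg_pairwise_lookup(table: &[u8], k: usize, a: &[usize], b: &[usize]) -> Option<f32> {
    assert_eq!(table.len(), k * k, "table must be k * k");
    if a.is_empty() || b.is_empty() {
        return None;
    }
    let mut sum = 0u64;
    for &i in a {
        for &j in b {
            assert!(i < k && j < k, "centroid index out of range");
            sum += table[i * k + j] as u64;
        }
    }
    Some(sum as f32 / (a.len() * b.len()) as f32)
}
```

Because every token pair contributes equally, a handful of shared or unrelated centroids pulls the average toward the table mean regardless of what the texts say, which is consistent with the commit's observation that this measure is lookup noise and not text similarity.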

Removed the hash tokenizer fallback from the hot path: it never worked. Both lenses now use real BPE (XLM-RoBERTa for Jina, Qwen2 for Reranker).

Calibration rewritten three times this session:
1. avg pairwise centroid distance → ρ=0.07 (no engine, just lookup noise)
2. MatVec think cycle → cos=1.000 for ALL pairs (attractor collapse)
3. Domino cascade 3σ focus → actual differentiation (0.000-0.389) but ρ=-0.57 (anti-correlated with human judgment)

Diagnosis: 256-centroid codebook is too coarse for text similarity. Short texts share structural tokens (articles, prepositions) that map to identical centroids, creating spurious overlap. The engine differentiates; the codebook doesn't.

Next steps:
- 4096-centroid codebook (full L3 table, 16 MB)
- Or: per-role tables (attn_k cos range is wider than token_embd)
- Or: Jina v5 ONNX ground truth to replace expert-assigned scores

https://claude.ai/code/session_019RzHP8tpJu55ESTxhfUy1A
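The commit does not define "3σ focus". One plausible reading is that only atoms whose energy exceeds the mean by more than three standard deviations are kept as the focused set. The sketch below implements that reading only. The function name, the use of the population standard deviation, and the strict inequality are all assumptions.

```rust
/// Indices of atoms whose energy is more than three (population)
/// standard deviations above the mean. Hypothetical reading of
/// "3σ focus"; not taken from the cascade implementation.
pub fn three_sigma_focus(energy: &[f32]) -> Vec<usize> {
    if energy.is_empty() {
        return Vec::new();
    }
    let n = energy.len() as f32;
    let mean = energy.iter().sum::<f32>() / n;
    let variance = energy.iter().map(|e| (e - mean).powi(2)).sum::<f32>() / n;
    let threshold = mean + 3.0 * variance.sqrt();
    energy
        .iter()
        .enumerate()
        .filter(|&(_, &e)| e > threshold)
        .map(|(i, _)| i)
        .collect()
}
```

Under this reading a uniform energy field yields an empty focus set, and a single strong atom among 256 is selected on its own. That would avoid the all-pairs collapse seen with the plain MatVec cycle, though it says nothing about whether the selected atoms correlate with human similarity judgments.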

Three tools, three roles:
- candle: Training (SiLU) + Forward Pass Ground Truth (Jina/BGE-M3/Qwen)
- ort: Reranker Ground Truth (only path) + Bulk Calibration speed
- rten: Medical Imaging Sensor (U-Net/ViT, pure Rust, separate)

New feature flag: --features calibration
- candle-core 0.9, candle-nn 0.9, candle-transformers 0.9, hf-hub 0.4
- All optional: default build unchanged (188 tests pass)

Follows EmbedAnything's proven versions (candle 0.9.2). Jina BERT, ModernBERT, Qwen3 forward pass patterns available from StarlightSearch/EmbedAnything/rust/src/models/ as reference.

ort for Reranker is next (separate dep, only needed for cross-encoder).

https://claude.ai/code/session_019RzHP8tpJu55ESTxhfUy1A

…exports

ndarray PR #115 (AdaWorldAPI/ndarray) made wht_f32, kmeans, squared_l2, quantize_f32_to_i2/dequantize_i2_to_f32, dequantize_i8_to_f32 public. This commit drops the local compat shim and switches to the upstream exports, removing ~150 LOC of duplicated quantization/transform code.

QuantParams.zero_point is i32 in upstream (vs f32 in shim): call sites adjusted to construct QuantParams with the upstream shape.

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj
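To show why an integer zero_point matters at the call sites, here is a minimal sketch of standard affine i8 quantization with an i32 zero point. The struct and functions are local stand-ins written for this note. They are not the ndarray types or signatures, which the commit does not show beyond the zero_point field being i32.

```rust
/// Hypothetical stand-in for quantization parameters with an integer
/// zero point (the upstream struct's full shape is not shown here).
#[derive(Debug, Clone, Copy)]
pub struct QuantParamsSketch {
    pub scale: f32,
    pub zero_point: i32,
}

/// Affine quantization: q = round(x / scale) + zero_point, saturated to i8.
pub fn quantize_i8_sketch(x: &[f32], p: QuantParamsSketch) -> Vec<i8> {
    x.iter()
        .map(|&v| {
            let q = (v / p.scale).round() as i32 + p.zero_point;
            q.clamp(-128, 127) as i8
        })
        .collect()
}

/// Inverse mapping: x ≈ (q - zero_point) * scale.
/// The subtraction happens in i32, so it cannot overflow i8.
pub fn dequantize_i8_sketch(q: &[i8], p: QuantParamsSketch) -> Vec<f32> {
    q.iter()
        .map(|&v| (v as i32 - p.zero_point) as f32 * p.scale)
        .collect()
}
```

With an integer zero point the offset is applied in integer arithmetic before scaling, so a call site that previously passed an f32 zero point has to round or cast it once when constructing the parameters.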

PR #308 added a temporary `ref: claude/continue-lance-graph-ndarray-Ld786` pin to the lance-graph CI workflows because the symbols bgz-tensor / lance-graph-callcenter consume (wht_f32, kmeans, squared_l2, dequantize_i8_to_f32, quantize_f32_to_i2, dequantize_i2_to_f32) were not yet public on ndarray master. The TODO said to revert once ndarray PR #115 lands.

ndarray PR #115 merged 2026-04-30 07:01:44 UTC. Verified the symbols are now reachable on master at:
- src/simd.rs (re-exports the public surface)
- src/hpc/cam_pq.rs (kmeans, squared_l2)
- src/hpc/fft.rs (wht_f32)
- src/hpc/quantized.rs (i8/i2 quantize/dequantize)

Removes 4 occurrences of the pin (2 in rust-test.yml + 2 in style.yml, one per CI job that checks out ndarray as a sibling) and replaces the TODO with an explanatory comment that names the upstream PR + date so the next reader understands the history.

Diff: +12 / -16 across 2 files. YAML still well-formed. No code change. No medcare-rs / smb-office-rs side effect.

Cross-link: ndarray PR #115, lance-graph PR #308 (the original pin).

…-L3DF0 ci: revert ndarray-branch pin — PR #115 has landed on master

claude added 9 commits April 5, 2026 20:14

chore: update Cargo.lock for patched tokenizers

c4d91f3

https://claude.ai/code/session_019RzHP8tpJu55ESTxhfUy1A

AdaWorldAPI merged commit 40e6d92 into main Apr 5, 2026

AdaWorldAPI mentioned this pull request Apr 30, 2026

feat: bilingual ontology DTO surface + bgz-tensor workspace inclusion #308

Merged

5 tasks

AdaWorldAPI mentioned this pull request Apr 30, 2026

ci: revert ndarray-branch pin — PR #115 has landed on master #315

Merged

AdaWorldAPI added a commit that referenced this pull request Apr 30, 2026

Merge pull request #315 from AdaWorldAPI/claude/ci-revert-ndarray-pin…

8cac337

…-L3DF0 ci: revert ndarray-branch pin — PR #115 has landed on master

AdaWorldAPI mentioned this pull request Apr 30, 2026

fix(ci): revert ndarray branch pinning (PR #115 merged) #317

Closed

AdaWorldAPI mentioned this pull request May 13, 2026

impl(sprint-7): 7-worker implementation wave for sprint-5/6 specs + AuditSink trait unification #366

Merged

11 tasks
