AdaWorldAPI · Apr 6, 2026 · Apr 5, 2026
diff --git a/SESSION_VISION_SENSOR_VIT.md b/SESSION_VISION_SENSOR_VIT.md
@@ -0,0 +1,202 @@
+# SESSION: Vision Sensor — ViT-Huge-14 for Medical Imaging + Multimodal
+
+## THE ARCHITECTURE
+
+```
+Text sensor (current):
+  text → tokenizer → token_ids → codebook_index → centroids → distance table → think
+
+Vision sensor (planned):
+  image → ViT patches (14×14 px) → patch embeddings → codebook_index → centroids → distance table → think
+
+Same engine. Different sensor. Same MatVec. Same domino cascade.
+Models are SENSORS. The matrix is the BRAIN.
+```
+
+## GROUND TRUTH MODELS
+
+### Text (Jina v5 — Qwen3-0.6B)
+
+```
+Model:    jinaai/jina-embeddings-v5-text-small-text-matching
+Base:     Qwen3-0.6B
+Format:   safetensors (1.19 GB) + ONNX f32 (2.39 GB) + GGUF F16 (1.2 GB)
+Tokenizer: Qwen3 BPE (151K vocab, 11.4 MB)
+Dim:      1024
+Pooling:  last-token
+Tool:     candle (loads safetensors directly, no ONNX needed)
+Status:   tokenizer downloaded, candle wired, forward pass TODO
+```
+
+### Vision (ViT-Huge-14 from CLIP)
+
+```
+FP32 ground truth:
+  Repo:   Kijai/WanVideo_comfy
+  File:   open-clip-xlm-roberta-large-vit-huge-14_visual_fp32.safetensors
+  Size:   2.53 GB
+  Precision: FP32 (23-bit mantissa, NO BF16 truncation)
+  Tool:   candle (loads safetensors) OR rten (after ONNX conversion)
+
+BF16 production:
+  Repo:   DeepBeepMeep/Wan2.1
+  File:   models_clip_open-clip-xlm-roberta-large-vit-huge-14-bf16.safetensors
+  Size:   2.39 GB
+  Precision: BF16 (7-bit mantissa, relative step 2^-7 ≈ 0.008, can flip ranks at boundaries)
+  Includes: BOTH text encoder (XLM-RoBERTa) + visual encoder (ViT-Huge-14)
+
+Architecture:
+  ViT-Huge-14:
+    Patch size: 14×14 pixels
+    Each patch = one "token" (like BPE subword for text)
+    ~630M parameters
+    Trained contrastively with XLM-RoBERTa (CLIP objective)
+```
+
+### Cross-Modal (CLIP — text ↔ image in same space)
+
+```
+The CLIP training objective:
+  For (text, image) pairs:
+    text_emb = XLM-RoBERTa(text)
+    image_emb = ViT-Huge-14(image)
+    loss = contrastive(text_emb, image_emb)
+
+After training:
+  cos(text_emb, image_emb) = semantic similarity across modalities
+  "amyloid plaque in temporal lobe" ↔ brain MRI = high cosine
+  "amyloid plaque in temporal lobe" ↔ chest X-ray = low cosine
+
+For our architecture:
+  Text codebook (XLM-RoBERTa) and vision codebook (ViT) share embedding space
+  Cross-modal distance table: text_centroid × image_centroid → similarity
+  One CompositeEngine with text lens + vision lens → superposition
+```
+
+## MEDICAL IMAGING PIPELINE
+
+```
+Phase 1: Image input
+  DICOM → PNG/TIFF → resize to ViT resolution
+  OR: direct from PACS/radiology viewer
+
+Phase 2: ViT forward pass (rten, pure Rust)
+  Image → 14×14 patches → ViT encoder → f32 embedding per patch
+  Global: mean pool patch embeddings → 1024D image embedding (after the CLIP projection; ViT-H hidden width is 1280)
+  Local: per-patch embeddings for segmentation
+
+Phase 3: Codebook + distance table
+  CLAM 256 centroids from ViT patch embeddings (same as text pipeline)
+  256×256 distance table (same HDR CDF or i8 signed encoding)
+  codebook_index: patch_embedding → centroid_id
+
+Phase 4: ThinkingEngine
+  perturb(patch_centroid_ids) → think(10 cycles) → commit()
+  Same engine, same MatVec, same domino cascade
+  Qualia from convergence = visual gestalt of the image
+
+Phase 5: SPO extraction
+  Dominant atoms → centroid labels → SPO triples
+  (lesion, ADJACENT_TO, ventricle)
+  (tumor, LARGER_THAN, 2cm)
+  NARS truth values from convergence confidence
+
+Phase 6: Cross-modal query
+  Text: "Show me cases with amyloid plaques near the hippocampus"
+  → CLIP text encoder (XLM-RoBERTa) → codebook → text_centroids (Jina v5 embeddings are not in CLIP space)
+  → CLIP cross-modal similarity with image_centroids
+  → Ranked retrieval from image database
+```
+
+## WHALE SONOGRAPHY (SESSION_WHALE_SONOGRAPHY.md)
+
+```
+Same pipeline applied to:
+  Ultrasound images → ViT patches → codebook → think
+  Age-cohort stratification via L4 experience
+  Longitudinal tracking via trajectory (trajectory-cartographer agent)
+
+The ViT sensor treats ultrasound frames as images.
+No special medical preprocessing — the codebook learns the topology.
+```
+
+## OSINT INTEGRATION
+
+```
+WikiLeaks documents often contain:
+  Text (cables, reports) → Jina/BGE-M3 text sensor
+  Images (maps, photos, diagrams) → ViT vision sensor
+  OCR'd text from images → text sensor (after ocrs/rten OCR)
+
+Cross-modal CLIP similarity:
+  "drone strike coordinates" (text) ↔ satellite image (vision)
+  Both in same embedding space → one distance table query
+```
+
+## CALIBRATION (same pattern as text)
+
+```
+Vision ground truth:
+  FP32 safetensors → candle forward pass → f32 patch embeddings
+  Calibrate against: baked u8 CDF, i8 signed, γ+φ encoded tables
+  Same 5-lane encoder, same Spearman ρ, same ICC profiles
+
+Text ground truth:
+  Jina v5 safetensors → candle forward pass → f32 text embeddings
+  Same calibration pipeline
+
+Cross-modal ground truth:
+  CLIP FP32 → both encoders → cross-modal cosine
+  Calibrate: cross-modal distance table vs CLIP cosine
+```
+
+## THREE TOOLS FOR THREE SENSOR TYPES
+
+```
+Tool      Text sensor              Vision sensor            Cross-modal
+────      ───────────              ─────────────            ───────────
+candle    Jina v5 forward pass     ViT-Huge-14 forward      CLIP joint
+ort       Reranker cross-encoder   —                        —
+rten      —                        Medical ViT segmentation —
+
+candle loads safetensors (text + vision).
+ort loads ONNX (reranker only, cross-encoder architecture).
+rten loads ONNX (medical imaging, pure Rust, AdaWorldAPI fork).
+```
+
+## IMPLEMENTATION ORDER
+
+```
+1. [NOW]  Jina v5 text ground truth (candle + Qwen3 tokenizer)
+2. [NEXT] Cross-model text calibration (Jina v3 ↔ v5 ↔ Reranker ↔ BGE-M3)
+3. [NEXT] 5-lane encoding + Spearman ρ + ICC profiles
+4. [THEN] ViT-Huge-14 vision ground truth (candle + FP32 safetensors)
+5. [THEN] Medical imaging codebook (CLAM on ViT patch embeddings)
+6. [THEN] Cross-modal CLIP distance table
+7. [THEN] OSINT multimodal query (text + image in same search)
+```
+
+## FILES
+
+```
+Ground truth models:
+  jinaai/jina-embeddings-v5-text-small-text-matching  (text, Qwen3)
+  Kijai/WanVideo_comfy/..._visual_fp32.safetensors    (vision, ViT-Huge-14, FP32)
+  DeepBeepMeep/Wan2.1/..._bf16.safetensors            (combined CLIP, BF16)
+
+Tokenizer:
+  data/jina-v5-tokenizer.json          (Qwen3 BPE, 151K vocab, 11.4 MB)
+  data/jina-v3-hdr/tokenizer.json      (XLM-RoBERTa, 250K vocab, 8.7 MB)
+
+Code:
+  src/tokenizer_registry.rs   (6 models, cross-model tokenization)
+  src/ground_truth.rs         (calibration DTOs, Spearman ρ)
+  src/composite_engine.rs     (multi-lens including future vision lens)
+  src/tensor_bridge.rs        (F32/I8/U8/Tensor bridge for candle output)
+  examples/stream_signed_lens.rs (5-lane encoder with γ+φ metadata)
+
+Agents:
+  .claude/agents/family-codec-smith.md  (HEEL/HIP/BRANCH/TWIG/LEAF encoding)
+  ndarray/.claude/agents/truth-architect.md (BF16 truth, causality)
+  ndarray/.claude/agents/cascade-architect.md (3-stroke search)
+```
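Review note on Phases 3 and 6 of SESSION_VISION_SENSOR_VIT.md: the codebook lookup and the cross-modal table can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the planned implementation. The names `nearest_centroid` and `cross_modal_table`, the toy 3-D vectors, and the linear i8 quantization are all hypothetical; the session plan calls for CLAM centroids and HDR CDF or γ+φ encoded tables, which this sketch does not attempt.

```rust
/// Cosine similarity; returns 0.0 when either vector has near-zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na > 1e-10 && nb > 1e-10 { dot / (na * nb) } else { 0.0 }
}

/// Hypothetical codebook lookup: index of the centroid with the highest cosine.
fn nearest_centroid(embedding: &[f32], centroids: &[Vec<f32>]) -> usize {
    let mut best = 0;
    let mut best_sim = f32::NEG_INFINITY;
    for (i, c) in centroids.iter().enumerate() {
        let s = cosine(embedding, c);
        if s > best_sim {
            best_sim = s;
            best = i;
        }
    }
    best
}

/// Row-major table: row = text centroid, column = image centroid.
/// ASSUMPTION: cosine in [-1, 1] is linearly quantized to i8 in [-127, 127].
fn cross_modal_table(text: &[Vec<f32>], image: &[Vec<f32>]) -> Vec<i8> {
    let mut table = Vec::with_capacity(text.len() * image.len());
    for t in text {
        for v in image {
            let q = (cosine(t, v) * 127.0).round().clamp(-127.0, 127.0);
            table.push(q as i8);
        }
    }
    table
}

fn main() {
    // Toy 3-D centroids standing in for CLIP-space embeddings.
    let text = vec![vec![1.0, 0.0, 0.0], vec![0.0, 1.0, 0.0]];
    let image = vec![vec![0.9, 0.1, 0.0], vec![0.0, -1.0, 0.0]];
    let table = cross_modal_table(&text, &image);

    let patch = vec![0.8, 0.2, 0.0];
    println!("patch -> image centroid {}", nearest_centroid(&patch, &image));
    println!("cross-modal table (2x2, row-major): {:?}", table);
}
```

The table is only meaningful when both centroid sets come from encoders trained into the same space, which for this plan means the CLIP pair (XLM-RoBERTa text encoder and ViT-Huge-14).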
diff --git a/crates/thinking-engine/data/.gitignore b/crates/thinking-engine/data/.gitignore
@@ -1,2 +1,4 @@
 *.onnx
 *.onnx_data
+tokenizer.json
+*-tokenizer.json
diff --git a/crates/thinking-engine/data/bge-m3-hdr/tokenizer.json b/crates/thinking-engine/data/bge-m3-hdr/tokenizer.json
diff --git a/crates/thinking-engine/data/jina-v3-hdr/tokenizer.json b/crates/thinking-engine/data/jina-v3-hdr/tokenizer.json
diff --git a/crates/thinking-engine/data/xlm-roberta-de/tokenizer.json b/crates/thinking-engine/data/xlm-roberta-de/tokenizer.json
diff --git a/crates/thinking-engine/examples/end_to_end_signed.rs b/crates/thinking-engine/examples/end_to_end_signed.rs
@@ -0,0 +1,162 @@
+//! End-to-end test: real tokenizer → signed engine → nucleus sampling.
+//!
+//! Tests whether the full pipeline produces meaningful similarity:
+//!   Similar texts (Rumi↔Rumi) should have higher overlap than
+//!   unrelated texts (Rumi↔TCP).
+//!
+//! Uses: real XLM-RoBERTa tokenizer, Jina v3 HDR lens (converted to i8),
+//! SignedThinkingEngine with Nucleus pooling (T=0.7, p=0.9).
+//!
+//! This is the SMOKE TEST before calibration. If this fails,
+//! the 7-lane encoding and ONNX ICC are measuring noise.
+
+use thinking_engine::jina_lens::{JINA_HDR_TABLE, jina_lookup_many};
+use thinking_engine::signed_engine::SignedThinkingEngine;
+use thinking_engine::pooling::Pooling;
+
+fn main() {
+    println!("═══════════════════════════════════════════════════════════");
+    println!("  END-TO-END: real tokenizer → i8 signed → nucleus");
+    println!("═══════════════════════════════════════════════════════════\n");
+
+    // Load real XLM-RoBERTa tokenizer
+    let tok = match tokenizers::Tokenizer::from_file(
+        "crates/thinking-engine/data/jina-v3-hdr/tokenizer.json"
+    ) {
+        Ok(t) => t,
+        Err(e) => { eprintln!("Tokenizer failed: {}. Aborting.", e); return; }
+    };
+    println!("Tokenizer: XLM-RoBERTa 250K loaded\n");
+
+    // Build signed engine from Jina HDR table
+    let signed_table: Vec<i8> = JINA_HDR_TABLE.iter()
+        .map(|&v| (v as i16 - 128) as i8)
+        .collect();
+    // NOTE: This is from_unsigned (CDF rank relabeling, not true signed).
+    // The real i8 path needs from_f32_cosines via stream_signed_lens.
+    // But this tests the ENGINE + POOLING pipeline, not the encoding quality.
+    let mut engine = SignedThinkingEngine::new(signed_table);
+
+    let pooling = Pooling::Nucleus {
+        temperature: 0.7,
+        top_p: 0.9,
+        seed: Some(42), // deterministic for comparison
+    };
+
+    // Calibration pairs (4 tiers)
+    let pairs: Vec<(&str, &str, &str)> = vec![
+        // TIER 1 — should be MOST similar
+        ("The wound is the place where the light enters you",
+         "Where there is ruin there is hope for a treasure",
+         "Rumi↔Rumi"),
+        ("A federal judge ruled the surveillance program unconstitutional",
+         "A US court declared the mass surveillance scheme violated the constitution",
+         "STS-B paraphrase"),
+        // TIER 2 — moderate
+        ("Palantir built Gotham for intelligence agencies to map human networks",
+         "Edward Snowden revealed the NSA collected phone metadata of millions",
+         "Palantir↔Snowden"),
+        ("Amyloid plaques accumulate in the brains of Alzheimer patients",
+         "Tau protein tangles disrupt neural communication in neurodegenerative disease",
+         "Alzheimer↔Tau"),
+        // TIER 3 — weak
+        ("Newton showed that gravity follows an inverse square law",
+         "Quantum entanglement allows particles to share states across arbitrary distances",
+         "Newton↔Quantum"),
+        // TIER 4 — should be LEAST similar
+        ("You are not a drop in the ocean you are the entire ocean in a drop",
+         "TCP uses a three-way handshake to establish a reliable connection between hosts",
+         "Rumi↔TCP"),
+        ("CRISPR-Cas9 enables precise editing of genomic sequences at targeted loci",
+         "Bach composed the Well-Tempered Clavier as an exploration of all major and minor keys",
+         "CRISPR↔Bach"),
+    ];
+
+    println!("  {:>20}  {:>8}  {:>8}  {:>8}  {:>6}  {:>7}",
+        "Pair", "Jaccard", "Cos(E)", "TopK∩", "Inhib", "Atoms");
+    println!("  {:─>20}  {:─>8}  {:─>8}  {:─>8}  {:─>6}  {:─>7}", "", "", "", "", "", "");
+
+    let mut results: Vec<(String, f32, f32, usize)> = Vec::new();
+
+    for (text_a, text_b, label) in &pairs {
+        let enc_a = tok.encode(*text_a, true).unwrap();
+        let enc_b = tok.encode(*text_b, true).unwrap();
+        let ids_a: Vec<u32> = enc_a.get_ids().to_vec();
+        let ids_b: Vec<u32> = enc_b.get_ids().to_vec();
+
+        let centroids_a = jina_lookup_many(&ids_a);
+        let centroids_b = jina_lookup_many(&ids_b);
+
+        // Think text A — with temperature excitation (T=0.3, sharp discrimination)
+        engine.reset();
+        engine.perturb(&centroids_a);
+        engine.think_with_temperature(10, 0.3);
+        let energy_a = engine.energy.clone();
+        let pooled_a = pooling.pool(&energy_a);
+        let inhib_a = engine.total_inhibitions;
+
+        // Think text B
+        engine.reset();
+        engine.perturb(&centroids_b);
+        engine.think_with_temperature(10, 0.3);
+        let energy_b = engine.energy.clone();
+        let pooled_b = pooling.pool(&energy_b);
+        let inhib_b = engine.total_inhibitions;
+
+        // Compare: Jaccard of pooled atoms
+        let atoms_a: std::collections::HashSet<u16> = pooled_a.atoms.iter()
+            .map(|&(idx, _)| idx).collect();
+        let atoms_b: std::collections::HashSet<u16> = pooled_b.atoms.iter()
+            .map(|&(idx, _)| idx).collect();
+        let intersection = atoms_a.intersection(&atoms_b).count();
+        let union = atoms_a.union(&atoms_b).count().max(1);
+        let jaccard = intersection as f32 / union as f32;
+
+        // Compare: cosine of full energy vectors
+        let dot: f32 = energy_a.iter().zip(&energy_b).map(|(a, b)| a * b).sum();
+        let na: f32 = energy_a.iter().map(|x| x * x).sum::<f32>().sqrt();
+        let nb: f32 = energy_b.iter().map(|x| x * x).sum::<f32>().sqrt();
+        let cos_e = if na > 1e-10 && nb > 1e-10 { dot / (na * nb) } else { 0.0 };
+
+        // Compare: top-k overlap
+        let top_a: Vec<u16> = pooled_a.atoms.iter().take(5).map(|&(idx, _)| idx).collect();
+        let top_b: Vec<u16> = pooled_b.atoms.iter().take(5).map(|&(idx, _)| idx).collect();
+        let topk_overlap = top_a.iter().filter(|x| top_b.contains(x)).count();
+
+        println!("  {:>20}  {:>8.3}  {:>8.3}  {:>6}/5  {:>6}  {:>3}+{:<3}",
+            label, jaccard, cos_e, topk_overlap,
+            (inhib_a + inhib_b) / 2,
+            pooled_a.atoms.len(), pooled_b.atoms.len());
+
+        results.push((label.to_string(), jaccard, cos_e, topk_overlap));
+    }
+
+    // Verdict
+    println!("\n═══════════════════════════════════════════════════════════");
+    println!("  VERDICT");
+    println!("═══════════════════════════════════════════════════════════");
+
+    // Check monotonicity on Cos(E): tier 1 ≥ tier 2 ≥ tier 3 ≥ tier 4.
+    // Indices follow the order of `pairs` above (2, 2, 1, 2 pairs per tier).
+    let tier1_avg = (results[0].2 + results[1].2) / 2.0;
+    let tier2_avg = (results[2].2 + results[3].2) / 2.0;
+    let tier3_avg = results[4].2;
+    let tier4_avg = (results[5].2 + results[6].2) / 2.0;
+
+    println!("  Tier 1 (paraphrase):  cos={:.3}", tier1_avg);
+    println!("  Tier 2 (thematic):    cos={:.3}", tier2_avg);
+    println!("  Tier 3 (weak):        cos={:.3}", tier3_avg);
+    println!("  Tier 4 (unrelated):   cos={:.3}", tier4_avg);
+    println!();
+
+    let monotonic = tier1_avg >= tier2_avg && tier2_avg >= tier3_avg && tier3_avg >= tier4_avg;
+    if monotonic {
+        println!("  → MONOTONIC: tiers decrease correctly. Engine discriminates.");
+        println!("  → Ready for 7-lane encoding + ONNX ICC calibration.");
+    } else if tier1_avg > tier4_avg {
+        println!("  → PARTIALLY DISCRIMINATIVE: tier1 > tier4 but not monotonic.");
+        println!("  → Engine sees some signal. May improve with better encoding.");
+    } else {
+        println!("  → NOT DISCRIMINATIVE: tier1 ≤ tier4. Engine is confused.");
+        println!("  → Fix encoding or table granularity before calibration.");
+    }
+}
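Review note on the verdict in end_to_end_signed.rs: averaging per tier hides how individual pairs are ordered. A rank correlation between the expected tier of each pair and its measured Cos(E) gives a single graded score. The sketch below is one way to compute Spearman ρ with tie handling. It is not the implementation in `src/ground_truth.rs`; the function names are hypothetical and the `measured` values are placeholders, not engine output.

```rust
/// Average ranks (1-based); tied values share the mean of their positions.
fn ranks(values: &[f32]) -> Vec<f32> {
    let mut order: Vec<usize> = (0..values.len()).collect();
    order.sort_by(|&i, &j| values[i].total_cmp(&values[j]));
    let mut out = vec![0.0f32; values.len()];
    let mut start = 0;
    while start < order.len() {
        // Extend `end` across the run of values equal to the one at `start`.
        let mut end = start;
        while end + 1 < order.len() && values[order[end + 1]] == values[order[start]] {
            end += 1;
        }
        // 0-based positions start..=end map to ranks start+1..=end+1.
        let avg = (start + end) as f32 / 2.0 + 1.0;
        for &idx in &order[start..=end] {
            out[idx] = avg;
        }
        start = end + 1;
    }
    out
}

/// Spearman rho, computed as the Pearson correlation of the two rank vectors.
/// Returns 0.0 when either input is constant.
fn spearman(x: &[f32], y: &[f32]) -> f32 {
    assert_eq!(x.len(), y.len());
    let (rx, ry) = (ranks(x), ranks(y));
    let n = rx.len() as f32;
    let mx = rx.iter().sum::<f32>() / n;
    let my = ry.iter().sum::<f32>() / n;
    let (mut cov, mut vx, mut vy) = (0.0f32, 0.0f32, 0.0f32);
    for (a, b) in rx.iter().zip(&ry) {
        cov += (a - mx) * (b - my);
        vx += (a - mx) * (a - mx);
        vy += (b - my) * (b - my);
    }
    if vx > 0.0 && vy > 0.0 { cov / (vx * vy).sqrt() } else { 0.0 }
}

fn main() {
    // Expected tier per pair in `pairs` order, negated so higher = more similar.
    let expected = [-1.0, -1.0, -2.0, -2.0, -3.0, -4.0, -4.0];
    // PLACEHOLDER cosines for illustration only, not measured results.
    let measured = [0.9, 0.8, 0.6, 0.7, 0.4, 0.1, 0.2];
    println!("Spearman rho = {:.3}", spearman(&expected, &measured));
}
```

A ρ near 1 would indicate that the per-pair ordering matches the tiers; unlike the tier-average check, it also penalizes swaps between individual pairs in adjacent tiers.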