feat: 6 EmbedAnything patterns — auto-detect, chunker, tensor bridge + practices#117

Merged: AdaWorldAPI merged 1 commit into main from claude/risc-thought-engine-TCZw7 on Apr 5, 2026

@AdaWorldAPI (Owner)

All 6 patterns from EmbedAnything are now translated to the codebook-lookup paradigm:

auto_detect.rs (6 tests):
  detect_from_config_json(): routes XLM-RoBERTa/Qwen/ModernBERT/BERT
  detect_from_gguf_metadata(): same routing, driven by GGUF KV pairs
  DetectedModel: architecture, dims, vocab, MoE, recommended lens
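
A minimal sketch of the routing step, assuming detection keys off the `model_type` string that Hugging Face `config.json` files carry. The names `Architecture` and `route_model_type` are hypothetical and are not the signatures in auto_detect.rs. JSON parsing (serde_json in this PR) and the "recommended lens" field are left out.

```rust
/// Architecture families named in this PR, plus a fallback.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Architecture {
    XlmRoberta,
    Qwen,
    ModernBert,
    Bert,
    Unknown,
}

/// Hypothetical router: maps a `model_type` value to an architecture family.
/// Exact matches are used for the BERT-like names so that unrelated types
/// such as "roberta" do not fall into the BERT branch by substring accident.
pub fn route_model_type(model_type: &str) -> Architecture {
    let t = model_type.trim().to_ascii_lowercase();
    match t.as_str() {
        "xlm-roberta" => Architecture::XlmRoberta,
        "modernbert" => Architecture::ModernBert,
        "bert" => Architecture::Bert,
        // Assumption: every Qwen generation ("qwen2", "qwen3", ...) shares a route.
        s if s.starts_with("qwen") => Architecture::Qwen,
        _ => Architecture::Unknown,
    }
}
```

Returning `Unknown` instead of an error keeps the caller free to fall back to a default lens. Whether the real module does this is not stated here.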

semantic_chunker.rs (4 tests):
  find_boundaries(): slide window, think, detect convergence-jump
  chunk(): split centroid sequence at semantic boundaries
  The ThinkingEngine IS the chunker. No forward pass for boundaries.
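
The window/jump idea can be sketched as follows. This is an illustration under assumptions, not the code in semantic_chunker.rs: the `score` callback stands in for the ThinkingEngine's "think" step, `mean_id` is a toy scorer, and the signatures are hypothetical. A boundary is reported where the score change between the window left of a position and the window right of it exceeds `jump` and is a local peak.

```rust
/// Hypothetical boundary finder over a sequence of centroid IDs.
/// Returns the positions at which a new chunk should start.
pub fn find_boundaries<F>(centroids: &[u16], window: usize, jump: f32, score: F) -> Vec<usize>
where
    F: Fn(&[u16]) -> f32,
{
    if window == 0 || centroids.len() < 2 * window {
        return Vec::new();
    }
    // diffs[k] is the score change across split position `window + k`.
    let diffs: Vec<f32> = (window..=centroids.len() - window)
        .map(|i| (score(&centroids[i..i + window]) - score(&centroids[i - window..i])).abs())
        .collect();
    let mut boundaries = Vec::new();
    for (k, &d) in diffs.iter().enumerate() {
        let left = if k > 0 { diffs[k - 1] } else { 0.0 };
        let right = diffs.get(k + 1).copied().unwrap_or(0.0);
        // Keep only local peaks so one transition yields one boundary.
        if d > jump && d > left && d >= right {
            boundaries.push(window + k);
        }
    }
    boundaries
}

/// Splits the centroid sequence at the given (ascending) boundaries.
pub fn chunk<'a>(centroids: &'a [u16], boundaries: &[usize]) -> Vec<&'a [u16]> {
    let mut chunks = Vec::new();
    let mut start = 0;
    for &b in boundaries {
        if b > start && b < centroids.len() {
            chunks.push(&centroids[start..b]);
            start = b;
        }
    }
    if start < centroids.len() {
        chunks.push(&centroids[start..]);
    }
    chunks
}

/// Toy scorer (assumption): the mean centroid ID of the window.
pub fn mean_id(w: &[u16]) -> f32 {
    w.iter().map(|&c| c as f32).sum::<f32>() / w.len() as f32
}
```

For example, `find_boundaries(&[1, 1, 1, 1, 9, 9, 9, 9], 2, 3.0, mean_id)` reports a single boundary at position 4, and `chunk` then splits the sequence into the run of 1s and the run of 9s.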

tensor_bridge.rs (7 tests):
  EmbeddingOutput: F32 | I8 | U8 | Tensor (candle, feature-gated)
  to_f32(), to_i8(), to_u8(): convert between any pair
  cosine(): cross-type similarity (e.g. F32 vs U8)
  EmbeddingBatch: pairwise cosine matrix
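
A sketch of how cross-type cosine can work: convert both sides to f32, then compare. The quantization scheme is an assumption (values in [-1, 1], I8 scaled by 127, U8 mapped linearly onto 0..=255). The scheme used in tensor_bridge.rs may differ, and the candle `Tensor` variant is omitted.

```rust
/// Simplified stand-in for the PR's EmbeddingOutput (no Tensor variant).
#[derive(Debug, Clone, PartialEq)]
pub enum EmbeddingOutput {
    F32(Vec<f32>),
    I8(Vec<i8>),
    U8(Vec<u8>),
}

impl EmbeddingOutput {
    /// Dequantize to f32 under the assumed [-1, 1] value range.
    pub fn to_f32(&self) -> Vec<f32> {
        match self {
            EmbeddingOutput::F32(v) => v.clone(),
            EmbeddingOutput::I8(v) => v.iter().map(|&x| x as f32 / 127.0).collect(),
            EmbeddingOutput::U8(v) => v.iter().map(|&x| x as f32 / 255.0 * 2.0 - 1.0).collect(),
        }
    }

    /// Quantize to i8; out-of-range values are clamped first.
    pub fn to_i8(&self) -> Vec<i8> {
        self.to_f32()
            .iter()
            .map(|&x| (x.clamp(-1.0, 1.0) * 127.0).round() as i8)
            .collect()
    }

    /// Quantize to u8; -1.0 maps to 0 and 1.0 maps to 255.
    pub fn to_u8(&self) -> Vec<u8> {
        self.to_f32()
            .iter()
            .map(|&x| ((x.clamp(-1.0, 1.0) + 1.0) * 0.5 * 255.0).round() as u8)
            .collect()
    }

    /// Cosine similarity across any two variants.
    /// Returns None on a length mismatch or a zero-norm vector.
    pub fn cosine(&self, other: &EmbeddingOutput) -> Option<f32> {
        let (a, b) = (self.to_f32(), other.to_f32());
        if a.len() != b.len() {
            return None;
        }
        let dot: f32 = a.iter().zip(&b).map(|(x, y)| x * y).sum();
        let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
        let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
        if na == 0.0 || nb == 0.0 {
            None
        } else {
            Some(dot / (na * nb))
        }
    }
}
```

Routing every comparison through f32 is the simplest correct option. An integer-only dot product would be faster for I8 against I8 but needs care with the U8 offset.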

Previously implemented:
  pooling.rs (6 tests): ArgMax/Mean/TopK/Weighted
  builder.rs (7 tests): ThinkingEngineBuilder fluent API + commit sinks
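
For orientation, the four pooling modes can be read as four ways of assigning per-row weights before one weighted average. The sketch below assumes pooling runs over rows of f32 vectors with one score per row. The `pool` signature and that input shape are hypothetical, not the API in pooling.rs.

```rust
/// Pooling modes named in this PR.
#[derive(Debug, Clone, Copy)]
pub enum Pooling {
    ArgMax,      // keep only the highest-scoring row
    Mean,        // equal weight for every row
    TopK(usize), // equal weight for the k highest-scoring rows
    Weighted,    // weight each row by its score
}

/// Hypothetical pooling over `rows`, guided by one score per row.
/// Returns None for empty or ragged input, or when the weights sum to zero.
pub fn pool(rows: &[Vec<f32>], scores: &[f32], mode: Pooling) -> Option<Vec<f32>> {
    let dim = rows.first()?.len();
    if rows.len() != scores.len() || rows.iter().any(|r| r.len() != dim) {
        return None;
    }
    // Row indices sorted by descending score.
    let mut order: Vec<usize> = (0..rows.len()).collect();
    order.sort_by(|&a, &b| scores[b].total_cmp(&scores[a]));

    let mut weights = vec![0.0f32; rows.len()];
    match mode {
        Pooling::ArgMax => weights[order[0]] = 1.0,
        Pooling::Mean => weights.iter_mut().for_each(|w| *w = 1.0),
        // Assumption: k = 0 is treated as k = 1.
        Pooling::TopK(k) => order.iter().take(k.max(1)).for_each(|&i| weights[i] = 1.0),
        Pooling::Weighted => weights.copy_from_slice(scores),
    }

    let total: f32 = weights.iter().sum();
    if total == 0.0 {
        return None;
    }
    // Weighted average of the rows, normalized by the weight sum.
    let mut out = vec![0.0f32; dim];
    for (row, &w) in rows.iter().zip(&weights) {
        for (o, &x) in out.iter_mut().zip(row) {
            *o += w * x / total;
        }
    }
    Some(out)
}
```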

.claude/CODING_PRACTICES.md: checklist + anti-patterns for new modules

218 tests pass. serde_json added as dependency (needed for config.json parsing).

https://claude.ai/code/session_019RzHP8tpJu55ESTxhfUy1A
