
fix: remove tokenizer.json files from git (757K LOC → GitHub Release) #120

Removed from git tracking:
  data/jina-v3-hdr/tokenizer.json (8.7 MB, XLM-RoBERTa, 250K vocab)
  data/bge-m3-hdr/tokenizer.json (8.7 MB, XLM-RoBERTa, 250K vocab)
  data/jina-v5-tokenizer.json (11.4 MB, Qwen3, 151K vocab; 757K lines!)
  data/xlm-roberta-de/tokenizer.json (8.7 MB, German NER)

The files stay on disk (gitignored) for local development. tokenizer_registry.rs already has a from_pretrained() fallback that downloads the tokenizer from Hugging Face when the local file is missing. For offline environments, upload the files to a GitHub Release.

https://claude.ai/code/session_019RzHP8tpJu55ESTxhfUy1A
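The local-first fallback described above can be sketched roughly as follows. This is a hypothetical illustration, not the contents of tokenizer_registry.rs: `load_tokenizer_json`, `TokenizerSource`, and the injected `fetch_remote` closure are invented names, and the closure stands in for the real Hugging Face download so the example runs offline with only the standard library. The model id `BAAI/bge-m3` is used purely as an example.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical type for this sketch: records where the tokenizer JSON came from.
#[derive(Debug, PartialEq)]
enum TokenizerSource {
    LocalFile(PathBuf),
    Remote(String),
}

/// Hypothetical loader: prefer the gitignored local file, otherwise fall back to
/// a remote fetch keyed by a model id. `fetch_remote` is an assumed stand-in for
/// whatever download call the real registry makes.
fn load_tokenizer_json<F>(
    local_path: &Path,
    model_id: &str,
    fetch_remote: F,
) -> io::Result<(TokenizerSource, String)>
where
    F: FnOnce(&str) -> io::Result<String>,
{
    if local_path.is_file() {
        // Local development: the file is still on disk, just untracked by git.
        let json = fs::read_to_string(local_path)?;
        Ok((TokenizerSource::LocalFile(local_path.to_path_buf()), json))
    } else {
        // Fresh clone: the file is absent, so fetch it from the remote instead.
        let json = fetch_remote(model_id)?;
        Ok((TokenizerSource::Remote(model_id.to_string()), json))
    }
}

fn main() -> io::Result<()> {
    // Fake remote: returns a tiny placeholder instead of doing network I/O.
    let fake_hub = |id: &str| -> io::Result<String> { Ok(format!("{{\"model\":\"{}\"}}", id)) };

    // Missing local file: the fallback path is taken.
    let missing = Path::new("data/does-not-exist/tokenizer.json");
    let (src, json) = load_tokenizer_json(missing, "BAAI/bge-m3", fake_hub)?;
    println!("{:?} -> {}", src, json);

    // Create a local copy in a temp dir: now the local file wins.
    let dir = std::env::temp_dir().join("tokenizer_fallback_demo");
    fs::create_dir_all(&dir)?;
    let local = dir.join("tokenizer.json");
    fs::write(&local, "{\"model\":\"local\"}")?;
    let (src, json) = load_tokenizer_json(&local, "BAAI/bge-m3", fake_hub)?;
    println!("{:?} -> {}", src, json);

    // Clean up the temporary directory.
    fs::remove_dir_all(&dir)?;
    Ok(())
}
```

Injecting the fetch step as a closure keeps the sketch testable without network access. In a real project, the Hugging Face `tokenizers` crate offers `Tokenizer::from_file` and, with its `http` feature enabled, `Tokenizer::from_pretrained`; whether tokenizer_registry.rs uses exactly those calls is an assumption here.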

Merged: AdaWorldAPI merged 12 commits into main from claude/risc-thought-engine-TCZw7 on Apr 6, 2026.

Commits: Apr 5, 2026 and Apr 6, 2026.