On-device RAG for Swift. Documents, embeddings, BM25 and HNSW indexes in a single file.
Quick Start • Performance • How It Works • Install
import Wax
// Create a memory file
let brain = try await MemoryOrchestrator(
at: URL(fileURLWithPath: "brain.mv2s")
)
// Remember something
try await brain.remember(
"User prefers dark mode and gets headaches from bright screens",
metadata: ["source": "onboarding"]
)
// Recall with RAG
let context = try await brain.recall(query: "user preferences")
// → "User prefers dark mode and gets headaches from bright screens"
// + relevant context, ranked and token-budgeted
No Docker. No network calls.
Adding memory to an iOS or macOS app typically means standing up a vector database, a text search index, and a persistence layer: three services with separate setup, uptime dependencies, and potential data egress.
Wax stores all of it in a single .mv2s file on the user's device.
Traditional RAG Stack: Wax:
┌───────────────┐      ┌───────────────┐
│ Your App      │      │ Your App      │
├───────────────┤      ├───────────────┤
│ ChromaDB      │      │               │
│ PostgreSQL    │ vs.  │ brain.mv2s    │
│ Redis         │      │               │
│ Elasticsearch │      │               │
│ Docker        │      │               │
└───────────────┘      └───────────────┘
~5 services            1 file
| Fast | 0.84ms vector search @ 10K docs (Metal GPU) |
| Durable | Kill -9 safe, power-loss safe, tested |
| Deterministic | Same query = same context, every time |
| Portable | One .mv2s file: move it, back it up, ship it |
| Private | 100% on-device. Zero network calls. |
Apple Silicon (M1 Pro)
Vector Search Latency (10K × 384-dim):
- Wax Metal (warm): 0.84ms
- Wax Metal (cold): 9.2ms
- Wax CPU: 105ms
- SQLite FTS5: 150ms

Cold Open → First Query: 17ms
Hybrid Search @ 10K docs: 105ms
These are reproducible XCTest benchmark baselines captured from the current Wax benchmark harness.
| Workload | Time | Throughput |
|---|---|---|
| smoke (200 docs) | 0.103s | ~1941.7 docs/s |
| standard (1000 docs) | 0.309s | ~3236.2 docs/s |
| stress (5000 docs) | 2.864s | ~1745.8 docs/s |
| 10k | 7.756s | ~1289.3 docs/s |
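
The throughput column follows from the first two: documents divided by elapsed seconds. A minimal sketch that recomputes it from the figures in the table above (the `throughput` helper is illustrative and not part of the Wax benchmark harness):

```swift
import Foundation

// Figures copied from the table above: workload name, document count, elapsed seconds.
let workloads: [(name: String, docs: Double, seconds: Double)] = [
    ("smoke", 200, 0.103),
    ("standard", 1_000, 0.309),
    ("stress", 5_000, 2.864),
    ("10k", 10_000, 7.756),
]

// Throughput = documents processed / wall-clock seconds.
func throughput(docs: Double, seconds: Double) -> Double {
    docs / seconds
}

for workload in workloads {
    let rate = throughput(docs: workload.docs, seconds: workload.seconds)
    print(workload.name, String(format: "%.1f docs/s", rate))
}
```

Throughput peaks at the 1000-document workload and falls at 5000 and 10k documents, so per-document cost grows with store size in these runs.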
| Workload | Time | Throughput |
|---|---|---|
| warm CPU smoke | 0.0015s | ~666.7 ops/s |
| warm CPU standard | 0.0033s | ~303.0 ops/s |
| warm CPU stress | 0.0072s | ~138.9 ops/s |
| 10k CPU hybrid iteration | 0.103s | ~9.7 ops/s |
| Workload | Time |
|---|---|
| smoke | 0.103s |
| standard | 0.101s |
Stress recall is currently harness-blocked (signal 11) and treated as a known benchmark issue.
| Mode | Time |
|---|---|
| fast mode | 0.102s |
| dense cached | 0.102s |
For benchmark commands, profiling traces, and methodology, see:
Tasks/hot-path-specialization-investigation.md
Wax includes a WAL/storage health track focused on commit latency tails, long-run file growth, and recovery behavior:
- No-op index compaction guards to avoid unnecessary index rewrites.
- Single-pass WAL replay with guarded replay snapshot fast path.
- Proactive WAL-pressure commits for targeted workloads (guarded rollout).
- Scheduled `rewriteLiveSet` maintenance with dead-payload thresholds, validation, and rollback.
- Repeated unchanged index compaction growth improved from `+61,768,464` bytes over 8 runs (~7.72MB/run) to bounded drift (test-gated).
- Commit latency improved in most matrix workloads in recent runs (examples: `medium_hybrid` p95 -13.9%, `large_text_10k` p95 -8.0%, `sustained_write_text` p95 -5.7%).
- Reopen/recovery p95 is generally flat-to-improved across the matrix.

`sustained_write_hybrid` remains workload-sensitive, so proactive/scheduled maintenance stays guarded by default.
- Proactive pressure commits are tuned for targeted workloads and validated with percentile guardrails.
- Replay snapshot open-path optimization is additive and guarded.
- Scheduled live-set rewrite is configurable and runs deferred from the `flush()` hot path.
- Rewrite candidates are automatically validated and rolled back on verification failure.
import Wax
var config = OrchestratorConfig.default
config.liveSetRewriteSchedule = LiveSetRewriteSchedule(
enabled: true,
checkEveryFlushes: 32,
minDeadPayloadBytes: 64 * 1024 * 1024,
minDeadPayloadFraction: 0.25,
minimumCompactionGainBytes: 0,
minimumIdleMs: 15_000,
minIntervalMs: 5 * 60_000,
verifyDeep: false
)
WAX_BENCHMARK_WAL_COMPACTION=1 \
WAX_BENCHMARK_WAL_OUTPUT=/tmp/wal-matrix.json \
swift test --filter WALCompactionBenchmarks.testWALCompactionWorkloadMatrix
WAX_BENCHMARK_WAL_GUARDRAILS=1 \
swift test --filter WALCompactionBenchmarks.testProactivePressureCommitGuardrails
WAX_BENCHMARK_WAL_REOPEN_GUARDRAILS=1 \
swift test --filter WALCompactionBenchmarks.testReplayStateSnapshotGuardrails
See Tasks/wal-compaction-investigation.md in the repo for methodology and baseline artifacts.
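
To make the schedule parameters shown earlier more concrete, here is a rough sketch of how such thresholds could gate a rewrite. The types `RewriteScheduleSketch` and `StoreStatsSketch`, and the rule that all conditions must hold at once, are assumptions for illustration; Wax's actual decision logic is not documented here. `minimumCompactionGainBytes` and `verifyDeep` are left out.

```swift
// Hypothetical mirror of the schedule fields shown earlier; not Wax's real type.
struct RewriteScheduleSketch {
    var enabled = true
    var checkEveryFlushes = 32
    var minDeadPayloadBytes: UInt64 = 64 * 1024 * 1024
    var minDeadPayloadFraction = 0.25
    var minimumIdleMs = 15_000
    var minIntervalMs = 5 * 60_000
}

// Hypothetical snapshot of store state at flush time.
struct StoreStatsSketch {
    var flushCount: Int
    var deadPayloadBytes: UInt64
    var totalPayloadBytes: UInt64
    var idleMs: Int
    var msSinceLastRewrite: Int
}

// Assumed rule: rewrite only when every threshold is met at the same time.
func shouldRewrite(_ schedule: RewriteScheduleSketch, _ stats: StoreStatsSketch) -> Bool {
    guard schedule.enabled, stats.totalPayloadBytes > 0 else { return false }
    // Evaluate only on every Nth flush so the check stays cheap.
    guard schedule.checkEveryFlushes > 0,
          stats.flushCount % schedule.checkEveryFlushes == 0 else { return false }
    let deadFraction = Double(stats.deadPayloadBytes) / Double(stats.totalPayloadBytes)
    return stats.deadPayloadBytes >= schedule.minDeadPayloadBytes
        && deadFraction >= schedule.minDeadPayloadFraction
        && stats.idleMs >= schedule.minimumIdleMs
        && stats.msSinceLastRewrite >= schedule.minIntervalMs
}

let schedule = RewriteScheduleSketch()
// Half the payload is dead, but the store was written to 500 ms ago.
let busy = StoreStatsSketch(flushCount: 64, deadPayloadBytes: 128 << 20,
                            totalPayloadBytes: 256 << 20, idleMs: 500,
                            msSinceLastRewrite: 600_000)
// Same store after 20 seconds of inactivity.
let idle = StoreStatsSketch(flushCount: 64, deadPayloadBytes: 128 << 20,
                            totalPayloadBytes: 256 << 20, idleMs: 20_000,
                            msSinceLastRewrite: 600_000)
print(shouldRewrite(schedule, busy), shouldRewrite(schedule, idle))
```

Under these assumptions the busy store is skipped and the idle one qualifies, which matches the stated goal of keeping maintenance off the `flush()` hot path.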
.package(url: "https://github.com/christopherkarani/Wax.git", from: "0.1.6")
Text Memory: Documents, notes, conversations
import Wax
let orchestrator = try await MemoryOrchestrator(at: storeURL)
// Ingest
try await orchestrator.remember(documentText, metadata: ["source": "report.pdf"])
// Recall
let context = try await orchestrator.recall(query: "key findings")
for item in context.items {
print("[\(item.kind)] \(item.text)")
}
Photo Memory: Photo library with OCR + CLIP embeddings
import Wax
let photoRAG = try await PhotoRAGOrchestrator(
storeURL: storeURL,
config: .default,
embedder: MyCLIPEmbedder() // Your CoreML model
)
// Index local photos (offline-only)
try await photoRAG.syncLibrary(scope: .fullLibrary)
// Search
let ctx = try await photoRAG.recall(.init(text: "Costco receipt"))
Video Memory: Video segments with transcripts
import Wax
let videoRAG = try await VideoRAGOrchestrator(
storeURL: storeURL,
config: .default,
embedder: MyEmbedder(),
transcriptProvider: MyTranscriber()
)
// Ingest
try await videoRAG.ingest(files: [videoFile])
// Search by content or transcript
let ctx = try await videoRAG.recall(.init(text: "project timeline discussion"))
Wax packs everything into a single .mv2s file, the equivalent of SQLite for AI memory: one file that contains your documents, the search indexes, and enough crash-recovery state to survive a kill signal.
The file contains:
- Raw documents
- Embeddings (any dimension, any provider)
- BM25 full-text search index (FTS5)
- HNSW vector index (USearch)
- Write-ahead log for crash recovery
- Metadata and entity graph
The file format:
- Append-only: Fast writes, no fragmentation
- Checksum-verified: Every byte validated
- Dual-header: Atomic updates, never corrupt
- Self-contained: No external dependencies
┌─────────────────────────────────────────┐
│ Header Page A (4KB)                     │
│ Header Page B (4KB)  (atomic switch)    │
├─────────────────────────────────────────┤
│ WAL Ring Buffer                         │
│ (crash recovery log)                    │
├─────────────────────────────────────────┤
│ Document Payloads (compressed)          │
│ Embeddings                              │
├─────────────────────────────────────────┤
│ TOC (Table of Contents)                 │
│ Footer + Checksum                       │
└─────────────────────────────────────────┘
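
The dual-header idea in the layout above can be sketched in a few lines: write the new header to the inactive page, and on open trust whichever page verifies and carries the newest generation. The field names, the generation counter, and the FNV-1a checksum below are assumptions for illustration only; the real .mv2s header layout and checksum are not specified in this README.

```swift
// Hypothetical header record; field names are illustrative, not the .mv2s layout.
struct HeaderSketch {
    var generation: UInt64
    var tocOffset: UInt64
    var checksum: UInt32
}

// FNV-1a over the header fields, standing in for whatever checksum the real format uses.
func headerChecksum(generation: UInt64, tocOffset: UInt64) -> UInt32 {
    var hash: UInt32 = 2_166_136_261
    for value in [generation, tocOffset] {
        for shift in stride(from: 0, to: 64, by: 8) {
            hash ^= UInt32((value >> UInt64(shift)) & 0xFF)
            hash = hash &* 16_777_619
        }
    }
    return hash
}

func makeHeader(generation: UInt64, tocOffset: UInt64) -> HeaderSketch {
    HeaderSketch(generation: generation, tocOffset: tocOffset,
                 checksum: headerChecksum(generation: generation, tocOffset: tocOffset))
}

func isValid(_ header: HeaderSketch) -> Bool {
    header.checksum == headerChecksum(generation: header.generation,
                                      tocOffset: header.tocOffset)
}

// On open: keep only pages that verify, then prefer the newest generation.
func activeHeader(_ a: HeaderSketch, _ b: HeaderSketch) -> HeaderSketch? {
    [a, b].filter(isValid).max { $0.generation < $1.generation }
}

let pageA = makeHeader(generation: 7, tocOffset: 4_096)
var pageB = makeHeader(generation: 8, tocOffset: 8_192)
pageB.checksum ^= 0xFFFF_FFFF  // simulate a write to page B interrupted by a crash

if let header = activeHeader(pageA, pageB) {
    print("recovered generation", header.generation)
}
```

Because the interrupted page fails verification, the reader falls back to the previous generation on page A instead of loading a half-written header.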
| Feature | Wax | Chroma | Core Data + FAISS | Pinecone |
|---|---|---|---|---|
| Single file | β | β | β | β |
| Works offline | β | β | β | |
| Crash-safe | β | β | N/A | |
| GPU vector search | β | β | β | β |
| No server required | β | β | β | β |
| Swift-native | β | β | β | β |
| Deterministic RAG | β | β | β | β |
Query-Adaptive Hybrid Search
Wax runs multiple search lanes in parallel (BM25, vector, temporal, structured evidence) and fuses results based on query type.
"When was my last dentist appointment?" → boosts temporal + structured
"Explain quantum computing" → boosts vector + BM25
Tiered Memory Compression (Surrogates)
Wax generates hierarchical summaries for each document:
- `full`: Complete document (for deep dives)
- `gist`: Key paragraphs (for balanced recall)
- `micro`: One-liner (for quick context)
At query time, it picks the right tier based on query signals and remaining token budget.
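
As a simplified illustration of tier selection, the sketch below picks the richest tier that still fits the remaining token budget. It models only the budget side; the query signals mentioned above are not modeled, and the `Tier` and `Surrogates` types are hypothetical, not Wax's API.

```swift
// Ordered richest first, so the first tier that fits is the best one available.
enum Tier: CaseIterable, Hashable { case full, gist, micro }

// Hypothetical surrogate set with a precomputed token count per tier.
struct Surrogates {
    var tokens: [Tier: Int]
}

// Returns the richest tier whose cost fits, or nil if even `micro` is too large.
func pickTier(_ surrogates: Surrogates, remaining: Int) -> Tier? {
    Tier.allCases.first { tier in
        guard let cost = surrogates.tokens[tier] else { return false }
        return cost <= remaining
    }
}

let doc = Surrogates(tokens: [.full: 1_200, .gist: 300, .micro: 25])
for budget in [2_000, 500, 40, 10] {
    print(budget, String(describing: pickTier(doc, remaining: budget)))
}
```

As the budget shrinks, the same document degrades from `full` to `gist` to `micro`, and is dropped once nothing fits.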
Deterministic Token Budgeting
Strict cl100k_base token counting. The same query produces the same context window every time, which makes results reproducible enough to benchmark and regression-test.
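
Determinism of this kind usually comes down to two things: a total ordering over candidates and a fixed token counter. A minimal sketch, assuming a greedy packer and a whitespace word counter as a stand-in (it does not implement cl100k_base, and `Candidate` and `packContext` are hypothetical names, not Wax's API):

```swift
struct Candidate {
    let id: String
    let score: Double
    let text: String
}

// Stand-in counter: whitespace-separated words. This is NOT cl100k_base.
func countTokens(_ text: String) -> Int {
    text.split(whereSeparator: { $0.isWhitespace }).count
}

func packContext(_ candidates: [Candidate], budget: Int) -> [Candidate] {
    // Total order: score descending, then ID ascending,
    // so the result never depends on input order.
    let ordered = candidates.sorted { a, b in
        a.score != b.score ? a.score > b.score : a.id < b.id
    }
    var remaining = budget
    var picked: [Candidate] = []
    for candidate in ordered {
        let cost = countTokens(candidate.text)
        if cost <= remaining {
            picked.append(candidate)
            remaining -= cost
        }
    }
    return picked
}

let items = [
    Candidate(id: "c", score: 0.4, text: "Onboarding completed in March"),
    Candidate(id: "b", score: 0.9, text: "Bright screens cause headaches"),
    Candidate(id: "a", score: 0.9, text: "User prefers dark mode"),
]
print(packContext(items, budget: 8).map { $0.id })
print(packContext(Array(items.reversed()), budget: 8).map { $0.id })
```

Both calls select the same items in the same order even though the inputs are permuted and two candidates tie on score, which is the property that makes recall output regression-testable.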
- Swift 6.2
- iOS 26 / macOS 26
- Apple Silicon (for Metal GPU features)
git clone https://github.com/christopherkarani/Wax.git
cd Wax
swift test
MiniLM CoreML tests are opt-in:
WAX_TEST_MINILM=1 swift test
Sift is a semantic git history search CLI built on Wax. It indexes commit history locally and lets you search with natural language instead of `git log --grep`.
- Repo: https://github.com/christopherkarani/Sift
brew tap christopherkarani/sift
brew install wax
wax tui
wax when did we add notifications feature
Built by Christopher Karani
