Wax

On-device RAG for Swift. Documents, embeddings, BM25 and HNSW indexes in a single file.

Quick Start • Performance • How It Works • Install

30-Second Demo

import Wax

// Create a memory file
let brain = try await MemoryOrchestrator(
    at: URL(fileURLWithPath: "brain.mv2s")
)

// Remember something
try await brain.remember(
    "User prefers dark mode and gets headaches from bright screens",
    metadata: ["source": "onboarding"]
)

// Recall with RAG
let context = try await brain.recall(query: "user preferences")
// → "User prefers dark mode and gets headaches from bright screens"
//   + relevant context, ranked and token-budgeted

No Docker. No network calls.

The Problem

Adding memory to an iOS or macOS app typically means standing up a vector database, a text search index, and a persistence layer — three services with separate setup, uptime dependencies, and potential data egress.

Wax stores all of it in a single .mv2s file on the user's device.

Traditional RAG Stack:                     Wax:
┌─────────────┐                           ┌─────────────┐
│  Your App   │                           │  Your App   │
├─────────────┤                           ├─────────────┤
│  ChromaDB   │                           │             │
│  PostgreSQL │        vs.                │   brain.    │
│  Redis      │                           │    mv2s     │
│  Elasticsearch│                         │             │
│  Docker     │                           │             │
└─────────────┘                           └─────────────┘
     ~5 services                              1 file

Why Wax?


Fast	0.84ms vector search @ 10K docs (Metal GPU)
Durable	Kill -9 safe, power-loss safe, tested
Deterministic	Same query = same context, every time
Portable	One `.mv2s` file — move it, backup it, ship it
Private	100% on-device. Zero network calls.

Performance

Apple Silicon (M1 Pro)

Vector Search Latency (10K × 384-dim)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Wax Metal (warm)     ████░░░░░░░░░░░░░░░░  0.84ms
Wax Metal (cold)     █████████████████░░░  9.2ms
Wax CPU              ███████████░░░░░░░░░  105ms
SQLite FTS5          ██████████████████░░  150ms
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Cold Open → First Query: 17ms
Hybrid Search @ 10K docs: 105ms

Core Benchmark Baselines (as of February 17, 2026)

These are reproducible XCTest benchmark baselines captured from the current Wax benchmark harness.

Ingest throughput (`testIngestHybridBatchedPerformance`)

Workload	Time	Throughput
smoke (200 docs)	`0.103s`	`~1941.7 docs/s`
standard (1000 docs)	`0.309s`	`~3236.2 docs/s`
stress (5000 docs)	`2.864s`	`~1745.8 docs/s`
10k	`7.756s`	`~1289.3 docs/s`

Search latency

Workload	Time	Throughput
warm CPU smoke	`0.0015s`	`~666.7 ops/s`
warm CPU standard	`0.0033s`	`~303.0 ops/s`
warm CPU stress	`0.0072s`	`~138.9 ops/s`
10k CPU hybrid iteration	`0.103s`	`~9.7 ops/s`

Recall latency (`testMemoryOrchestratorRecallPerformance`)

Workload	Time
smoke	`0.103s`
standard	`0.101s`

Stress recall is currently harness-blocked (signal 11) and treated as a known benchmark issue.

FastRAG builder

Mode	Time
fast mode	`0.102s`
dense cached	`0.102s`

For benchmark commands, profiling traces, and methodology, see:

Tasks/hot-path-specialization-investigation.md

WAL Compaction and Storage Health (2026-02)

Wax includes a WAL/storage health track focused on commit latency tails, long-run file growth, and recovery behavior:

No-op index compaction guards to avoid unnecessary index rewrites.
Single-pass WAL replay with guarded replay snapshot fast path.
Proactive WAL-pressure commits for targeted workloads (guarded rollout).
Scheduled rewriteLiveSet maintenance with dead-payload thresholds, validation, and rollback.

Measured outcomes

Repeated unchanged index compaction growth improved from +61,768,464 bytes over 8 runs (~7.72MB/run) to bounded drift (test-gated).
Commit latency improved in most matrix workloads in recent runs (examples: medium_hybrid p95 -13.9%, large_text_10k p95 -8.0%, sustained_write_text p95 -5.7%).
Reopen/recovery p95 is generally flat-to-improved across the matrix.
sustained_write_hybrid remains workload-sensitive, so proactive/scheduled maintenance stays guarded by default.

Safe rollout defaults

Proactive pressure commits are tuned for targeted workloads and validated with percentile guardrails.
Replay snapshot open-path optimization is additive and guarded.
Scheduled live-set rewrite is configurable and runs deferred from the flush() hot path.
Rewrite candidates are automatically validated and rolled back on verification failure.

Configure scheduled live-set rewrite

import Wax

var config = OrchestratorConfig.default
config.liveSetRewriteSchedule = LiveSetRewriteSchedule(
    enabled: true,
    checkEveryFlushes: 32,
    minDeadPayloadBytes: 64 * 1024 * 1024,
    minDeadPayloadFraction: 0.25,
    minimumCompactionGainBytes: 0,
    minimumIdleMs: 15_000,
    minIntervalMs: 5 * 60_000,
    verifyDeep: false
)

Reproduce benchmark matrix

WAX_BENCHMARK_WAL_COMPACTION=1 \
WAX_BENCHMARK_WAL_OUTPUT=/tmp/wal-matrix.json \
swift test --filter WALCompactionBenchmarks.testWALCompactionWorkloadMatrix

WAX_BENCHMARK_WAL_GUARDRAILS=1 \
swift test --filter WALCompactionBenchmarks.testProactivePressureCommitGuardrails

WAX_BENCHMARK_WAL_REOPEN_GUARDRAILS=1 \
swift test --filter WALCompactionBenchmarks.testReplayStateSnapshotGuardrails

See Tasks/wal-compaction-investigation.md in the repo for methodology and baseline artifacts.

Quick Start

1. Add to Package.swift

.package(url: "https://github.com/christopherkarani/Wax.git", from: "0.1.6")

2. Choose Your Memory Type

Text Memory — Documents, notes, conversations

import Wax

let orchestrator = try await MemoryOrchestrator(at: storeURL)

// Ingest
try await orchestrator.remember(documentText, metadata: ["source": "report.pdf"])

// Recall
let context = try await orchestrator.recall(query: "key findings")
for item in context.items {
    print("[\(item.kind)] \(item.text)")
}

Photo Memory — Photo library with OCR + CLIP embeddings

import Wax

let photoRAG = try await PhotoRAGOrchestrator(
    storeURL: storeURL,
    config: .default,
    embedder: MyCLIPEmbedder()  // Your CoreML model
)

// Index local photos (offline-only)
try await photoRAG.syncLibrary(scope: .fullLibrary)

// Search
let ctx = try await photoRAG.recall(.init(text: "Costco receipt"))

Video Memory — Video segments with transcripts

import Wax

let videoRAG = try await VideoRAGOrchestrator(
    storeURL: storeURL,
    config: .default,
    embedder: MyEmbedder(),
    transcriptProvider: MyTranscriber()
)

// Ingest
try await videoRAG.ingest(files: [videoFile])

// Search by content or transcript
let ctx = try await videoRAG.recall(.init(text: "project timeline discussion"))

How It Works

Wax packs everything into a single .mv2s file — the equivalent of SQLite for AI memory: one file that contains your documents, the search indexes, and enough crash-recovery state to survive a kill signal.

The file contains:

Raw documents
Embeddings (any dimension, any provider)
BM25 full-text search index (FTS5)
HNSW vector index (USearch)
Write-ahead log for crash recovery
Metadata and entity graph

The file format:

Append-only — Fast writes, no fragmentation
Checksum-verified — Every byte validated
Dual-header — Atomic updates, never corrupt
Self-contained — No external dependencies

┌─────────────────────────────────────────┐
│  Header Page A (4KB)                    │
│  Header Page B (4KB) ← atomic switch    │
├─────────────────────────────────────────┤
│  WAL Ring Buffer                        │
│  (crash recovery log)                   │
├─────────────────────────────────────────┤
│  Document Payloads (compressed)         │
│  Embeddings                             │
├─────────────────────────────────────────┤
│  TOC (Table of Contents)                │
│  Footer + Checksum                      │
└─────────────────────────────────────────┘

Comparison

Feature	Wax	Chroma	Core Data + FAISS	Pinecone
Single file	✅	❌	❌	❌
Works offline	✅	⚠️	✅	❌
Crash-safe	✅	❌	⚠️	N/A
GPU vector search	✅	❌	❌	❌
No server required	✅	✅	✅	❌
Swift-native	✅	❌	✅	❌
Deterministic RAG	✅	❌	❌	❌

Features

Query-Adaptive Hybrid Search

Wax runs multiple search lanes in parallel — BM25, vector, temporal, structured evidence — and fuses results based on query type.

"When was my last dentist appointment?" → boosts temporal + structured "Explain quantum computing" → boosts vector + BM25

Tiered Memory Compression (Surrogates)

Wax generates hierarchical summaries for each document:

full — Complete document (for deep dives)
gist — Key paragraphs (for balanced recall)
micro — One-liner (for quick context)

At query time, it picks the right tier based on query signals and remaining token budget.

Deterministic Token Budgeting

Strict cl100k_base token counting. Same query produces the same context window, every time — reproducible enough to benchmark and regression-test.

Requirements

Swift 6.2
iOS 26 / macOS 26
Apple Silicon (for Metal GPU features)

Contributing

git clone https://github.com/christopherkarani/Wax.git
cd Wax
swift test

MiniLM CoreML tests are opt-in:

WAX_TEST_MINILM=1 swift test

Sift

Sift is a semantic git history search CLI built on Wax. It indexes commit history locally and lets you search with natural language instead of git log --grep.

Repo: https://github.com/christopherkarani/Sift

brew tap christopherkarani/sift
brew install wax

wax tui
wax when did we add notifications feature

Built by Christopher Karani

Name		Name	Last commit message	Last commit date
Latest commit History 143 Commits
.claude/agents		.claude/agents
.github/workflows		.github/workflows
Sources		Sources
Tests		Tests
WaxDemo		WaxDemo
docs		docs
npm/waxmcp		npm/waxmcp
scripts/quality		scripts/quality
skills/public/wax		skills/public/wax
~/.claude/plans		~/.claude/plans
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
Package.resolved		Package.resolved
Package.swift		Package.swift
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Wax

30-Second Demo

The Problem

Why Wax?

Performance

Core Benchmark Baselines (as of February 17, 2026)

Ingest throughput (`testIngestHybridBatchedPerformance`)

Search latency

Recall latency (`testMemoryOrchestratorRecallPerformance`)

FastRAG builder

WAL Compaction and Storage Health (2026-02)

Measured outcomes

Safe rollout defaults

Configure scheduled live-set rewrite

Reproduce benchmark matrix

Quick Start

1. Add to Package.swift

2. Choose Your Memory Type

How It Works

Comparison

Features

Requirements

Contributing

Sift

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Wax

30-Second Demo

The Problem

Why Wax?

Performance

Core Benchmark Baselines (as of February 17, 2026)

Ingest throughput (testIngestHybridBatchedPerformance)

Search latency

Recall latency (testMemoryOrchestratorRecallPerformance)

FastRAG builder

WAL Compaction and Storage Health (2026-02)

Measured outcomes

Safe rollout defaults

Configure scheduled live-set rewrite

Reproduce benchmark matrix

Quick Start

1. Add to Package.swift

2. Choose Your Memory Type

How It Works

Comparison

Features

Requirements

Contributing

Sift

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Ingest throughput (`testIngestHybridBatchedPerformance`)

Recall latency (`testMemoryOrchestratorRecallPerformance`)

Packages