Comprehensive benchmarks comparing Caliby against other popular vector search libraries.
Full vector database comparison: tests document storage, retrieval, vector search, filtered search, and hybrid search.
- Script: compare_vectordb.py
- Documentation: CHROMADB_BENCHMARK.md
- What's tested: insertion throughput, document retrieval, vector search, filtered search, hybrid search, storage efficiency
- Dataset: SIFT1M (1M vectors, 128 dimensions)
- Databases: Caliby (this implementation), ChromaDB, Qdrant, Weaviate (all local/embedded)
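For context on the hybrid search mode: it merges a vector ranking with a keyword ranking. One common fusion method is Reciprocal Rank Fusion (RRF); the sketch below illustrates the concept only (`rrf` is a hypothetical helper, not any of these databases' APIs, and each database may fuse results differently):

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked id lists; ids ranked high in any list score well."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # Standard RRF weight: 1 / (k + rank), with 1-based ranks.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["d3", "d1", "d7"]   # from ANN search
keyword_hits = ["d1", "d9", "d3"]  # from keyword/BM25 search
print(rrf([vector_hits, keyword_hits]))  # ['d1', 'd3', 'd9', 'd7']
```

Documents appearing in both lists (`d1`, `d3`) accumulate score from each and rise to the top.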
```bash
# Install dependencies
pip install chromadb qdrant-client weaviate-client

# Quick start - compare all databases
python compare_vectordb.py --num-vectors 100000

# Compare specific databases
python compare_vectordb.py --caliby-only --num-vectors 100000
python compare_vectordb.py --chromadb-only --num-vectors 100000
python compare_vectordb.py --qdrant-only --num-vectors 100000
python compare_vectordb.py --weaviate-only --num-vectors 100000
```

Compare HNSW implementations across Caliby, Usearch, and Faiss
- Script: compare_hnsw.py
- Dataset: SIFT1M
Compare DiskANN implementations
- Script: compare_diskann.py
Compare quantization-based indexes with FAISS
- Script: compare_ivfpq_faiss.py
Configuration:
- Dataset: 1M SIFT vectors (128 dimensions)
- Clusters (nlist): 256
- Subquantizers (m): 8
- Centroids per subquantizer: 256 (8-bit codes)
Performance:
- Build time: 9.98s (126k vectors/sec)
- Index size: ~7.6 MB compressed
- Recall@10 (nprobe=16): 35.6%
- QPS (nprobe=16): 52,754
- Latency (nprobe=16): 0.019 ms/query
Recall vs nprobe:

| nprobe | Recall@10 | QPS     |
|-------:|----------:|--------:|
|      1 |     0.262 | 447,140 |
|      4 |     0.340 | 164,726 |
|     16 |     0.356 |  52,754 |
|     32 |     0.356 |  28,023 |
|     64 |     0.356 |  18,465 |
|    128 |     0.356 |  10,372 |
Run with: `python3 faiss_ivfpq_baseline.py`
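As a back-of-envelope check, the reported index size and per-query latency follow directly from the configuration above. This sketch ignores codebook, coarse-centroid, and ID-list overhead, and assumes single-threaded search when relating QPS to latency:

```python
# Parameters from the configuration list above.
num_vectors = 1_000_000
subquantizers = 8      # m: one code per subquantizer
bits_per_code = 8      # 256 centroids per subquantizer

# Each vector compresses to m * bits / 8 bytes of PQ codes.
bytes_per_vector = subquantizers * bits_per_code // 8   # 8 bytes
code_size_mb = num_vectors * bytes_per_vector / (1024 ** 2)
print(f"PQ codes: {code_size_mb:.1f} MiB")      # ~7.6 MiB, matching "Index size"

# QPS and per-query latency are reciprocal for single-threaded search.
qps = 52_754
latency_ms = 1000 / qps
print(f"Latency: {latency_ms:.3f} ms/query")    # ~0.019 ms, matching the report
```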
✅ Fixed: Codebook overflow bug - can now correctly handle multi-page codebooks
📊 Early Results: ~40% recall@10 (better than FAISS's 35.6%)
Compare HNSW implementations across Caliby, Usearch, and Faiss using the SIFT1M dataset.
```bash
# Install benchmark dependencies
pip install usearch faiss-cpu numpy
```

Run the full benchmark:

```bash
python compare_hnsw.py
```

Customize parameters:

```bash
python compare_hnsw.py --M 32 --ef-construction 400 --ef-search 100
```

Skip specific libraries:

```bash
python compare_hnsw.py --skip usearch faiss  # Only benchmark caliby
```

Specify a data directory:

```bash
python compare_hnsw.py --data-dir /path/to/sift1m
```

Options:
- `--data-dir`: Directory for the SIFT1M dataset (default: `./sift1m`)
- `--M`: HNSW M parameter, controls connectivity (default: 16)
- `--ef-construction`: Construction-time search depth (default: 200)
- `--ef-search`: Query-time search depth (default: 50)
- `--skip`: Libraries to skip (choices: `caliby`, `usearch`, `faiss`)
The benchmark measures:
- Build Time: Time to construct the index
- Index Size: Disk/memory footprint in MB
- QPS: Queries per second (throughput)
- Latency: P50, P95, P99 percentiles in milliseconds
- Recall@10: Accuracy compared to ground truth
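Recall@10 is the fraction of each query's true 10 nearest neighbors that appear in the returned top 10. A minimal NumPy sketch of the computation, with brute-force L2 ground truth standing in for the precomputed SIFT1M ground truth (`recall_at_k` is illustrative, not the benchmark's actual helper):

```python
import numpy as np

rng = np.random.default_rng(0)
base = rng.random((1000, 128), dtype=np.float32)   # base vectors
queries = rng.random((10, 128), dtype=np.float32)  # query vectors
k = 10

# Brute-force L2 ground truth: top-k ids per query.
d2 = ((queries[:, None, :] - base[None, :, :]) ** 2).sum(-1)
ground_truth = np.argsort(d2, axis=1)[:, :k]

def recall_at_k(approx_ids: np.ndarray, gt_ids: np.ndarray) -> float:
    """Average fraction of true neighbors found across all queries."""
    hits = sum(len(set(a) & set(g)) for a, g in zip(approx_ids, gt_ids))
    return hits / gt_ids.size

# With exact results, recall is 1.0 by construction.
print(recall_at_k(ground_truth, ground_truth))  # 1.0
```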
The benchmark uses SIFT1M:
- 1,000,000 base vectors (128 dimensions)
- 10,000 query vectors
- Ground truth for Recall@k computation
The dataset is automatically downloaded on first run (~200MB compressed).
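SIFT1M ships in the standard `.fvecs` layout: each vector is stored as an int32 dimension count followed by that many float32 values. A minimal reader sketch (`read_fvecs` is illustrative; the benchmark scripts have their own loaders), round-tripped against a tiny synthetic file:

```python
import os
import tempfile
import numpy as np

def read_fvecs(path: str) -> np.ndarray:
    """Read an .fvecs file: each record is an int32 dim, then dim float32s."""
    raw = np.fromfile(path, dtype=np.int32)
    dim = raw[0]
    # Records are (1 + dim) int32-sized slots wide; drop the leading dim
    # column and reinterpret the remaining bytes as float32.
    return raw.reshape(-1, dim + 1)[:, 1:].copy().view(np.float32)

# Round-trip a tiny synthetic file to show the layout; the real files
# come from the SIFT1M download described above.
vecs = np.arange(6, dtype=np.float32).reshape(2, 3)
path = os.path.join(tempfile.mkdtemp(), "demo.fvecs")
with open(path, "wb") as f:
    for v in vecs:
        np.int32(len(v)).tofile(f)
        v.tofile(f)
assert (read_fvecs(path) == vecs).all()
```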
```
==================================================================================================
BENCHMARK RESULTS SUMMARY
==================================================================================================
Library      Build(s)   Size(MB)      QPS   P50(ms)  P95(ms)  P99(ms)  Recall@10
--------------------------------------------------------------------------------------------------
Usearch         45.23     345.12   8234.5     0.121    0.234    0.456     0.9823
Faiss           52.18     389.45   7891.2     0.127    0.245    0.501     0.9845
Caliby          48.76     356.78   7654.3     0.131    0.252    0.523     0.9801
==================================================================================================
WINNERS:
🏆 Highest QPS: Usearch (8234.5 queries/sec)
🏆 Best Recall@10: Faiss (0.9845)
🏆 Lowest P50 Latency: Usearch (0.121 ms)
🏆 Smallest Index: Usearch (345.12 MB)
🏆 Fastest Build: Usearch (45.23 s)
==================================================================================================
```
- QPS: Higher is better - measures search throughput
- Latency: Lower is better - measures per-query response time
- Recall@10: Higher is better (max 1.0) - measures accuracy
- Build Time: Lower is better - time to construct index
- Index Size: Lower is better - storage efficiency
- First run downloads SIFT1M dataset (~200MB)
- Benchmark includes warm-up phase before measurements
- All libraries use L2 distance metric
- Results may vary based on hardware and configuration
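The QPS and percentile-latency figures can be collected with a simple timing loop around the query call. A sketch of the measurement, including the warm-up phase noted above; the brute-force `search` here is a stand-in for a real index query:

```python
import time
import numpy as np

rng = np.random.default_rng(0)
base = rng.random((10_000, 128), dtype=np.float32)
queries = rng.random((100, 128), dtype=np.float32)

def search(q: np.ndarray) -> np.ndarray:
    # Stand-in for an index query; the benchmark calls each library here.
    return np.argsort(((base - q) ** 2).sum(-1))[:10]

# Warm-up phase before measurement (caches, JIT, page faults).
for q in queries[:10]:
    search(q)

latencies = []
start = time.perf_counter()
for q in queries:
    t0 = time.perf_counter()
    search(q)
    latencies.append((time.perf_counter() - t0) * 1000)  # per-query ms
elapsed = time.perf_counter() - start

qps = len(queries) / elapsed
p50, p95, p99 = np.percentile(latencies, [50, 95, 99])
print(f"QPS {qps:.0f}  P50 {p50:.3f}ms  P95 {p95:.3f}ms  P99 {p99:.3f}ms")
```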
The benchmark now supports Cohere Wikipedia embedding datasets (768-dimensional vectors).
- cohere1m: 1 million vectors, 768 dimensions, 10K queries
- cohere10m: 10 million vectors, 768 dimensions, 10K queries
Use the provided script to create synthetic Cohere-like datasets:
```bash
# Create Cohere-1M dataset (~3GB)
python3 create_cohere_datasets.py --dataset cohere1m

# Create Cohere-10M dataset (~30GB, takes 30-60 minutes)
python3 create_cohere_datasets.py --dataset cohere10m

# Create both
python3 create_cohere_datasets.py --dataset both
```

The datasets will be created at: `~/Workspace/datasets/cohere/`
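Synthetic Cohere-like data mainly needs the right shape and normalization: embedding-model outputs are typically unit-length float32 vectors. A sketch of a stand-in generator (`make_synthetic_embeddings` is illustrative; `create_cohere_datasets.py` is the actual generator):

```python
import numpy as np

def make_synthetic_embeddings(n: int, dim: int = 768, seed: int = 0) -> np.ndarray:
    """Random unit-norm float32 vectors, shaped like Cohere embeddings."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, dim)).astype(np.float32)
    x /= np.linalg.norm(x, axis=1, keepdims=True)  # L2-normalize each row
    return x

vecs = make_synthetic_embeddings(1000)
print(vecs.shape)  # (1000, 768)
print(np.allclose(np.linalg.norm(vecs, axis=1), 1.0))  # True
```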
```bash
# Benchmark with Cohere-1M
python3 compare_vectordb.py --dataset cohere1m --caliby-only

# Benchmark with Cohere-10M (subset)
python3 compare_vectordb.py --dataset cohere10m --caliby-only --num-vectors 100000

# Full Cohere-10M benchmark
python3 compare_vectordb.py --dataset cohere10m --caliby-only
```

For real Cohere Wikipedia embeddings:
- HuggingFace: https://huggingface.co/datasets/Cohere/wikipedia-22-12-en-embeddings
- 35M+ Wikipedia articles with 768-dim embeddings from Cohere's embedding model