Commit 162081f

bump versions, update workflows, update examples
1 parent fea9f51 commit 162081f

13 files changed, +291 -83 lines changed

.github/workflows/e2e_tests.yml

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@ on:
 
 env:
   CARGO_TERM_COLOR: always
+  WALRUS_QUIET: "1"
 
 jobs:
   e2e:

.github/workflows/integration_tests.yml

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@ on:
 
 env:
   CARGO_TERM_COLOR: always
+  WALRUS_QUIET: "1"
 
 jobs:
   integration:

.github/workflows/lint.yml

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@ on:
 
 env:
   CARGO_TERM_COLOR: always
+  WALRUS_QUIET: "1"
 
 jobs:
   fmt:

.github/workflows/regression_tests.yml

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@ on:
 
 env:
   CARGO_TERM_COLOR: always
+  WALRUS_QUIET: "1"
 
 jobs:
   regression:

.github/workflows/unit_tests.yml

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@ on:
 
 env:
   CARGO_TERM_COLOR: always
+  WALRUS_QUIET: "1"
 
 jobs:
   test:

CHANGELOG.md

Lines changed: 47 additions & 0 deletions
@@ -2,6 +2,53 @@
 
 All notable changes to SatoriDB will be documented in this file.
 
+## [0.1.1] - 2025-12-23
+
+### Breaking Changes
+
+Complete public API redesign. The old `SatoriHandle` + `SatoriDbConfig` API is gone.
+
+### New API
+
+**Simple open:**
+```rust
+let db = SatoriDb::open("my_app")?;
+```
+
+**Builder for configuration:**
+```rust
+let db = SatoriDb::builder("my_app")
+    .workers(4)
+    .fsync_ms(100)
+    .data_dir("/custom/path")
+    .virtual_nodes(8)
+    .build()?;
+```
+
+**Core operations:**
+- `db.insert(id, vector)` — insert (rejects duplicates)
+- `db.delete(id)` — delete by ID
+- `db.query(vector, top_k)` — nearest neighbor search
+- `db.query_with_probes(vector, top_k, probes)` — query with custom probe count
+- `db.query_with_vectors(vector, top_k)` — query returning stored vectors
+- `db.get(ids)` — fetch vectors by ID
+- `db.stats()` — database statistics
+- `db.flush()` — flush pending writes
+
+**Async variants:** `insert_async`, `delete_async`, `query_async`, `get_async`
+
+### Improvements
+
+- **Auto-shutdown on Drop** — no manual shutdown required
+- **Duplicate ID rejection** — bloom filter optimization for fast duplicate detection
+- **Walrus logs suppressed by default** — set `WALRUS_QUIET=0` to enable
+
+### Removed
+
+- `SatoriDbConfig` struct
+- `SatoriHandle` direct access
+- `upsert_blocking`, `query_blocking` naming
+
 ## [0.1.0] - 2025-12-21
 
 Initial release.
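
Reviewer note: the 0.1.1 changelog entry lists two behaviours that are easy to misread, namely that `insert` rejects duplicate IDs and that shutdown happens on `Drop`. The sketch below illustrates both with a hypothetical in-memory `ToyDb`. Nothing here is SatoriDB's code: the type, its fields, and the error type are assumptions, an exact `HashSet` stands in for the bloom filter plus index lookup, and a brute-force scan stands in for the real search.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for illustration only; not SatoriDB's implementation.
struct ToyDb {
    ids: HashSet<u64>,               // exact set standing in for bloom filter + index lookup
    vectors: HashMap<u64, Vec<f32>>, // id -> stored vector
    pending: usize,                  // writes not yet "flushed"
}

impl ToyDb {
    fn new() -> Self {
        ToyDb { ids: HashSet::new(), vectors: HashMap::new(), pending: 0 }
    }

    /// Insert a vector, rejecting an ID that is already present.
    fn insert(&mut self, id: u64, vector: Vec<f32>) -> Result<(), String> {
        if !self.ids.insert(id) {
            return Err(format!("duplicate id {}", id));
        }
        self.vectors.insert(id, vector);
        self.pending += 1;
        Ok(())
    }

    /// Brute-force nearest-neighbour search by squared L2 distance.
    fn query(&self, q: &[f32], top_k: usize) -> Vec<(u64, f32)> {
        let mut hits: Vec<(u64, f32)> = self
            .vectors
            .iter()
            .map(|(id, v)| {
                let d = v.iter().zip(q).map(|(a, b)| (a - b) * (a - b)).sum::<f32>();
                (*id, d)
            })
            .collect();
        hits.sort_by(|a, b| a.1.total_cmp(&b.1));
        hits.truncate(top_k);
        hits
    }

    /// Pretend to persist pending writes.
    fn flush(&mut self) {
        self.pending = 0;
    }
}

impl Drop for ToyDb {
    // Flushing in Drop means callers need no explicit shutdown call.
    fn drop(&mut self) {
        self.flush();
    }
}

fn main() {
    let mut db = ToyDb::new();
    db.insert(1, vec![0.1, 0.2, 0.3]).unwrap();
    db.insert(2, vec![0.9, 0.9, 0.9]).unwrap();
    // A second insert with the same ID is an error, not an overwrite.
    assert!(db.insert(1, vec![0.0, 0.0, 0.0]).is_err());
    let hits = db.query(&[0.1, 0.2, 0.3], 1);
    println!("nearest id: {}, pending writes: {}", hits[0].0, db.pending);
} // `db` is dropped here, which triggers the flush
```

Callers migrating from the removed `upsert_blocking` naming may want to check whether they relied on overwrite semantics, since the changelog describes `insert` as rejecting duplicates.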

Cargo.lock

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [package]
 name = "satoridb"
-version = "0.1.0"
+version = "0.1.1"
 edition = "2021"
 description = "Embedded vector database for approximate nearest neighbor search (experimental)."
 readme = "README.md"

README.md

Lines changed: 19 additions & 25 deletions
@@ -11,33 +11,23 @@
 
 ![architecture](./assets/architecture.png)
 
-SatoriDB runs entirely in-process on a single node. It uses a **two-tier search** architecture:
+SatoriDB is a two-tier search system: a small "hot" index in RAM routes queries to "cold" vector data on disk. This lets us handle billion-scale datasets without holding everything in memory.
 
-1. **Routing (HNSW)**: A quantized HNSW index over bucket centroids finds the most relevant clusters in O(log N)
-2. **Scanning (Workers)**: CPU-pinned Glommio executors scan selected buckets in parallel using SIMD-accelerated L2 distance
+**Routing (Hot Tier)**
 
-We use a variant of [SPFresh](https://arxiv.org/pdf/2410.14452), Vectors are organized into **buckets** (clusters of similar vectors). A background rebalancer automatically splits buckets via k-means when they exceed a threshold, keeping search efficient as data grows. All data is persisted to walrus high performance storage engine and we use RocksDB indexes for point lookups.
+Quantized HNSW index over bucket centroids. Centroids are scalar-quantized (f32 → u8) so the whole routing index fits in RAM even at 500k+ buckets. When a query comes in, HNSW finds the top-K most relevant buckets in O(log N). We only search those, not the entire dataset.
 
-See [docs/architecture.md](docs/architecture.md) for detailed documentation including:
-- System overview and component diagrams
-- Two-tier search architecture
-- Storage layer (Walrus + RocksDB)
-- Rebalancer and clustering algorithms
-- Data flow diagrams
+**Scanning (Cold Tier)**
 
-```
-SatoriHandle ──▶ Router Manager ──▶ HNSW Index (centroids)
-│ │
-│ ┌─────────────────────────┘
-│ ▼
-│ Bucket IDs ──▶ Consistent Hash Ring
-│ │
-▼ ▼
-Workers ◀──────────────── bucket_id → shard
-
-
-Walrus (storage) + RocksDB (indexes)
-```
+CPU-pinned Glommio workers scan selected buckets in parallel. Shared-nothing: each worker has its own io_uring ring, LRU cache, and pre-allocated heap. No cross-core synchronization on the query path. SIMD everywhere: L2 distance, dot products, quantization, k-means assignments all have AVX2/AVX-512 paths. Cache misses stream from disk without blocking.
+
+**Clustering & Rebalancing**
+
+Vectors are grouped into buckets (clusters) via k-means. A background rebalancer automatically splits buckets when they exceed ~2000 vectors, keeping bucket sizes predictable. Predictable sizes = predictable query latency. Inspired by [SPFresh](https://arxiv.org/pdf/2410.14452).
+
+**Storage**
+
+Walrus handles bulk vector storage (append-only, io_uring, topic-per-bucket). RocksDB indexes handle point lookups (fetch-by-id, duplicate detection). See [docs/architecture.md](docs/architecture.md) for the full deep-dive.
 
 ## Features
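
Reviewer note: the rewritten README text describes routing as scalar quantization of centroids (f32 → u8) followed by picking the most relevant buckets. The sketch below shows one way that idea can work. It rests on assumptions not stated in the README: a shared min/max affine mapping for the quantization, and a plain linear scan over centroids where the project uses an HNSW index. All function names are hypothetical.

```rust
/// Quantize f32 values to u8 with a shared affine (min/max) mapping.
/// The min/max scheme is an assumption; the README only says "f32 → u8".
fn quantize(values: &[f32], min: f32, max: f32) -> Vec<u8> {
    let scale = if max > min { 255.0 / (max - min) } else { 0.0 };
    values
        .iter()
        .map(|&v| ((v.clamp(min, max) - min) * scale).round() as u8)
        .collect()
}

/// Squared L2 distance between two quantized vectors.
fn l2_sq_u8(a: &[u8], b: &[u8]) -> u32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| {
            let d = *x as i32 - *y as i32;
            (d * d) as u32
        })
        .sum()
}

/// Return the indices of the `probes` centroids closest to `query`.
/// A linear scan stands in for the HNSW index here.
fn route(centroids: &[Vec<u8>], query: &[u8], probes: usize) -> Vec<usize> {
    let mut order: Vec<usize> = (0..centroids.len()).collect();
    order.sort_by_key(|&i| l2_sq_u8(&centroids[i], query));
    order.truncate(probes);
    order
}

fn main() {
    let (min, max) = (0.0_f32, 1.0_f32);
    let raw: [[f32; 2]; 3] = [[0.1, 0.1], [0.5, 0.5], [0.9, 0.9]];
    let centroids: Vec<Vec<u8>> = raw.iter().map(|c| quantize(c, min, max)).collect();
    let query = quantize(&[0.85, 0.95], min, max);
    // Only the buckets returned here would be scanned by the cold tier.
    println!("buckets to scan: {:?}", route(&centroids, &query, 2));
}
```

Each quantized centroid takes one byte per dimension instead of four, which is the memory saving the README attributes to the hot tier; the cost is a small loss of precision in the routing distances.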

@@ -63,7 +53,11 @@ cargo add satoridb
 use satoridb::SatoriDb;
 
 fn main() -> anyhow::Result<()> {
-    let db = SatoriDb::open("my_app")?;
+    let db = SatoriDb::builder("my_app")
+        .workers(4)              // Worker threads (default: num_cpus)
+        .fsync_ms(100)           // Fsync interval (default: 200ms)
+        .data_dir("/tmp/mydb")   // Data directory
+        .build()?;
 
     db.insert(1, vec![0.1, 0.2, 0.3])?;
     db.insert(2, vec![0.2, 0.3, 0.4])?;
@@ -150,4 +144,4 @@ cargo build --release
 
 See [LICENSE](LICENSE).
 
-> **Note**: SatoriDB is in early development (v0.1.0). APIs may change between versions. See [CHANGELOG.md](CHANGELOG.md) for release notes.
+> **Note**: SatoriDB is in early development (v0.1.1). APIs may change between versions. See [CHANGELOG.md](CHANGELOG.md) for release notes.
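
Reviewer note: the quick-start hunk in this README diff switches to a builder whose comments mention defaults (worker count from the CPU count, a 200 ms fsync interval). As a rough illustration of how a consuming builder with such defaults can be structured, here is a self-contained sketch. `ToyBuilder` and `ToyConfig` are hypothetical names, the zero-worker check and the default data directory are assumptions of the sketch, and none of this is taken from the SatoriDB source.

```rust
use std::path::PathBuf;
use std::thread;

/// Hypothetical resolved configuration (illustration only).
#[derive(Debug)]
struct ToyConfig {
    name: String,
    workers: usize,
    fsync_ms: u64,
    data_dir: PathBuf,
}

/// Hypothetical builder mirroring the call shape shown in the README hunk.
struct ToyBuilder {
    name: String,
    workers: Option<usize>,
    fsync_ms: u64,
    data_dir: Option<PathBuf>,
}

impl ToyBuilder {
    fn new(name: &str) -> Self {
        // 200 ms matches the default mentioned in the README comment.
        ToyBuilder { name: name.to_string(), workers: None, fsync_ms: 200, data_dir: None }
    }

    fn workers(mut self, n: usize) -> Self {
        self.workers = Some(n);
        self
    }

    fn fsync_ms(mut self, ms: u64) -> Self {
        self.fsync_ms = ms;
        self
    }

    fn data_dir(mut self, dir: &str) -> Self {
        self.data_dir = Some(PathBuf::from(dir));
        self
    }

    /// Resolve defaults and validate; errors are plain strings in this sketch.
    fn build(self) -> Result<ToyConfig, String> {
        let workers = match self.workers {
            Some(0) => return Err("workers must be at least 1".to_string()),
            Some(n) => n,
            // Fall back to the number of available cores.
            None => thread::available_parallelism().map(|n| n.get()).unwrap_or(1),
        };
        // Default directory derived from the name: an assumption of this sketch.
        let data_dir = match self.data_dir {
            Some(dir) => dir,
            None => PathBuf::from(&self.name),
        };
        Ok(ToyConfig { name: self.name, workers, fsync_ms: self.fsync_ms, data_dir })
    }
}

fn main() {
    let cfg = ToyBuilder::new("my_app")
        .workers(4)
        .fsync_ms(100)
        .data_dir("/tmp/mydb")
        .build()
        .unwrap();
    println!("{:?}", cfg);
}
```

Keeping validation in `build()` is what lets the README example end in `.build()?`: every setter stays infallible and a single call reports configuration errors.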
