SingleNodeTransformation

A robust, single-node SQL query engine that reads and writes Apache Iceberg tables, built in Rust on Apache DataFusion + apache/iceberg-rust, with MinIO (S3-compatible) object storage. Apache Arrow is the in-memory format.

See the design/roadmap in the approved plan for the full architecture. This README tracks the current state.

Stack

| Layer | Choice |
| --- | --- |
| Language | Rust (Tokio async) |
| Query engine | Apache DataFusion (`52.2`, matching `iceberg-datafusion 0.9.1`) |
| Iceberg | `apache/iceberg-rust` 0.9.1 (`iceberg`, `iceberg-catalog-rest`, `iceberg-datafusion`) |
| Catalog | Iceberg REST catalog |
| Storage | MinIO (S3-compatible) |
| File format | Parquet |

Workspace layout

crates/
  engine/      # config, catalog connectivity, SessionContext builder, CoW DML
  extensions/  # custom scalar/aggregate UDFs (replaces "Gandiva expressions")
  cli/         # `snt` binary
docker/        # local Postgres-backed Iceberg REST + MinIO stack
seeder/        # PyIceberg NYC-taxi data seeder

Quickstart

Start the local stack:

docker compose -f docker/docker-compose.yml up -d

Build and run the CLI (lists catalog namespaces/tables):
```
cargo run -p cli
```

Configuration comes from environment variables (see .env.example); defaults match the docker-compose stack.
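The environment-with-defaults pattern can be sketched in plain Rust as follows. This is only an illustration: the variable names (`SNT_CATALOG_URI`, `SNT_S3_ENDPOINT`, `SNT_WAREHOUSE`), the `Config` fields, and the default values are hypothetical placeholders, not the names the engine crate actually reads. The authoritative list is in .env.example.

```rust
use std::env;

/// Connection settings. Field names are hypothetical; see .env.example
/// for the variables the engine really uses.
#[derive(Debug, Clone, PartialEq)]
struct Config {
    catalog_uri: String,
    s3_endpoint: String,
    warehouse: String,
}

/// Return the variable's value, or `default` when it is unset
/// (or not valid Unicode).
fn env_or(key: &str, default: &str) -> String {
    env::var(key).unwrap_or_else(|_| default.to_string())
}

impl Config {
    /// Build the config from the environment. The defaults here are
    /// assumed placeholders for a local docker-compose stack.
    fn from_env() -> Self {
        Config {
            catalog_uri: env_or("SNT_CATALOG_URI", "http://localhost:8181"),
            s3_endpoint: env_or("SNT_S3_ENDPOINT", "http://localhost:9000"),
            warehouse: env_or("SNT_WAREHOUSE", "s3://warehouse/"),
        }
    }
}

fn main() {
    let cfg = Config::from_env();
    println!("{cfg:?}");
}
```

With this shape, running against the local stack needs no setup, and pointing at another catalog only requires exporting the relevant variable before `cargo run -p cli`.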

Milestones

M0 — scaffold + connect to REST catalog (list namespaces/tables). ✅ done
M1 — read path via DataFusion (SELECT, projection + predicate pushdown). ✅ done
- Verified against nyc.taxi (NYC yellow-taxi data): exact count(*), group-by, aggregations, and EXPLAIN showing pushdown into IcebergTableScan.
- Caveat: the seed delete was applied as copy-on-write by PyIceberg (no delete files), so merge-on-read delete merging is not yet exercised — see M1-follow-up.
M2 — append writes (CREATE TABLE, INSERT INTO, DROP TABLE). ✅ done
- Verified: empty CREATE TABLE, three INSERT INTO ... SELECT appends → 3 real Iceberg snapshots (added-records 57/75/361) + 3 Parquet data files in MinIO; DROP removes the table.
- REST catalog now backed by Postgres (the demo's in-memory SQLite threw SQLITE_BUSY on concurrent commits) — durable across restarts.
- Limitation: CREATE TABLE AS SELECT (CTAS) is not supported by iceberg-datafusion 0.9.1 (register_table does not support tables with data); use CREATE TABLE + INSERT INTO instead.
M3 — extensions (custom UDFs). ✅ done
- extensions crate registers scalar (payment_label, miles_to_km) and aggregate (geo_mean) UDFs on the SessionContext; verified via SQL against nyc.taxi. This is the DataFusion-native replacement for "custom Gandiva expressions".
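To make the UDF semantics concrete, here is a minimal sketch of the arithmetic that functions named miles_to_km and geo_mean would plausibly perform. It deliberately leaves out the DataFusion registration and Arrow array handling. The NULL handling and the rejection of non-positive inputs are assumptions; the extensions crate may behave differently.

```rust
/// Exact length of the international mile in kilometres.
const KM_PER_MILE: f64 = 1.609_344;

/// Scalar kernel: convert a distance in miles to kilometres.
fn miles_to_km(miles: f64) -> f64 {
    miles * KM_PER_MILE
}

/// Aggregate kernel: geometric mean, computed as exp(mean(ln x)).
/// `None` entries model SQL NULLs and are skipped. Assumption: the
/// result is `None` for empty input or for any value <= 0.
fn geo_mean(values: &[Option<f64>]) -> Option<f64> {
    // Running state (sum of logs, count) is what an accumulator
    // would carry between batches.
    let mut sum_ln = 0.0_f64;
    let mut count = 0_u64;
    for v in values.iter().flatten() {
        if *v <= 0.0 {
            return None;
        }
        sum_ln += v.ln();
        count += 1;
    }
    if count == 0 {
        None
    } else {
        Some((sum_ln / count as f64).exp())
    }
}

fn main() {
    println!("10 mi = {} km", miles_to_km(10.0));
    println!("geo_mean = {:?}", geo_mean(&[Some(2.0), None, Some(8.0)]));
}
```

Keeping the state as a sum of logarithms plus a count means two partial states can be merged by adding them, which is the property a partitioned aggregate needs.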
M4 — DML (DELETE/UPDATE). ✅ done
- The released iceberg-datafusion/iceberg-rust stack has no native DML, and its transaction API is append-only, so DELETE/UPDATE are implemented as copy-on-write in engine::dml: the table is rewritten into a temp table (created with the source's exact Iceberg schema) and then swapped in via catalog drop + rename.
- Verified on a test table: DELETE removed the right rows; UPDATE applied literal and expression assignments scoped by predicate, with row counts preserved.
- Limitations: the swap is two catalog calls rather than one atomic commit; snapshot history restarts after each rewrite; only single-level namespaces and unpartitioned tables are supported; MERGE is unsupported (DataFusion 52 can't parse it). table_exists is avoided because the REST adapter rejects its HTTP HEAD request with 400.
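The copy-on-write flow described for M4 can be modelled with an in-memory stand-in for the catalog. This is a toy sketch, not the code in engine::dml: `Catalog`, `Row`, `rewrite_and_swap`, and the `__cow_tmp` suffix are hypothetical names, and a `HashMap` replaces the REST catalog and Parquet files. It shows why the swap is two separate steps rather than one atomic commit.

```rust
use std::collections::HashMap;

/// (id, fare): a stand-in for real Iceberg rows.
type Row = (i64, f64);

/// Toy catalog mapping table name to rows.
#[derive(Default)]
struct Catalog {
    tables: HashMap<String, Vec<Row>>,
}

impl Catalog {
    /// Rewrite every row through `rewrite` into a temp table, then swap
    /// it in with drop + rename. Returns the row count after the rewrite.
    fn rewrite_and_swap<F>(&mut self, name: &str, rewrite: F) -> Result<usize, String>
    where
        F: Fn(&Row) -> Option<Row>,
    {
        let source = self
            .tables
            .get(name)
            .ok_or_else(|| format!("no such table: {name}"))?;
        let rewritten: Vec<Row> = source.iter().filter_map(|r| rewrite(r)).collect();
        let count = rewritten.len();

        let tmp = format!("{name}__cow_tmp");
        self.tables.insert(tmp.clone(), rewritten); // write the temp table
        self.tables.remove(name); // catalog call 1: drop the original
        // A failure between these two calls would leave only the temp table.
        let data = self.tables.remove(&tmp).expect("temp table was just created");
        self.tables.insert(name.to_string(), data); // catalog call 2: rename
        Ok(count)
    }

    /// DELETE: rows matching `pred` are left out of the rewrite.
    fn delete_where(&mut self, name: &str, pred: impl Fn(&Row) -> bool) -> Result<usize, String> {
        self.rewrite_and_swap(name, |r| if pred(r) { None } else { Some(*r) })
    }

    /// UPDATE: rows matching `pred` are replaced by `assign(row)`.
    fn update_where(
        &mut self,
        name: &str,
        pred: impl Fn(&Row) -> bool,
        assign: impl Fn(Row) -> Row,
    ) -> Result<usize, String> {
        self.rewrite_and_swap(name, |r| Some(if pred(r) { assign(*r) } else { *r }))
    }
}

fn main() {
    let mut cat = Catalog::default();
    cat.tables
        .insert("trips".to_string(), vec![(1, 5.0), (2, 12.0), (3, 20.0)]);
    cat.delete_where("trips", |r| r.0 == 2).unwrap();
    cat.update_where("trips", |r| r.1 > 10.0, |r| (r.0, r.1 * 2.0)).unwrap();
    println!("{:?}", cat.tables["trips"]);
}
```

Because UPDATE keeps every row and only changes matching ones, the row count is preserved, which matches the verification note above.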
M5 — hardening (observability, commit-conflict retry, docs). ← next
