Motivation
Native, vectorised, zero-copy execution path for Druid via FFM and Arrow — staged from segment reader → filter/project → DataFusion (follow-up to #19039 item 1)
Druid today runs the entire query hot path on the JVM. Each row passes through a ColumnSelector interface, materialises into Java objects, and is compared or aggregated by JIT-compiled bytecode. This means:
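As a rough illustration of the per-row pattern described above, the sketch below contrasts a row-at-a-time loop with a tight loop over a primitive column. It is only a sketch under stated assumptions: `RowSelector`, `HotPathSketch`, and their methods are hypothetical names invented for this example and are not Druid's actual selector interfaces, and the on-heap `long[]` merely stands in for an off-heap or Arrow buffer.

```java
// Hypothetical, simplified stand-in for a per-row selector.
// This is NOT Druid's actual interface; it only mimics the access pattern.
interface RowSelector {
    void advanceTo(int row);
    Object getObject(); // returns a boxed value, as a generic object accessor would
}

final class HotPathSketch {

    // Row-at-a-time: two interface calls and one boxed Long per row.
    static long sumRowAtATime(RowSelector selector, int rowCount, long threshold) {
        long sum = 0;
        for (int row = 0; row < rowCount; row++) {
            selector.advanceTo(row);
            Long value = (Long) selector.getObject(); // materialised Java object
            if (value > threshold) {
                sum += value;
            }
        }
        return sum;
    }

    // Batch-at-a-time: filter + aggregate over a primitive column,
    // with no per-row interface calls or allocations.
    static long sumBatch(long[] column, long threshold) {
        long sum = 0;
        for (long value : column) {
            sum += value > threshold ? value : 0L;
        }
        return sum;
    }

    // Adapts a primitive column to the hypothetical per-row selector.
    static RowSelector selectorOver(long[] column) {
        return new RowSelector() {
            private int row;

            @Override
            public void advanceTo(int r) {
                row = r;
            }

            @Override
            public Object getObject() {
                return column[row]; // autoboxed to Long on every call
            }
        };
    }

    public static void main(String[] args) {
        long[] column = {5, 12, 7, 30, 1};
        System.out.println(sumRowAtATime(selectorOver(column), column.length, 6));
        System.out.println(sumBatch(column, 6));
    }
}
```

Both methods compute the same filtered sum; the difference is only in how each value is reached. The batch form is the shape that a columnar, Arrow-style execution path would operate on.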
Proposed changes
Phase 0 — Spike: native segment column reader via FFM
Phase 1 — Layer0 production-shape: native column reader for all common types
Phase 2 — L1: vectorised native filter + project over Arrow batches
Phase 3 — L2: DataFusion as candidate aggregate/join executor in MSQ stages
Rationale
Operational impact
Performance
- 2–3× speedup on cold-cache scan + filter for typical Druid datasources.
- speedup on MSQ fact-fact joins.
- speedup on external Parquet ingestion.
- lower p99 for high-cardinality GroupBy.
- faster execution under memory pressure (DataFusion graceful spill).
Operational
Test plan (optional)
Future work (optional)