[Proposal] Native, vectorised, zero-copy execution path for Druid #19456

@Shekharrajak

Motivation

Native, vectorised, zero-copy execution path for Druid via FFM and Arrow — staged from segment reader → filter/project → DataFusion (follow-up to #19039 item 1)

Druid today runs the entire query hot path on the JVM. Each row passes through a ColumnSelector interface, is materialised into Java objects, and is compared or aggregated by JIT-compiled bytecode.
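To make the cost concrete, here is a minimal sketch contrasting a row-at-a-time loop with a batch loop over a primitive array. `RowSelector` is a hypothetical stand-in invented for this illustration, not Druid's actual selector interface, and the filter-then-sum query is an assumed example.

```java
final class HotPathSketch {
    /** Hypothetical per-row selector (an assumption for illustration, not a Druid API). */
    interface RowSelector {
        Object getObject(int row);
    }

    /** Row-at-a-time: one interface call and one boxed Long per row. */
    static long sumRowAtATime(RowSelector selector, int rowCount, long threshold) {
        long sum = 0;
        for (int row = 0; row < rowCount; row++) {
            Long value = (Long) selector.getObject(row); // value materialised as a Java object
            if (value > threshold) {
                sum += value;
            }
        }
        return sum;
    }

    /** Batch-at-a-time: a tight loop over primitives with no per-row object or interface call. */
    static long sumBatch(long[] column, long threshold) {
        long sum = 0;
        for (long value : column) {
            if (value > threshold) {
                sum += value;
            }
        }
        return sum;
    }
}
```

Both methods compute the same result; the second shape is the one a columnar, batch-oriented path would aim for.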

Proposed changes

Phase 0 — Spike: native segment column reader via FFM
Phase 1 — Layer0 production-shape: native column reader for all common types
Phase 2 — L1: vectorised native filter + project over Arrow batches
Phase 3 — L2: DataFusion as candidate aggregate/join executor in MSQ stages
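As a rough illustration of what Phases 0 to 2 could look like from the Java side, the sketch below reads a long column straight out of off-heap memory through the FFM API (`java.lang.foreign`, final in JDK 22) and runs a filter that emits a selection vector of matching row indices. Everything here is an assumption for illustration: the class and method names are hypothetical, the off-heap buffer stands in for a memory-mapped segment file, and the selection vector is a plain `int[]` rather than an Arrow structure.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

final class NativeColumnSketch {
    /** Writes the indices of rows whose value exceeds the threshold; returns how many matched. */
    static int filterGreaterThan(MemorySegment column, int rowCount, long threshold, int[] selection) {
        int selected = 0;
        for (int row = 0; row < rowCount; row++) {
            // Read directly from off-heap memory; no Java object is created per row.
            long value = column.getAtIndex(ValueLayout.JAVA_LONG, row);
            if (value > threshold) {
                selection[selected++] = row;
            }
        }
        return selected;
    }

    /** Sums only the rows named in the selection vector. */
    static long sumSelected(MemorySegment column, int[] selection, int selected) {
        long sum = 0;
        for (int i = 0; i < selected; i++) {
            sum += column.getAtIndex(ValueLayout.JAVA_LONG, selection[i]);
        }
        return sum;
    }

    /** Builds a small off-heap column, filters it, and sums the surviving rows. */
    static long demo() {
        long[] values = {3L, 42L, 7L, 100L};
        try (Arena arena = Arena.ofConfined()) {
            // Stand-in for a mapped segment file: 8-byte aligned off-heap buffer.
            MemorySegment column = arena.allocate((long) values.length * Long.BYTES, Long.BYTES);
            for (int i = 0; i < values.length; i++) {
                column.setAtIndex(ValueLayout.JAVA_LONG, i, values[i]);
            }
            int[] selection = new int[values.length];
            int selected = filterGreaterThan(column, values.length, 10L, selection);
            return sumSelected(column, selection, selected);
        } // Arena close releases the off-heap memory deterministically.
    }
}
```

The filter and the aggregate both operate on the same `MemorySegment` without copying it onto the heap, which is the property the proposal calls zero-copy. A production reader would also need to handle nulls, encodings, and compression, none of which this sketch attempts.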

Rationale

Operational impact

Performance

  • 2–3× speedup on cold-cache scan + filter for typical Druid datasources.
  • speedup on MSQ fact-fact joins.
  • speedup on external Parquet ingestion.
  • lower p99 for high-cardinality GroupBy.
  • faster execution under memory pressure (DataFusion graceful spill).

Operational

Test plan (optional)

Future work (optional)
