feat: clustered segments pt.1 #19460
Conversation
```java
@Override
public List<String> getColumnNames()
{
  List<String> columns = new ArrayList<>(columnNames.size() + aggregators.length);
};
}
```
```java
private CursorHolder makeClusteredCursorHolder(CursorBuildSpec spec, ClusteredValueGroupsBaseTableSchema clusterSummary)
}
```
```java
final Object raw = parent.currentValue(idx);
final String stringValue = raw == null ? null : String.valueOf(raw);
cachedSelector = DimensionSelector.constant(stringValue, spec.getExtractionFn());
```
```java
final Object raw = parent.currentValue(idx);
final String stringValue = raw == null ? null : String.valueOf(raw);
final String afterExtraction =
    spec.getExtractionFn() == null ? stringValue : spec.getExtractionFn().apply(stringValue);
```
FrankChen021
left a comment
| Severity | Findings |
|---|---|
| P0 | 0 |
| P1 | 2 |
| P2 | 0 |
| P3 | 0 |
| Total | 2 |
Reviewed 25 of 25 changed files.
This is an automated review by Codex GPT-5.5
```java
);
}

return new QueryableIndexCursorHolder(
```
[P1] Cluster-column filters are reapplied against sub-indexes that do not contain them

After `pruneClusterGroups` keeps a matching group, the original `CursorBuildSpec` is passed unchanged into a `QueryableIndexCursorHolder` for the group sub-index. That sub-index intentionally does not contain the clustering columns, so `EqualityFilter.getBitmapColumnIndex` sees a missing column and builds an exact all-false/all-unknown bitmap before the `ClusteringColumnSelectorFactory` wrapper can supply the constant value to a matcher. Queries such as `tenant = 'acme'` therefore prune to the `acme` group and then scan zero rows. The multi-group supplier path repeats the same pattern, so clustering predicates need to be stripped/residualized after pruning, or the group holder needs a cluster-aware `ColumnIndexSelector`/filter-bundle path.
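To illustrate the proposed fix, here is a minimal sketch of residualizing clustering-column predicates after pruning. All names here are hypothetical stand-ins, not the PR's actual API: the filter is modeled as a simple map of column-to-required-value conjuncts rather than Druid's real `Filter` tree.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: once pruning has selected a cluster group, predicates
// on clustering columns are already satisfied and must be dropped before the
// remaining filter is handed to the group sub-index, which does not
// physically contain those columns.
public class ResidualFilterSketch
{
  // Stand-in for an AND filter: column name -> required value.
  public static Map<String, String> residualize(
      Map<String, String> conjuncts,
      Set<String> clusteringColumns
  )
  {
    final Map<String, String> residual = new LinkedHashMap<>(conjuncts);
    // Without this step, the sub-index sees a missing column, builds an
    // all-false bitmap, and the query scans zero rows.
    residual.keySet().removeAll(clusteringColumns);
    return residual;
  }
}
```

In this toy model, a filter `{tenant=acme, status=200}` pruned on clustering column `tenant` leaves only `{status=200}` to evaluate against the sub-index.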
```java
if (clusteringValues[idx] == null) {
  return DruidPredicateMatch.FALSE;
}
return DruidPredicateMatch.of(Objects.equals(clusteringValues[idx], eq.getMatchValue()));
```
[P1] Pruning ignores EqualityFilter type coercion semantics

The pruner compares the stored clustering value directly with `eq.getMatchValue()`, but `EqualityFilter`'s normal matcher casts the literal to the input column type. For example, a LONG clustering value `5` with an `EqualityFilter` typed as STRING value `"5"` would match when evaluated by the real matcher, but `Objects.equals(5L, "5")` returns false here, and the only live cluster group is removed before scanning. `TypedInFilter` has the same risk below, with direct `Objects.equals` over the raw sorted values. The pruning check should use the same predicate/cast semantics as the filter matcher, or stay UNKNOWN when the literal type does not exactly match the clustering column type.
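The coercion hazard, and the suggested UNKNOWN fallback, can be sketched in isolation. The class and method names below are illustrative only; they do not reflect the PR's types, and the real fix would reuse Druid's own predicate/cast machinery rather than a class-equality check.

```java
import java.util.Objects;

// Hypothetical sketch: EqualityFilter's matcher casts the literal to the
// column type, while a raw Objects.equals comparison does not, so the two
// can disagree on a LONG value 5 vs. a STRING literal "5".
public class PruneCoercionSketch
{
  public enum Match { TRUE, FALSE, UNKNOWN }

  // Mirrors the buggy direct comparison in the pruner.
  public static boolean rawEquals(Object clusteringValue, Object matchValue)
  {
    return Objects.equals(clusteringValue, matchValue);
  }

  // Safer pruning decision: only answer TRUE/FALSE when the literal's type
  // exactly matches the clustering value's type; otherwise stay UNKNOWN so
  // the group is kept and the real matcher's cast semantics decide.
  public static Match pruneDecision(Object clusteringValue, Object matchValue)
  {
    if (clusteringValue == null || matchValue == null) {
      return Match.UNKNOWN;
    }
    if (!clusteringValue.getClass().equals(matchValue.getClass())) {
      return Match.UNKNOWN;
    }
    return clusteringValue.equals(matchValue) ? Match.TRUE : Match.FALSE;
  }
}
```

With this shape, `rawEquals(5L, "5")` is false (the bug: the group would be wrongly pruned), while `pruneDecision(5L, "5")` stays UNKNOWN and defers to the matcher.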
Description

This PR builds on the foundations laid by projections (#17214) and the v10 segment format (#18880) to introduce 'clustered' segments, giving operators the option to push a `CLUSTERED BY` clause inside of a segment, as a companion to partitioning, where data is distributed between segments in this manner. Internally, the 'base' table is decomposed into separate cluster groups, which are combined together via concatenation to form the 'complete' view of all rows stored in the segment. This optimizes for use cases where typical queries filter down to a small subset of the cluster groups (ideally a single group), which, like the effect of aggregate projections, can greatly reduce the number of rows to be scanned. The expected use cases are things like multi-tenant clusters with a shared datasource, metrics use cases which typically filter to a single type of service, etc.

This PR contains only the read side, to get feedback on the internal segment metadata shapes and query engine integration. The write side (ingestion support) will come in a follow-up PR, so this PR uses some test fixtures to exercise the read paths until segment building is actually in place.
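The read model described above can be sketched with a toy in-memory analogue. Everything here is illustrative (a map of string rows keyed by clustering value), not the PR's actual segment layout: it only shows the decompose/concatenate/prune relationship between the cluster groups and the 'complete' view.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the clustered-segment read model: the base table is
// stored as per-value cluster groups; an unfiltered read concatenates every
// group, while a read filtering on the clustering column scans one group.
public class ClusterGroupsSketch
{
  private final Map<String, List<String>> groups = new LinkedHashMap<>();

  public void add(String clusterValue, String row)
  {
    groups.computeIfAbsent(clusterValue, k -> new ArrayList<>()).add(row);
  }

  // 'Complete' view of the segment: concatenation of all cluster groups.
  public List<String> scanAll()
  {
    final List<String> all = new ArrayList<>();
    groups.values().forEach(all::addAll);
    return all;
  }

  // A filter on the clustering column prunes the scan to a single group.
  public List<String> scanGroup(String clusterValue)
  {
    return groups.getOrDefault(clusterValue, List.of());
  }
}
```

In this model, a query filtering on one clustering value touches only that group's rows, which is the row-reduction effect the PR is after.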
Since this is an experimental/mostly new feature, the most important part for reviewers is the new internal segment metadata, `ClusteredValueGroupsBaseTableSchema` and its new internals like `TableClusterGroupSpec` and `ClusteringDictionaries`, so that we can ensure the metadata we will be storing in the segment is "good", since changing it after segments have been written is very hard.

todo: elaborate on design