Skip to content

Optimize tupleElement(dictGet(..., tuple_of_attrs), N) into single-attribute dictGet#100186

Merged
alexey-milovidov merged 49 commits into
masterfrom
optimize-dictget-tuple-element
May 6, 2026
Merged

Optimize tupleElement(dictGet(..., tuple_of_attrs), N) into single-attribute dictGet#100186
alexey-milovidov merged 49 commits into
masterfrom
optimize-dictget-tuple-element

Conversation

@alexey-milovidov

@alexey-milovidov alexey-milovidov commented Mar 20, 2026

Copy link
Copy Markdown
Member

When dictGet or dictGetOrDefault is called with a tuple of attribute names and the result is immediately indexed with tupleElement (positional like .1 or named like .country), rewrite to fetch only the needed attribute. This avoids fetching unnecessary dictionary attributes.

Example:

-- Before optimization:
SELECT dictGet('dict', ('country', 'city', 'population'), id).1 FROM t
-- After optimization (visible in EXPLAIN QUERY TREE):
SELECT dictGet('dict', 'country', id) FROM t

Controlled by the optimize_dictget_tuple_element setting (default: true).

Closes #50167

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Added a query optimization that rewrites tupleElement(dictGet('dict', ('a', 'b', 'c'), key), N) into dictGet('dict', 'a', key), avoiding fetching unnecessary dictionary attributes. Controlled by the optimize_dictget_tuple_element setting (enabled by default).

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

New setting optimize_dictget_tuple_element (Bool, default true): when enabled, rewrites dictGet/dictGetOrDefault calls that fetch a tuple of attributes and immediately index a single element, into a dictGet call that fetches only that single attribute. Supports both positional (.1, .2) and named (.country) access patterns. The setting is documented inline in src/Core/Settings.cpp (the canonical source for ClickHouse setting docs, auto-extracted into the docs site).

Version info

  • Merged into: 26.5.1.362

…ngle-attribute `dictGet`

When `dictGet` or `dictGetOrDefault` is called with a tuple of attribute names
and the result is immediately indexed with `tupleElement` (positional like `.1`
or named like `.country`), rewrite to fetch only the needed attribute.

This avoids fetching unnecessary dictionary attributes, improving performance
for queries like:
  `dictGet('dict', ('country', 'city', 'population'), key).1`
which becomes:
  `dictGet('dict', 'country', key)`

Controlled by the `optimize_dictget_tuple_element` setting (default: true).

Closes: #50167

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@clickhouse-gh

clickhouse-gh Bot commented Mar 20, 2026

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [251ad42]

Summary:


AI Review

Summary

This PR adds an analyzer optimization that rewrites tupleElement(dictGet(..., tuple_attrs), idx_or_name) into single-attribute dictGet / dictGetOrDefault when semantics are preserved, with a dedicated setting and compatibility entry. The implementation is careful about shared query-tree nodes, default-value side effects, and type-preservation bailouts, and the test coverage exercises the key safety paths. Verdict: approve.

ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time
No large/binary files
Final Verdict

Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-performance Pull request with some performance improvements label Mar 20, 2026
size_t element_index = *maybe_index;

/// Replace the tuple of attribute names with a single attribute name
dict_get_args[1] = std::make_shared<ConstantNode>(attribute_names[element_index]);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ This mutates dict_get_args[1] before the optimization is guaranteed to succeed.

In the dictGetOrDefault branch below, we can return early when the default tuple is shorter than the selected element (element_index >= default_tuple.size()). In that case, the function exits with a partially rewritten tree: attribute list already changed to a single string, but tupleElement(...) wrapper is still present and dictGet is not re-resolved.

That can leave the query tree inconsistent and may change diagnostics/behavior for invalid queries.

Please defer mutation until all preconditions are validated (or roll back on failure). For example: compute new_attr_arg and new_default_arg first, verify both, then apply both mutations and call resolveOrdinaryFunctionNodeByName.

…ry.cpp` and make `dictGetOrDefault` rewrite atomic

- Added the new setting to the 26.4 settings changes history to fix the
  02995_new_settings_history test.
- Refactored the `dictGetOrDefault` path to stage all replacements before
  mutating the tree, preventing partial rewrites on early return.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

-- Test with dictGetOrDefault
SELECT 'dictGetOrDefault';
SELECT dictGetOrDefault('default.test_dict', ('country', 'city'), id, ('Unknown', 'Unknown')).1 FROM test_keys ORDER BY id;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ The dictGetOrDefault coverage here only uses existing keys, so the default branch is never exercised. That means this test does not validate the rewritten default-value path in DictGetTupleElementPass (the path that slices tuple defaults to a single element).

Please add at least one missing-key case (for example id = 999) and assert both positional and/or named access return the expected default element after optimization.

/// Now apply all mutations atomically
dict_get_args[1] = std::move(new_attr_arg);

if (is_dict_get_or_default && new_default_arg)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ This rewrite can still apply dictGetOrDefault without rewriting the default argument, which may break valid queries.

When default_arg is neither a constant tuple nor a tuple(...) function (for example, an arbitrary tuple-typed expression), new_default_arg stays null, but the code still rewrites attr_names to a single string and replaces the whole tupleElement(...) with dictGetOrDefault(...).

That produces a mismatched call shape (dictGetOrDefault(dict, 'attr', key, <tuple_default>)) and can introduce a new exception during analysis for queries that were previously valid.

Please bail out unless both pieces are rewritable for dictGetOrDefault:

  • single attribute name, and
  • matching single-element default expression (or an equivalent tupleElement(default_arg, idx) rewrite).

alexey-milovidov and others added 7 commits March 26, 2026 23:08
… be extracted

Previously, when `dictGetOrDefault` was called with a default argument
that was neither a constant tuple nor a `tuple()` function, the
optimization would still rewrite the attribute names to a single string
while leaving the original default argument untouched. This produced a
mismatched call signature that could break valid queries.

Now the pass requires both the attribute name and the default value to
be rewritable before applying any mutations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… add missing-key coverage

- Replace hardcoded `'default.test_dict'` with `currentDatabase() || '.test_dict'`
  so the test works in CI where each test runs in its own database.
- Replace full EXPLAIN QUERY TREE output matching with predicate checks
  (presence/absence of `tupleElement` in the query tree) to avoid
  database-name-dependent reference output.
- Add `dictGetOrDefault` tests with missing keys (id=999) to exercise
  the default value rewrite path, as requested in review.
- Add test for `dictGetOrDefault` with `tuple()` function as default.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The `DictGetTupleElementPass` is an Analyzer pass and `EXPLAIN QUERY TREE`
also requires the analyzer. Add `SET enable_analyzer = 1` at the top of
the test to fix failures in the "old analyzer" CI configuration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
-- Test dictGetOrDefault with tuple() function as default
SELECT 'dictGetOrDefault with tuple function default';
SELECT dictGetOrDefault(currentDatabase() || '.test_dict', ('country', 'city'), toUInt64(999), tuple('FuncCountry', 'FuncCity')).1;
SELECT dictGetOrDefault(currentDatabase() || '.test_dict', ('country', 'city'), toUInt64(999), tuple('FuncCountry', 'FuncCity')).2;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ We now test the dictGetOrDefault default-path rewrite for positional access, but the pass also handles named tupleElement access (.country → string-index form). Please add one missing-key case for named access, for example:

SELECT dictGetOrDefault(currentDatabase() || '.test_dict', ('country', 'city'), toUInt64(999), ('DefaultCountry', 'DefaultCity')).country;

This ensures both default rewrite branches are covered: positional and named.

alexey-milovidov and others added 3 commits March 30, 2026 06:39
…parents

The AST fuzzer found a segfault triggered by queries like
`SELECT dictGet(...).1 FROM t ORDER BY ALL` where `ORDER BY ALL`
causes the `tupleElement` node to be shared between the SELECT list
and the ORDER BY clause. Using `std::move` to extract the inner
`dictGet` node left a null child pointer in the `tupleElement` node,
crashing the second visit. Use copy instead of move so the node
remains valid for all parents.

Also add named-access test coverage for `dictGetOrDefault` with
missing keys, as requested in review.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

-- Test dictGetOrDefault with tuple() function as default
SELECT 'dictGetOrDefault with tuple function default';
SELECT dictGetOrDefault(currentDatabase() || '.test_dict', ('country', 'city'), toUInt64(999), tuple('FuncCountry', 'FuncCity')).1;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Nice coverage for constant and tuple(...) defaults, but we still miss a regression for the non-rewritable dictGetOrDefault default-value path.

DictGetTupleElementPass now correctly bails out when the default argument is not a constant tuple / tuple(...). Please add one case where the default tuple comes from an alias/expression (for example WITH ('DefaultCountry','DefaultCity') AS d ... dictGetOrDefault(..., d).1) and assert behavior stays correct.

Without this, we don't lock in the fix for the prior mismatch bug on non-rewritable defaults.

alexey-milovidov and others added 2 commits March 30, 2026 11:13
Address review feedback: add a test case where the default argument
is a non-constant expression (`materialize(...)`) to verify the pass
correctly bails out without breaking the query.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread src/Core/Settings.cpp
@alexey-milovidov

Copy link
Copy Markdown
Member Author

The Stress test (arm_msan) failure is fixed by #101239, which should be merged first. After it is merged, please update the branch to include the fix.

@alexey-milovidov

Copy link
Copy Markdown
Member Author

The MSan stress test failure (MemorySanitizer: use-of-uninitialized-value, STID 4179-5154 or 4148-3044) is a known pre-existing issue unrelated to this PR. Fix: #102158

Extends `04051_optimize_dictget_tuple_element` to cover the named
`tupleElement` access (`.country`) on top of an alias-bound default
tuple, exercising the same non-rewritable-default bail-out branch that
positional access already covers.
…y` keys

When the key argument of `dictGet` / `dictGetOrDefault` is wrapped in
`LowCardinality`, the single-attribute form preserves the `LowCardinality`
wrapper on the result, while the tuple-attribute form drops it (a tuple
cannot be `LowCardinality`). Rewriting `tupleElement(dictGet(..., tuple),
N)` into `dictGet(..., 'attr')` then changes the result type observed by
parent nodes (`String` vs `LowCardinality(String)`), tripping the
`ValidationChecker` with an exception:

    Function modulo(...) expects argument 1 to have String type but
    receives LowCardinality(String) after running DictGetTupleElement pass

Compare result types after resolving the rewritten `dictGet` and bail out
when they differ.

Found by AST fuzzer:
https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100186&sha=9984de9953fcafcc2739ebdb408bcbfe812438d6&name_0=PR&name_1=AST%20fuzzer%20%28amd_debug%2C%20targeted%29
@alexey-milovidov

Copy link
Copy Markdown
Member Author

The AST fuzzer (amd_tsan) failure (Logical error: AggregateFunctionArray: parameters mismatch..., STID: 4870-4f07) is a known pre-existing issue in the timeSeriesRate*ToGrid family with the Array combinator, unrelated to this PR. It's the same root cause as open issue #102993 (just a different STID variant — the new diagnostic LOGICAL_ERROR introduced by #102868 produces several closely related fingerprints for the same parameters mismatch).

The failing query is:

... timeSeriesRateToGridArrayOrNull(1048575, 1048575, '255.255.255.255', 1024)(timestamps, values) ...

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100186&sha=11706f2d4d2d84c51c80a4693e015fbc9e256f2d&name_0=PR&name_1=AST%20fuzzer%20%28amd_tsan%29

Branch updated with master.

Comment thread src/Analyzer/Passes/DictGetTupleElementPass.cpp
…etTupleElementPass`

`dictGetOrDefault` casts the entire default tuple to the dict's
attribute tuple type at execution (`castColumnAccurate` in
`FunctionDictGetNoType::executeImpl`). When an un-selected default
element has an invalid value for its attribute type (e.g. a string
default for a `UInt64` attribute), the original cast throws
`CANNOT_PARSE_TEXT`. Rewriting to a single-attribute call drops the
cast for un-selected elements and would suppress that exception.

Bail out unless the default tuple's element types exactly match the
dict's attribute types (tuple names may still differ — they do not
affect cast semantics). Add a regression test for both `(...)` and
`tuple(...)` default forms.
@alexey-milovidov

Copy link
Copy Markdown
Member Author

@groeneai, fix this pre-existing failure #100186 (comment) and send a PR.

@clickhouse-gh

clickhouse-gh Bot commented May 6, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.10% 84.10% +0.00%
Functions 91.10% 91.10% +0.00%
Branches 76.60% 76.60% +0.00%

Changed lines: 80.60% (108/134) · Uncovered code

Full report · Diff report

@groeneai

groeneai commented May 6, 2026

Copy link
Copy Markdown
Contributor

@alexey-milovidov — confirmed STID 4870-4f07 is part of the chronic AggregateFunctionArray: parameters mismatch... family (timeseries *ToGridArray* combinator + parameter-type preservation). CIDB cross-PR data:

The canonical fix is already in flight: PR #103428 (@vitlibar) — "Resubmit: Preserve parameter types in timeseries aggregate functions" (resubmits #101083 which was reverted in #102358). It restores the invariant that IAggregateFunction::getParameters() returns exactly the parameters used to create the function, including for the timeseries family — the same root cause as STID 4870-4f07.

Status of PR #103428: OPEN, MERGEABLE since 2026-04-23, but stalled — no human review/APPROVE in the past 13 days; only Copilot has commented. The 2 "failing" CI checks are chronic flakes unrelated to the change:

  • Stress test (arm_tsan)STID 3087-4f43 (rocksdb crc32c_arm64 data race in WAL writer, not in any timeseries code path)
  • Unit tests (msan, function_prop_fuzzer) → known chronic family

So filing a parallel fix from @groeneai would just duplicate PR #103428's approach. The unblock here is reviewer attention on PR #103428 — an APPROVE there closes the chronic STID 4870 family, including the variant on this PR. Happy to file a complementary fix if you want a different angle (e.g. tighter param-type validation in the combinator dispatch path), but otherwise pushing on #103428 looks like the highest-leverage move.

@@ -0,0 +1,149 @@
-- Tags: no-fasttest
-- no-fasttest: requires dictionaries

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And so what? Can't we use dictionaries in the fast test?

@alexey-milovidov alexey-milovidov left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@alexey-milovidov alexey-milovidov merged commit 1d83139 into master May 6, 2026
164 of 165 checks passed
@alexey-milovidov alexey-milovidov self-assigned this May 6, 2026
@alexey-milovidov alexey-milovidov deleted the optimize-dictget-tuple-element branch May 6, 2026 23:00
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-synced-to-cloud The PR is synced to the cloud repo label May 6, 2026
@PedroTadim

Copy link
Copy Markdown
Member

Hello! Performance tests missing?

zlareb1 added a commit to zlareb1/ClickHouse that referenced this pull request May 24, 2026
Drop two randomization adds from this PR that need additional
groundwork to ship safely, and revert the seven test files that
needed `-- Tags: no-random-settings` because of them:

* `max_streams_for_union_step` and
  `max_streams_for_union_step_to_max_threads_ratio` (ClickHouse#100176) —
  reshape the `UNION ALL` pipeline, which surfaced pre-existing
  weak-ordering invariants in 8 tests (constant ORDER BY keys, NaN
  sort keys from `sin(0/0)`, `ORDER BY tuple()`, duplicate values
  without secondary sort). The dedicated `04051_max_streams_for_union_step`
  test also failed in a way suggesting randomized session-level
  values bypass per-query `SETTINGS` for these settings — a separate
  interaction worth investigating first.

* Pairing `max_bytes_ratio_before_external_join` with
  `max_bytes_before_external_join` in `randomize_external_sort_group_by`
  using the same `threshold_generator(0.3, 0.5, 0, 10 GiB)` as
  sort/group_by — caused `03279_join_choose_build_table` to fail at
  `--max_bytes_before_external_join 7693351` (grace hash join
  returned 500/125 rows instead of 1000). That smells like a real
  correctness issue in grace hash join under spilling. Restore
  upstream's standalone `max_bytes_before_external_join: random.choice([0, 1000000000])`
  (binary off / 1 GiB, never spills) and drop the ratio randomization
  for now.

What stays in this PR:

* `optimize_dictget_tuple_element` (ClickHouse#100186)
* `use_top_k_dynamic_filtering_for_variable_length_types` (ClickHouse#104216)
  with conditional_settings parent
* `page_cache_max_coalesced_bytes` (ClickHouse#104230)
* re-enable `grace_hash_join_initial_buckets` randomization with
  the narrow, previously-working `[1, 2, 4, 8]` range

CI report:
https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=105630&sha=6f3e784ddf7&name_0=PR

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
itsjoeoui pushed a commit to itsjoeoui/ClickHouse that referenced this pull request Jun 4, 2026
QA notes on recent PRs flagged that the following settings were
introduced but not added to the randomization list in
`tests/clickhouse-test`:

* `max_bytes_ratio_before_external_join` (ClickHouse#103862) — mirrors the
  existing `max_bytes_ratio_before_external_group_by` and
  `max_bytes_ratio_before_external_sort`, so wire it through the
  Bytes-vs-Ratio choice in `randomize_external_sort_group_by`
  alongside its absolute counterpart `max_bytes_before_external_join`.
* `max_streams_for_union_step` and
  `max_streams_for_union_step_to_max_threads_ratio` (ClickHouse#100176) —
  occasionally clamp to a small number of simultaneously active
  streams to exercise the `UNION ALL` narrowing path.
* `optimize_dictget_tuple_element` (ClickHouse#100186) — default-true; disable
  with a small probability so both planner paths stay covered.
* `use_top_k_dynamic_filtering_for_variable_length_types` (ClickHouse#104216) —
  cover both the fixed-length-only default and the opt-in path.
* `page_cache_max_coalesced_bytes` (ClickHouse#104230) — randomize across
  small/large values so the coalescing path and the cap that splits
  long miss runs into multiple reads both get exercised.

Also re-enable randomization of `grace_hash_join_initial_buckets`,
which had been disabled in
`tests/clickhouse-test` after wrong results on some tests. Restore
the original conservative range `[1, 2, 4, 8]`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-performance Pull request with some performance improvements pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

A peephole optimization: if someone requesting a tuple of attributes from a dictionary and immediately takes only one, don't request others.

4 participants