BE-513: HashQL: Rework dynamic aggregate size estimation #8697
PR Summary
Medium Risk
Overview: Structs and tuples now treat the value as a single composite, with cardinality 1 and units equal to the sum of each field's materialized information.
Reviewed by Cursor Bugbot for commit c5bf4ee.
Pull request overview
This PR updates HashQL MIR’s dynamic size-estimation analysis to be aggregate-kind aware, fixing the previously incorrect behavior where all aggregates were treated like collections (cardinality accumulating with operand count). It also introduces a Footprint::materialize() helper to collapse (units, cardinality) into a single “total information” estimate when embedding nested values inside composite aggregates.
Changes:
- Reworks dynamic aggregate evaluation to dispatch by `AggregateKind` (struct/tuple vs list vs dict vs closure vs opaque) with correct cardinality semantics.
- Adds `Footprint::materialize()`, `Footprint::one(...)`, `Estimate::saturating_coeff_mul`, and `InformationRange::saturating_mul_cardinality` to support nested/typed aggregate estimation.
- Expands and updates tests + snapshots to validate corrected cardinalities and per-element/per-pair unit semantics.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| libs/@local/hashql/mir/src/pass/analysis/size_estimation/dynamic.rs | Implements kind-specific aggregate footprint evaluation (struct/tuple scalarize, list/dict collection semantics, closure scalarize, opaque legacy behavior). |
| libs/@local/hashql/mir/src/pass/analysis/size_estimation/footprint.rs | Adds Footprint::one and Footprint::materialize() plus unit tests for the new materialization behavior. |
| libs/@local/hashql/mir/src/pass/analysis/size_estimation/estimate.rs | Adds Estimate::saturating_coeff_mul to support coefficient-wise multiplication used by materialize(). |
| libs/@local/hashql/mir/src/pass/analysis/size_estimation/range.rs | Adds InformationRange::saturating_mul_cardinality and unit tests for range-level multiplication behavior. |
| libs/@local/hashql/mir/src/pass/analysis/size_estimation/tests.rs | Adds integration tests covering list/dict units semantics and tuple/struct cardinality fixes. |
| libs/@local/hashql/mir/tests/ui/pass/size-estimation/*.snap | Adds new snapshots and updates existing ones to reflect corrected cardinalities and new test coverage. |
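To make the range-level helper in the table concrete, here is a minimal sketch of what "multiply information by cardinality, saturating to unbounded on overflow" could look like. The `Range` type and its fields are hypothetical stand-ins invented for illustration; the real `InformationRange` in `range.rs` may be shaped differently, and only the method name is taken from this PR.

```rust
/// Hypothetical stand-in for an information range: an inclusive lower bound
/// and an optional upper bound, where `None` means "unbounded".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Range {
    min: u64,
    max: Option<u64>,
}

impl Range {
    /// Multiplies an information range by a cardinality range.
    ///
    /// The upper bound becomes unbounded if either input is unbounded or if
    /// the product overflows `u64`. This sketch is conservative: it also
    /// reports unbounded for "unbounded times zero".
    fn saturating_mul_cardinality(self, cardinality: Range) -> Range {
        let min = self.min.saturating_mul(cardinality.min);
        let max = match (self.max, cardinality.max) {
            // `checked_mul` returns `None` on overflow, which this model
            // reads as "unbounded".
            (Some(a), Some(b)) => a.checked_mul(b),
            _ => None,
        };
        Range { min, max }
    }
}
```

Treating overflow as unbounded keeps the estimate sound: the result never claims a tighter upper bound than the true product.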
Benchmark results
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 2002 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 1001 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: high, policies: 3314 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: medium, policies: 1526 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 2078 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 1033 | Flame Graph |
policy_resolution_medium
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 102 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 51 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: high, policies: 269 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: medium, policies: 107 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 133 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 63 | Flame Graph |
policy_resolution_none
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 2 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 8 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 3 | Flame Graph |
policy_resolution_small
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 52 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 25 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: high, policies: 94 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: medium, policies: 26 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 66 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 29 | Flame Graph |
read_scaling_complete
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_id;one_depth | 1 entities | Flame Graph | |
| entity_by_id;one_depth | 10 entities | Flame Graph | |
| entity_by_id;one_depth | 25 entities | Flame Graph | |
| entity_by_id;one_depth | 5 entities | Flame Graph | |
| entity_by_id;one_depth | 50 entities | Flame Graph | |
| entity_by_id;two_depth | 1 entities | Flame Graph | |
| entity_by_id;two_depth | 10 entities | Flame Graph | |
| entity_by_id;two_depth | 25 entities | Flame Graph | |
| entity_by_id;two_depth | 5 entities | Flame Graph | |
| entity_by_id;two_depth | 50 entities | Flame Graph | |
| entity_by_id;zero_depth | 1 entities | Flame Graph | |
| entity_by_id;zero_depth | 10 entities | Flame Graph | |
| entity_by_id;zero_depth | 25 entities | Flame Graph | |
| entity_by_id;zero_depth | 5 entities | Flame Graph | |
| entity_by_id;zero_depth | 50 entities | Flame Graph |
read_scaling_linkless
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_id | 1 entities | Flame Graph | |
| entity_by_id | 10 entities | Flame Graph | |
| entity_by_id | 100 entities | Flame Graph | |
| entity_by_id | 1000 entities | Flame Graph | |
| entity_by_id | 10000 entities | Flame Graph |
representative_read_entity
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/block/v/1 | Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/book/v/1 | Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/building/v/1 | Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/organization/v/1 | Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/page/v/2 | Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/person/v/1 | Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/playlist/v/1 | Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/song/v/1 | Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/uk-address/v/1 | Flame Graph | |
representative_read_entity_type
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| get_entity_type_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba | Flame Graph | |
representative_read_multiple_entities
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_property | traversal_paths=0 | 0 | |
| entity_by_property | traversal_paths=255 | 1,resolve_depths=inherit:1;values:255;properties:255;links:127;link_dests:126;type:true | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:0;link_dests:0;type:false | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:1;link_dests:0;type:true | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:2;links:1;link_dests:0;type:true | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:2;properties:2;links:1;link_dests:0;type:true | |
| link_by_source_by_property | traversal_paths=0 | 0 | |
| link_by_source_by_property | traversal_paths=255 | 1,resolve_depths=inherit:1;values:255;properties:255;links:127;link_dests:126;type:true | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:0;link_dests:0;type:false | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:1;link_dests:0;type:true | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:2;links:1;link_dests:0;type:true | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:2;properties:2;links:1;link_dests:0;type:true |
scenarios
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| full_test | query-limited | Flame Graph | |
| full_test | query-unlimited | Flame Graph | |
| linked_queries | query-limited | Flame Graph | |
| linked_queries | query-unlimited | Flame Graph |

🌟 What is the purpose of this PR?
The size estimation analysis previously treated all aggregate kinds (structs, tuples, lists, dicts, closures) identically — summing operand footprints and accumulating cardinality as if every aggregate were a flat collection. This was incorrect: a struct or tuple is a single composite value with cardinality 1, while a list or dict is a true collection whose cardinality equals its element count.
This PR introduces type-aware aggregate footprint evaluation. Structs and tuples now correctly report cardinality 1 with units equal to the sum of their fields' materialized sizes. Lists report per-element units (joined across elements) with cardinality equal to the element count. Dicts compute per-pair units (key + value combined) with cardinality equal to the pair count. Closures combine their function pointer and environment footprints into a single scalar value.
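The per-kind semantics described above can be sketched with a constant-only model. Everything here is illustrative: `Aggregate`, `eval_aggregate`, and the plain `u64` fields are hypothetical simplifications, the "join" across list elements is modeled as a maximum, and list elements and dict keys/values are assumed to be materialized before being combined. The actual analysis works on ranges and affine estimates rather than single numbers.

```rust
/// Hypothetical constant-only footprint: `units` is the information per
/// element, `cardinality` the number of elements.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Footprint {
    units: u64,
    cardinality: u64,
}

impl Footprint {
    /// A footprint with cardinality exactly 1.
    fn one(units: u64) -> Self {
        Self { units, cardinality: 1 }
    }

    /// Total information: units multiplied by cardinality.
    fn materialize(self) -> u64 {
        self.units.saturating_mul(self.cardinality)
    }
}

/// Hypothetical operand view of an aggregate, one variant per kind.
enum Aggregate<'a> {
    /// Structs and tuples: a single composite value.
    Composite(&'a [Footprint]),
    List(&'a [Footprint]),
    Dict(&'a [(Footprint, Footprint)]),
    Closure { pointer: Footprint, env: Footprint },
}

fn eval_aggregate(aggregate: &Aggregate<'_>) -> Footprint {
    match aggregate {
        // Cardinality 1; units are the sum of the materialized fields.
        Aggregate::Composite(fields) => Footprint::one(
            fields
                .iter()
                .fold(0, |acc: u64, field| acc.saturating_add(field.materialize())),
        ),
        // Per-element units joined across elements (max in this model);
        // cardinality is the element count.
        Aggregate::List(elements) => Footprint {
            units: elements.iter().map(|e| e.materialize()).max().unwrap_or(0),
            cardinality: elements.len() as u64,
        },
        // Per-pair units (key + value); cardinality is the pair count.
        Aggregate::Dict(pairs) => Footprint {
            units: pairs
                .iter()
                .map(|(k, v)| k.materialize().saturating_add(v.materialize()))
                .max()
                .unwrap_or(0),
            cardinality: pairs.len() as u64,
        },
        // Function pointer and environment collapse into one scalar value.
        Aggregate::Closure { pointer, env } => {
            Footprint::one(pointer.materialize().saturating_add(env.materialize()))
        }
    }
}
```

In this model a two-field struct holding a scalar of 8 units and a list of ten 4-unit elements evaluates to cardinality 1 with 48 units, whereas the old behavior described above would have accumulated the operand cardinalities.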
To support this, a `materialize()` method is introduced on `Footprint` that collapses a footprint's `(units, cardinality)` pair into a single total information estimate. This is needed when a value with its own cardinality (e.g. a list) is embedded as a field of a composite type: the field's contribution to the parent's units must account for the full information content of the nested value, not just its per-element size.
🔍 What does this change?
- Replaces the `RValue::Aggregate` handler in `eval_rvalue` with a dedicated `eval_rvalue_aggregate` method that dispatches on `AggregateKind`.
- Structs and tuples: sums the `materialize()`d footprints of all operands and sets cardinality to 1.
- Dicts: computes per-pair units from each key and value (… `saturating_mul_add`) and sets cardinality to the pair count.
- Adds `Footprint::materialize()`, which multiplies units by cardinality to produce a total information estimate, with case-specific handling for constant×constant (exact), affine units×constant cardinality (scale coefficients by the cardinality upper bound), and affine×affine (element-wise coefficient multiplication as a linear under-approximation).
- Adds a `Footprint::one(units)` constructor for footprints with cardinality exactly 1.
- Adds `Estimate::saturating_coeff_mul` for element-wise coefficient multiplication between two estimates.
- Adds `InformationRange::saturating_mul_cardinality` for range-level multiplication of information by cardinality, saturating to unbounded on overflow.
- Adds `Eval::into_footprint` as a consuming counterpart to `Eval::as_ref`.
- Updates `struct_aggregate_sums_operands` and `tuple_aggregate_sums_operands`, which previously reported cardinality 2 for a two-field struct/tuple; both now correctly report cardinality 1.
🛡 What tests cover this?
- Integration tests `list_aggregate_per_element_units`, `dict_aggregate_per_pair_units`, `tuple_many_fields_cardinality_one`, and `struct_materializes_list_parameter`, each with corresponding snapshot files.
- Unit tests in `footprint.rs` covering the `materialize()` branches: scalar identity, constant×constant, affine units×constant cardinality, constant units×affine cardinality, and both-affine same-parameter.
- Unit tests in `range.rs` covering `saturating_mul_cardinality`: exact multiplication, identity by 1, empty inputs, unbounded cardinality, and overflow to unbounded.
❓ How to test this?
- Run `cargo test -p hashql-mir` and confirm all tests pass.
- Inspect the snapshot files in `libs/@local/hashql/mir/tests/ui/pass/size-estimation/` to verify the reported `units` and `cardinality` values match the expected semantics for each aggregate kind.
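For readers who want to see the `materialize()` case split in miniature, the following is a rough sketch under simplifying assumptions. The `Affine` type (a constant plus one coefficient per parameter) and `saturating_scale` are hypothetical; the real `Estimate` works over ranges and uses a cardinality upper bound. The constant-units×affine-cardinality branch is an assumption inferred from the test list, not from the change description.

```rust
/// Hypothetical affine estimate: `constant + sum(coeffs[i] * param_i)`.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Affine {
    constant: u64,
    coeffs: Vec<u64>,
}

impl Affine {
    fn is_constant(&self) -> bool {
        self.coeffs.iter().all(|&c| c == 0)
    }

    /// Scales the constant and every coefficient by `factor`.
    fn saturating_scale(&self, factor: u64) -> Self {
        Self {
            constant: self.constant.saturating_mul(factor),
            coeffs: self.coeffs.iter().map(|c| c.saturating_mul(factor)).collect(),
        }
    }

    /// Element-wise product: constants multiply, matching coefficients
    /// multiply. Cross terms and quadratic terms are dropped, so for
    /// non-negative parameters this under-approximates the true product.
    fn saturating_coeff_mul(&self, other: &Self) -> Self {
        fn at(coeffs: &[u64], index: usize) -> u64 {
            coeffs.get(index).copied().unwrap_or(0)
        }
        let len = self.coeffs.len().max(other.coeffs.len());
        Self {
            constant: self.constant.saturating_mul(other.constant),
            coeffs: (0..len)
                .map(|i| at(&self.coeffs, i).saturating_mul(at(&other.coeffs, i)))
                .collect(),
        }
    }
}

struct Footprint {
    units: Affine,
    cardinality: Affine,
}

impl Footprint {
    /// Collapses `(units, cardinality)` into one total-information estimate.
    fn materialize(&self) -> Affine {
        if self.cardinality.is_constant() {
            // Covers constant×constant (exact) and affine units×constant
            // cardinality (scale the coefficients).
            self.units.saturating_scale(self.cardinality.constant)
        } else if self.units.is_constant() {
            // Assumed symmetric case: constant units×affine cardinality.
            self.cardinality.saturating_scale(self.units.constant)
        } else {
            // Both affine: linear under-approximation.
            self.units.saturating_coeff_mul(&self.cardinality)
        }
    }
}
```

For example, with units `3 + 2p` and cardinality `1 + 4p`, the true product is `3 + 14p + 8p²`, while the element-wise result is `3 + 8p`.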