fix: optimize_projections failure with struct-field join keys#22903
fix: optimize_projections failure with struct-field join keys#22903kumarUjjawal wants to merge 2 commits into
Conversation
|
A subquery that survives decorrelation, such as an inequality-correlated scalar subquery over struct-field join keys, can still hit the stale-schema issue on the subquery traversal path. That plan shape cannot reach physical execution today, so I’m leaving it for a follow-up rather than expanding this PR. Any feedback would be great. |
There was a problem hiding this comment.
@kumarUjjawal
Looks good overall. I left one small performance-related suggestion, but nothing blocking.
| // traversal in the same optimizer pass. This is also | ||
| // correctness-sensitive: the in-place path refreshes parent | ||
| // schemas after child schemas change. | ||
| let has_subqueries = plan_has_subqueries(&new_plan); |
There was a problem hiding this comment.
plan_has_subqueries now runs once per rule, even after it has already returned false. Correctness still looks fine, but for the common case where there are no subqueries, this changes the old once-per-pass scan into roughly one full plan scan per optimizer rule.
Could we cache the flag for the pass and refresh it only when a rule actually changes the plan? For example: let mut has_subqueries = plan_has_subqueries(&new_plan); and then if transformed { has_subqueries = plan_has_subqueries(&new_plan); }. That should keep the behavior where subqueries removed mid-pass are noticed, while still staying safe if a rule introduces a subquery.
There was a problem hiding this comment.
Thank you! I should have caught this.
| // Recurse into children using Arc::make_mut (zero-cost when refcount == 1) | ||
| changed |= map_children_mut(plan, |child| { | ||
| rewrite_plan_in_place(child, apply_order, rule, config) | ||
| let mut child_schema_changed = false; |
There was a problem hiding this comment.
This seems like it is potentially treating a symptom rather than the root cause 🤔
I worry this change may be ignoring the root cause elsewhere (an incorrectly reported "transofrmed" flag)
It seems like if the plan returns transformed = false the schema should not have changed.
There was a problem hiding this comment.
I worry this change may be ignoring the root cause elsewhere (an incorrectly reported "transofrmed" flag)
I think validating Transformed accuracy is worthwhile, but probably separate from this fix.
This PR intentionally continues to trust transformed; it recomputes parent schemas when a child rewrite reports changed = true and the child schema actually differs.
|
run benchmark sql_planner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing fix/no-field-name-join (144d0e9) to d77a02d (merge-base) diff using: sql_planner File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagesql_planner — base (merge-base)
sql_planner — branch
File an issue against this benchmark runner |
Which issue does this PR close?
optimize_projectionsfails with "No field named ..." when join keys containget_field(ExtractLeafExpressions) #22895.Rationale for this change
Join-key extraction can add helper projections and change a child plan's output schema. Some parent nodes cache their schema, so after the child rewrite they can still expose stale fields. Later projection pruning then uses the stale schema and fails with missing-column errors.
What changes are included in this PR?
Are these changes tested?
Yes
Are there any user-facing changes?
No