Skip to content

[SPARK-56605][SQL] Wire resolution engine to use SQL PATH for table, function, and variable lookup#55523

Closed
srielau wants to merge 9 commits into
apache:masterfrom
srielau:SPARK-56605-resolution
Closed

[SPARK-56605][SQL] Wire resolution engine to use SQL PATH for table, function, and variable lookup#55523
srielau wants to merge 9 commits into
apache:masterfrom
srielau:SPARK-56605-resolution

Conversation

@srielau

@srielau srielau commented Apr 23, 2026

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Switch the resolution engine from the legacy single-schema resolutionSearchPath to sqlResolutionPathEntries on CatalogManager, so that SET PATH actually affects how unqualified table names, function names, and variables are resolved.

CatalogManager (CatalogManager.scala):

  • sqlResolutionPathEntries: ordered path entries for resolving unqualified names. When PATH is enabled and set, uses the stored session path; otherwise falls back to defaultPathOrder (legacy).
  • sessionScopeUnqualifiedAllowed: gates unqualified variable access when system.session is not on the PATH.

Relation resolution (RelationResolution.scala):

  • relationResolutionEntries / relationResolutionSteps replace relationResolutionSearchPath.
  • PersistentCatalogStep carries the catalog/namespace prefix so each path entry qualifies the object name under that entry.

Function resolution (FunctionResolution.scala):

  • sqlResolutionPathEntriesForAnalysis provides candidates for unqualified function names.
  • Procedure resolution (resolveProcedure) moved here from Analyzer.

Error context (CheckAnalysis.scala):

  • catalogPathForError now consults AnalysisContext.catalogAndNamespace when inside a view body, so error messages report the view's defining catalog/namespace.
  • UnresolvedTableOrViewSearchPathMode enum controls DDL vs query-like error paths.
  • Uses CatalogManager constants instead of string literals.
  • RelationChanges errors now include the search path.

Variable resolution (VariableResolution.scala + callers):

  • allowUnqualifiedSessionTempVariableLookup gates unqualified variable access when system.session is not on the PATH.

Analyzer (Analyzer.scala):

  • sessionConf constructor parameter for isolated analysis (e.g. Connect).
  • resolutionConf for path-based resolution.
  • AnalysisContext.resolutionPathEntries field (initially None; frozen path wiring comes in follow-up PR).

Single-pass resolver: Resolver, HybridAnalyzer, NameScope, ResolverGuard aligned with new resolution signatures.

Frozen path analysis for views/SQL functions (using stored path during analysis) comes in a follow-up PR. This PR only wires the resolution engine to use the live session path.

Why are the changes needed?

SET PATH (merged in SPARK-56501) stores the session path but the resolvers still used the legacy single-schema path. This PR completes the wiring so PATH actually affects resolution.

Part of SPARK-54810.

Does this PR introduce any user-facing change?

Yes. With spark.sql.path.enabled = true and SET PATH, unqualified table names, function names, and variable references now resolve according to the stored path order.

How was this patch tested?

CI. ProtoToParsedPlanTestSuite updated with analyzer isolation conf. Connect .explain golden files updated. PlanResolutionSuite, NameScopeSuite, TimezoneAwareExpressionResolverSuite updated for new constructor signatures.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.6

@srielau srielau force-pushed the SPARK-56605-resolution branch 11 times, most recently from c9c7657 to 90d9edb Compare April 24, 2026 23:29

@srielau srielau left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Path-aware resolution for relations, functions, and variables — wires SET PATH (SPARK-56501) through CatalogManager.sqlResolutionPathEntries so unqualified table/function/variable names actually use the stored session path. The follow-up that snapshots a frozen path into AnalysisContext.resolutionPathEntries for view bodies is referenced but not yet here.

Notes

Dead UnresolvedTableOrViewSearchPathMode.QueryLike / TempViewOnly + DESCRIBE search-path regression. The enum is introduced with the explicit intent that callers set the mode at construction (per the new @param doc), but no caller in this PR ever passes anything other than the default Ddl — the old commandName-based inference in CheckAnalysis (which routed "DESCRIBE …" to fullSearchPathForError, "… TEMPORARY VIEW …" to tempViewOnlySearchPathForError, and everything else to ddlSearchPathForError) was removed but the wire-up to the new enum is missing. Net effect: DESCRIBE TABLE nonexistent now reports only system.session + the current catalog/namespace in TABLE_OR_VIEW_NOT_FOUND instead of the full PATH (the , Ddl token in the updated describe.sql.out golden confirms the new mode is what's flowing). Either thread QueryLike through SparkSqlParser.visitDescribeRelation (and elsewhere if applicable), or keep the old commandName inference in CheckAnalysis and drop the unused enum cases. Inline at v2ResolutionPlans.scala:84.

Duplicated AnalysisContext path-default logic appears inline in three places (Analyzer.scala:2096, CheckAnalysis.scala:409, FunctionResolution.scala:76). Each computes if (AnalysisContext.get.catalogAndNamespace.nonEmpty) ctx else currentCatalog.name +: currentNamespace before calling the 4-arg sqlResolutionPathEntries. Worth centralizing — either a helper on CatalogManager or letting catalogPathForError itself consult AnalysisContext (the PR description claims it does, but CheckAnalysis.scala:93 still just returns currentCatalog.name +: currentNamespace).

Scaladoc references to [[SQLConf.sqlResolutionPathEntries]] are broken — the method lives on CatalogManager. Six occurrences across Analyzer.scala:145, CheckAnalysis.scala:83, FunctionResolution.scala:90, RelationResolution.scala:117/130/132/233. [[AnalysisContext.snapshotViewResolutionPath]] at RelationResolution.scala:128 doesn't exist anywhere in the repo. Suggestions inline.

Test coverage

No new test cases for the path-resolution behavior changes (single-part table/function/variable lookup honoring SET PATH, system.session gating of unqualified variables). The TableLookupCacheSuite change just teaches the mock about the new method. If the existing CI suites cover this transitively, fine; if not, a SET PATH end-to-end test for each of the three lookup paths would be worth adding.

Comment thread sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala Outdated
Comment thread sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala Outdated

@srielau srielau left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review on the same commit (90d9edbe); the design hasn't changed since the prior AI round.

Status: 0 addressed, 6 remaining, 4 new.

Remaining from prior review

All 9 inline comments and the summary findings from the prior <!-- ai-code-review --> round are still applicable on this commit. Highlights:

  • Dead UnresolvedTableOrViewSearchPathMode.QueryLike / TempViewOnly cases → DESCRIBE TABLE nonexistent regresses from full sqlResolutionPathEntries to system.session + current catalog/namespace only (see prior comment on v2ResolutionPlans.scala:75-87).
  • Path-default + 4-arg sqlResolutionPathEntries pattern duplicated 4 places (Analyzer.scala:2096, CheckAnalysis.scala:409, FunctionResolution.scala:76+105, RelationResolution.scala:135); consolidation opportunity. The PR description claims catalogPathForError consults AnalysisContext.catalogAndNamespace, but CheckAnalysis.scala:93 still just returns currentCatalog.name +: currentNamespace.
  • Six broken [[SQLConf.sqlResolutionPathEntries]] Scaladoc refs (method is on CatalogManager); plus broken [[AnalysisContext.snapshotViewResolutionPath]] ref at RelationResolution.scala:128 (no such method exists).
  • resolveProcedure silently swallows NonFatal for unqualified names (FunctionResolution.scala:665-674), losing the underlying catalog failure.
  • Redundant || in ResolverGuard.scala:479-480 (isUnsupportedFunction(name) is defined as UNSUPPORTED_FUNCTION_NAMES.contains(name)).
  • No new functional tests for path-affected resolution (single-part table/function lookup honoring SET PATH, system.session gating of unqualified variables, procedure resolution via path).

New this round

  1. confForRoutineResolution (defined CheckAnalysis.scala:49, overridden Analyzer.scala:305) is never read anywhere — pure dead code. (Inline.)
  2. Analyzer.scala:998-999 comment claims unqualified relation PATH is snapshotted in AnalysisContext.resolutionPathEntries while resolving each view body, but no code in this PR sets that field — the snapshot wiring is deferred to the follow-up PR. (Inline.)
  3. Analyzer.executeAndCheck wraps the run in SQLConf.withExistingConf(sessionConf.get), but the public Analyzer.execute (line 353-359, called from RelationalGroupedDataset.scala:647 and several test sites) does not. So analyzer.execute(plan) runs the resolution machinery against SQLConf.get rather than the captured sessionConf — production callers happen to be OK because the active session's conf already matches, but the asymmetry undermines the analyzer-isolation guarantee sessionConf is meant to provide. (Inline.)
  4. PR description states CheckAnalysis now "Uses CatalogManager constants instead of string literals," but ddlSearchPathForError (line 79) and tempViewOnlySearchPathForError (line 102) still use Seq("system", "session"). Either flip to Seq(CatalogManager.SYSTEM_CATALOG_NAME, CatalogManager.SESSION_NAMESPACE) or update the PR description.

Comment thread sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala Outdated
Comment thread sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala Outdated
…function, and variable lookup

Switch the resolution engine from the legacy single-schema
resolutionSearchPath to sqlResolutionPathEntries on CatalogManager,
so that SET PATH actually affects how unqualified names are resolved.

Key changes:
- CatalogManager: add sqlResolutionPathEntries and
  sessionScopeUnqualifiedAllowed (reads stored session path)
- RelationResolution: walk path entries via PersistentCatalogStep,
  each entry qualifies the object name under that catalog/namespace
- FunctionResolution: use sqlResolutionPathEntriesForAnalysis for
  resolution candidates; move procedure resolution here
- Analyzer: add sessionConf/resolutionConf for isolated analysis;
  AnalysisContext gains resolutionPathEntries field (initially None)
- CheckAnalysis: view-body-aware catalogPathForError,
  UnresolvedTableOrViewSearchPathMode, CatalogManager constants
- VariableResolution: allowUnqualifiedSessionTempVariableLookup gates
  unqualified variable access when system.session not on PATH
- Single-pass resolver updates aligned with new resolution signatures

Frozen path analysis for views/SQL functions comes in a follow-up PR.

Part of SPARK-54810.
@srielau srielau force-pushed the SPARK-56605-resolution branch from 90d9edb to 4f300f7 Compare April 25, 2026 18:01
@srielau

srielau commented Apr 25, 2026

Copy link
Copy Markdown
Contributor Author

All 13 self-review findings addressed in commit 4f300f70868:

  1. Dead enum cases -- Wired QueryLike from visitDescribeRelation
  2. Duplicated path-default logic -- Consolidated into view-body-aware catalogPathForError
  3. Broken Scaladoc refs -- All fixed to CatalogManager.sqlResolutionPathEntries
  4. resolveProcedure swallowed NonFatal -- Now always wraps in failedToLoadRoutineError
  5. Redundant || -- Simplified to single isUnsupportedFunction call
  6. No functional tests -- Added 3 resolution tests (table order, function order, not-found)
  7. Dead confForRoutineResolution -- Removed
  8. Comment about view snapshot -- Fixed to "follow-up PR"
  9. execute vs executeAndCheck asymmetry -- sessionConf wrapping in executeSameContext
  10. String literals -- Replaced with CatalogManager constants

@cloud-fan cloud-fan left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Prior state and problem. SPARK-56501 added SET PATH and stores a session path on CatalogManager, but the resolvers still consulted only the legacy single-schema resolutionSearchPath — so SET PATH was silently ignored when resolving unqualified table, function, and variable references.

Design approach. Extends the existing search-path pattern to a multi-entry path:

  • CatalogManager.sqlResolutionPathEntries is the single source of ordered path entries — stored entries when PATH is enabled and set, SQLConf.defaultPathOrder (single-catalog legacy) otherwise.
  • RelationResolution builds relationResolutionSteps from those entries, mapping each to SessionScopeStep or PersistentCatalogStep(prefix). Each persistent path entry is kept with its catalog/namespace so lookup qualifies the object name under that entry, not just the session's current namespace.
  • FunctionResolution exposes sqlResolutionPathEntriesForAnalysis and now owns procedure resolution (resolveProcedure moved here from Analyzer).
  • VariableResolution and ResolveCatalogs gate unqualified session-variable access on system.session being present in the path (via CatalogManager.sessionScopeUnqualifiedAllowed).
  • CheckAnalysis formats TABLE_OR_VIEW_NOT_FOUND errors with the right scope (DDL / query-like / temp-view-only) through a new explicit enum (UnresolvedTableOrViewSearchPathMode) instead of inferring from commandName. catalogPathForError consults AnalysisContext.catalogAndNamespace so view-body errors reflect the view's defining catalog.

Key design decisions.

  • Path data lives on CatalogManager. sqlResolutionPathEntries centralizes path data + the path-disabled fallback so both RelationResolution and FunctionResolution consume a single ordered list.
  • Frozen path is deferred. AnalysisContext.resolutionPathEntries exists but is never populated in this PR. Until the follow-up wires it, view bodies + PATH-enabled use the session's current catalog/namespace to expand CurrentSchemaEntry markers, not the view's defining context.
  • Analyzer-level conf isolation. Analyzer gains a sessionConf: Option[SQLConf] parameter; when set, executeAndCheck / executeSameContext wrap with SQLConf.withExistingConf(c) so analysis runs against an isolated conf even when SQLConf.get points to a shared session.

Implementation sketch. Path data: CatalogManager. Strategy layer: RelationResolution, FunctionResolution. Iterative analyzer: Analyzer (path-aware errors, procedure resolution moved out). Single-pass: Resolver, HybridAnalyzer, NameScope, ResolverGuard. Variable gating: VariableResolution, ResolveCatalogs, ResolveSetVariable. TableLookupCacheSuite / AlignAssignmentsSuiteBase mock the new CatalogManager methods; SetPathSuite adds three end-to-end path-resolution tests; ProtoToParsedPlanTestSuite uses sessionConf to isolate from shared session settings.

Cross-cutting notes

  • sessionConf constructor injection vs. the thread-local conf. Once executeAndCheck / executeSameContext wrap analysis in SQLConf.withExistingConf(sessionConf.get), the conf parameters threaded through RelationResolution / FunctionResolution / Resolver and the RelationResolution.conf override are reading the same value SQLConf.get would return inside that scope. This is essentially the question @vladimirg-db raised on HybridAnalyzer.scala:381; worth resolving that thread before the dual mechanism settles in. The duplicated sessionConf match { case Some(c) => withExistingConf(c) { ... } } pattern in executeAndCheck (twice) and executeSameContext would also collapse into one helper if the constructor params go away.
  • Test coverage gaps. The new SetPathSuite tests cover table and function path resolution and one not-found case. Missing: variable lookup gated by system.session (VariableResolution.allowUnqualifiedSessionTempVariableLookup), procedure resolution via PATH (especially the not-found path — see inline at FunctionResolution.scala), and view-body resolution under PATH given the mid-state asymmetry below.
  • View body + PATH-enabled mid-state. Inline comment on RelationResolution.scala — until the follow-up PR pins the frozen path, users with PATH set see view bodies resolve unqualified names against the session's current schema, not the view's defining schema. Worth either guarding or calling out explicitly.

Comment thread sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala Outdated
Comment thread sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala Outdated
Align procedure PATH iteration with table/function behavior, remove dead search-path mode, and clean up analyzer path/conf wiring to match the intended PR4 design.
@srielau

srielau commented Apr 27, 2026

Copy link
Copy Markdown
Contributor Author

Addressed the latest review threads in commit 99415fd4d5e (now pushed to this PR branch):

  • Removed dead UnresolvedTableOrViewSearchPathMode.TempViewOnly and corresponding CheckAnalysis match arm.
  • Consolidated duplicated catalog-path default logic by reusing CheckAnalysis.catalogPathForError from Analyzer (protected final).
  • Fixed AnalysisContext comments/Scaladoc drift around withAnalysisContext(...) and follow-up frozen-path wiring.
  • Removed explicit conf threading from HybridAnalyzer.fromLegacyAnalyzer; resolver now relies on thread-local SQLConf under analyzer conf scope.
  • Aligned procedure PATH iteration semantics for unqualified names: candidate failures are treated as misses and resolution continues to later PATH entries; explicit catalog-qualified behavior is unchanged.
  • Added regression coverage in ProcedureSuite: "PATH enabled: unqualified CALL skips missing candidate and keeps searching".

Focused validation run (passed):

  • build/sbt 'sql/testOnly org.apache.spark.sql.connector.ProcedureSuite -- -z "PATH enabled: unqualified CALL skips missing candidate and keeps searching"'

I also replied inline on each outstanding thread with the corresponding change details.

srielau added 6 commits April 27, 2026 05:21
Avoid overriding explicit nested SQLConf scopes while running with analyzer sessionConf so persisted view/UDF settings (including ANSI mode) remain effective during analysis.
Create an empty commit to retrigger GitHub Actions checks.
…h mode

Adjust DescribeTableParserSuite to expect QueryLike mode for DESCRIBE TABLE unresolved relations and keep parser plan assertions aligned with current parser behavior.
… thread-local

Per review feedback (vladimirg-db): SQLConf is a thread-local, and view/UDF
resolution explicitly nests new SQLConfs onto that thread-local via
SQLConf.withExistingConf(...). Capturing a conf as a constructor field on
Resolver / FunctionResolution / RelationResolution makes those classes read a
stale conf during nested view-body / SQL-function resolution.

This removes the captured conf:
- FunctionResolution: drop conf constructor param; mix in SQLConfHelper so
  conf.pathEnabled / conf.prioritizeSystemCatalog read SQLConf.get (live).
- RelationResolution: drop sessionConf constructor param and the
  override def conf; SQLConfHelper.conf = SQLConf.get covers all reads.
- Resolver: drop conf constructor param; replace conf.maxToStringFields
  call sites with SQLConf.get.maxToStringFields.
- Analyzer: stop threading sessionConf / resolutionConf into the resolution
  objects' constructors. Keep Analyzer.sessionConf and the
  SQLConf.withExistingConf(c) wrap in executeAndCheck/executeSameContext --
  that is the legitimate mechanism that establishes the active conf for the
  whole analyzer run; nested view/UDF blocks correctly push their own conf
  on top.
- TimezoneAwareExpressionResolverSuite: drop the now-removed third arg to
  new FunctionResolution(...).
@srielau

srielau commented Apr 27, 2026

Copy link
Copy Markdown
Contributor Author

Addressed @vladimirg-db's review threads in commit 4edb448dbf8.

I still don't understand why we need to store conf in the Resolver. This is not correct. Views/udfs override this and it becomes stale.
conf is a thread local. why pass it to FunctionResolver at all

Agreed -- the captured-conf pattern is wrong. View/UDF resolution explicitly nests a new SQLConf onto the thread-local via SQLConf.withExistingConf(...) (Analyzer.scala:2517 for SQL functions, :2816 for views), so a constructor-captured conf field on Resolver / FunctionResolution / RelationResolution reads the outer conf during nested resolution, not the active one. Reads like pathEnabled, prioritizeSystemCatalog, sessionLocalTimeZone, caseSensitiveAnalysis (conf.resolver), and the time-travel keys can legitimately differ between session conf and view-body conf, so the staleness is observable.

Changes in 4edb448dbf8:

  • FunctionResolution: drop conf constructor param; mix in SQLConfHelper so conf.pathEnabled / conf.prioritizeSystemCatalog read SQLConf.get (live thread-local).
  • RelationResolution: drop sessionConf constructor param and the override def conf. SQLConfHelper.conf = SQLConf.get covers all reads.
  • Resolver: drop conf constructor param; replace conf.maxToStringFields sites with SQLConf.get.maxToStringFields.
  • Analyzer: stop threading sessionConf / resolutionConf into the resolution-class constructors. Keep Analyzer.sessionConf and the SQLConf.withExistingConf(c) { ... } wrap in executeAndCheck / executeSameContext -- that's the legitimate mechanism that establishes the active conf for the whole analyzer run; nested view/UDF blocks correctly push their own conf on top of that.
  • TimezoneAwareExpressionResolverSuite: drop the now-removed third arg to new FunctionResolution(...).

Diff: 5 files, 22 lines removed, 10 added. Local compile/test was not possible in my dev environment due to a Maven proxy gap on the new guava 33.6.0-jre; relying on CI here.

@cloud-fan cloud-fan left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review against 4edb448dbf8. 5 of 6 prior findings addressed (procedure PATH iteration, dead TempViewOnly enum, scaladoc references, catalogPathForError consolidation, plus the 99415fd4d5e follow-ups), 1 acknowledged with a code comment (view-body CurrentSchemaEntry expansion mid-state, deferred to follow-up PR per design). 2 new findings, both late catches — pre-existing in 4f300f7 and missed in my prior round; not introduced by the new commits. 1 remaining test gap from the prior round's summary (variable-resolution PATH coverage — DECLARE / SET VAR with unqualified name when system.session is/isn't on the path).

The latest 4edb448dbf8 (drop captured SQLConf, use thread-local) addresses @vladimirg-db's staleness concern: with Analyzer.executeAndCheck / executeSameContext wrapping analysis in SQLConf.withExistingConf(sessionConf) and executeSameContext's case Some(c) if SQLConf.get ne c => super.execute(plan) guard preserving nested overrides, view/UDF withExistingConf(viewConf) { ... } scopes are now correctly observed by RelationResolution / FunctionResolution / Resolver. Sound.

LGTM with the two minor follow-up notes inline.

Comment thread sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala Outdated
…ntext

Use AnalysisContext-aware catalog path in single-pass unresolved relation errors and reuse function lookup path entries for unresolved routine errors, keeping error search paths consistent with actual lookup semantics.
@dongjoon-hyun

Copy link
Copy Markdown
Member
Screenshot 2026-04-28 at 10 56 29 Screenshot 2026-04-28 at 10 56 37

Sorry guys, but this PR has never passed the CI.

@dongjoon-hyun

dongjoon-hyun commented Apr 28, 2026

Copy link
Copy Markdown
Member

Could you please double-check the master status, @gengliangwang ? It's just a request because I hope this commit was not affected by the previous reverting commit.

@gengliangwang

Copy link
Copy Markdown
Member

I saw the CI passed in https://github.com/srielau/spark/runs/73431382331. I will watch the master status.

@dongjoon-hyun

Copy link
Copy Markdown
Member

Thank you so much!

@dongjoon-hyun

Copy link
Copy Markdown
Member

I saw the CI passed in https://github.com/srielau/spark/runs/73431382331. I will watch the master status.

Ur, @gengliangwang , it didn't passed.

Screenshot 2026-04-28 at 11 02 18

@dongjoon-hyun

Copy link
Copy Markdown
Member

The master branch was broken at connect module. So, you need to check connect module.

org.apache.spark.sql.connect.ProtoToParsedPlanTestSuite

@gengliangwang

Copy link
Copy Markdown
Member

Tests passed in https://github.com/apache/spark/actions/runs/25068992578

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants