[SPARK-56605][SQL] Wire resolution engine to use SQL PATH for table, function, and variable lookup#55523
[SPARK-56605][SQL] Wire resolution engine to use SQL PATH for table, function, and variable lookup#55523srielau wants to merge 9 commits into
Conversation
c9c7657 to
90d9edb
Compare
srielau
left a comment
There was a problem hiding this comment.
Path-aware resolution for relations, functions, and variables — wires SET PATH (SPARK-56501) through CatalogManager.sqlResolutionPathEntries so unqualified table/function/variable names actually use the stored session path. The follow-up that snapshots a frozen path into AnalysisContext.resolutionPathEntries for view bodies is referenced but not yet here.
Notes
Dead UnresolvedTableOrViewSearchPathMode.QueryLike / TempViewOnly + DESCRIBE search-path regression. The enum is introduced with the explicit intent that callers set the mode at construction (per the new @param doc), but no caller in this PR ever passes anything other than the default Ddl — the old commandName-based inference in CheckAnalysis (which routed "DESCRIBE …" to fullSearchPathForError, "… TEMPORARY VIEW …" to tempViewOnlySearchPathForError, and everything else to ddlSearchPathForError) was removed but the wire-up to the new enum is missing. Net effect: DESCRIBE TABLE nonexistent now reports only system.session + the current catalog/namespace in TABLE_OR_VIEW_NOT_FOUND instead of the full PATH (the , Ddl token in the updated describe.sql.out golden confirms the new mode is what's flowing). Either thread QueryLike through SparkSqlParser.visitDescribeRelation (and elsewhere if applicable), or keep the old commandName inference in CheckAnalysis and drop the unused enum cases. Inline at v2ResolutionPlans.scala:84.
Duplicated AnalysisContext path-default logic appears inline in three places (Analyzer.scala:2096, CheckAnalysis.scala:409, FunctionResolution.scala:76). Each computes if (AnalysisContext.get.catalogAndNamespace.nonEmpty) ctx else currentCatalog.name +: currentNamespace before calling the 4-arg sqlResolutionPathEntries. Worth centralizing — either a helper on CatalogManager or letting catalogPathForError itself consult AnalysisContext (the PR description claims it does, but CheckAnalysis.scala:93 still just returns currentCatalog.name +: currentNamespace).
Scaladoc references to [[SQLConf.sqlResolutionPathEntries]] are broken — the method lives on CatalogManager. Six occurrences across Analyzer.scala:145, CheckAnalysis.scala:83, FunctionResolution.scala:90, RelationResolution.scala:117/130/132/233. [[AnalysisContext.snapshotViewResolutionPath]] at RelationResolution.scala:128 doesn't exist anywhere in the repo. Suggestions inline.
Test coverage
No new test cases for the path-resolution behavior changes (single-part table/function/variable lookup honoring SET PATH, system.session gating of unqualified variables). The TableLookupCacheSuite change just teaches the mock about the new method. If the existing CI suites cover this transitively, fine; if not, a SET PATH end-to-end test for each of the three lookup paths would be worth adding.
srielau
left a comment
There was a problem hiding this comment.
Re-review on the same commit (90d9edbe); the design hasn't changed since the prior AI round.
Status: 0 addressed, 6 remaining, 4 new.
Remaining from prior review
All 9 inline comments and the summary findings from the prior <!-- ai-code-review --> round are still applicable on this commit. Highlights:
- Dead
UnresolvedTableOrViewSearchPathMode.QueryLike/TempViewOnlycases →DESCRIBE TABLE nonexistentregresses from fullsqlResolutionPathEntriestosystem.session + current catalog/namespaceonly (see prior comment onv2ResolutionPlans.scala:75-87). - Path-default + 4-arg
sqlResolutionPathEntriespattern duplicated 4 places (Analyzer.scala:2096,CheckAnalysis.scala:409,FunctionResolution.scala:76+105,RelationResolution.scala:135); consolidation opportunity. The PR description claimscatalogPathForErrorconsultsAnalysisContext.catalogAndNamespace, butCheckAnalysis.scala:93still just returnscurrentCatalog.name +: currentNamespace. - Six broken
[[SQLConf.sqlResolutionPathEntries]]Scaladoc refs (method is onCatalogManager); plus broken[[AnalysisContext.snapshotViewResolutionPath]]ref atRelationResolution.scala:128(no such method exists). resolveProceduresilently swallowsNonFatalfor unqualified names (FunctionResolution.scala:665-674), losing the underlying catalog failure.- Redundant
||inResolverGuard.scala:479-480(isUnsupportedFunction(name)is defined asUNSUPPORTED_FUNCTION_NAMES.contains(name)). - No new functional tests for path-affected resolution (single-part table/function lookup honoring
SET PATH,system.sessiongating of unqualified variables, procedure resolution via path).
New this round
confForRoutineResolution(definedCheckAnalysis.scala:49, overriddenAnalyzer.scala:305) is never read anywhere — pure dead code. (Inline.)Analyzer.scala:998-999comment claims unqualified relation PATH is snapshotted inAnalysisContext.resolutionPathEntrieswhile resolving each view body, but no code in this PR sets that field — the snapshot wiring is deferred to the follow-up PR. (Inline.)Analyzer.executeAndCheckwraps the run inSQLConf.withExistingConf(sessionConf.get), but the publicAnalyzer.execute(line 353-359, called fromRelationalGroupedDataset.scala:647and several test sites) does not. Soanalyzer.execute(plan)runs the resolution machinery againstSQLConf.getrather than the capturedsessionConf— production callers happen to be OK because the active session's conf already matches, but the asymmetry undermines the analyzer-isolation guaranteesessionConfis meant to provide. (Inline.)- PR description states
CheckAnalysisnow "UsesCatalogManagerconstants instead of string literals," butddlSearchPathForError(line 79) andtempViewOnlySearchPathForError(line 102) still useSeq("system", "session"). Either flip toSeq(CatalogManager.SYSTEM_CATALOG_NAME, CatalogManager.SESSION_NAMESPACE)or update the PR description.
…function, and variable lookup Switch the resolution engine from the legacy single-schema resolutionSearchPath to sqlResolutionPathEntries on CatalogManager, so that SET PATH actually affects how unqualified names are resolved. Key changes: - CatalogManager: add sqlResolutionPathEntries and sessionScopeUnqualifiedAllowed (reads stored session path) - RelationResolution: walk path entries via PersistentCatalogStep, each entry qualifies the object name under that catalog/namespace - FunctionResolution: use sqlResolutionPathEntriesForAnalysis for resolution candidates; move procedure resolution here - Analyzer: add sessionConf/resolutionConf for isolated analysis; AnalysisContext gains resolutionPathEntries field (initially None) - CheckAnalysis: view-body-aware catalogPathForError, UnresolvedTableOrViewSearchPathMode, CatalogManager constants - VariableResolution: allowUnqualifiedSessionTempVariableLookup gates unqualified variable access when system.session not on PATH - Single-pass resolver updates aligned with new resolution signatures Frozen path analysis for views/SQL functions comes in a follow-up PR. Part of SPARK-54810.
90d9edb to
4f300f7
Compare
|
All 13 self-review findings addressed in commit
|
cloud-fan
left a comment
There was a problem hiding this comment.
Summary
Prior state and problem. SPARK-56501 added SET PATH and stores a session path on CatalogManager, but the resolvers still consulted only the legacy single-schema resolutionSearchPath — so SET PATH was silently ignored when resolving unqualified table, function, and variable references.
Design approach. Extends the existing search-path pattern to a multi-entry path:
CatalogManager.sqlResolutionPathEntriesis the single source of ordered path entries — stored entries when PATH is enabled and set,SQLConf.defaultPathOrder(single-catalog legacy) otherwise.RelationResolutionbuildsrelationResolutionStepsfrom those entries, mapping each toSessionScopeSteporPersistentCatalogStep(prefix). Each persistent path entry is kept with its catalog/namespace so lookup qualifies the object name under that entry, not just the session's current namespace.FunctionResolutionexposessqlResolutionPathEntriesForAnalysisand now owns procedure resolution (resolveProceduremoved here fromAnalyzer).VariableResolutionandResolveCatalogsgate unqualified session-variable access onsystem.sessionbeing present in the path (viaCatalogManager.sessionScopeUnqualifiedAllowed).CheckAnalysisformatsTABLE_OR_VIEW_NOT_FOUNDerrors with the right scope (DDL / query-like / temp-view-only) through a new explicit enum (UnresolvedTableOrViewSearchPathMode) instead of inferring fromcommandName.catalogPathForErrorconsultsAnalysisContext.catalogAndNamespaceso view-body errors reflect the view's defining catalog.
Key design decisions.
- Path data lives on CatalogManager.
sqlResolutionPathEntriescentralizes path data + the path-disabled fallback so bothRelationResolutionandFunctionResolutionconsume a single ordered list. - Frozen path is deferred.
AnalysisContext.resolutionPathEntriesexists but is never populated in this PR. Until the follow-up wires it, view bodies + PATH-enabled use the session's current catalog/namespace to expandCurrentSchemaEntrymarkers, not the view's defining context. - Analyzer-level conf isolation.
Analyzergains asessionConf: Option[SQLConf]parameter; when set,executeAndCheck/executeSameContextwrap withSQLConf.withExistingConf(c)so analysis runs against an isolated conf even whenSQLConf.getpoints to a shared session.
Implementation sketch. Path data: CatalogManager. Strategy layer: RelationResolution, FunctionResolution. Iterative analyzer: Analyzer (path-aware errors, procedure resolution moved out). Single-pass: Resolver, HybridAnalyzer, NameScope, ResolverGuard. Variable gating: VariableResolution, ResolveCatalogs, ResolveSetVariable. TableLookupCacheSuite / AlignAssignmentsSuiteBase mock the new CatalogManager methods; SetPathSuite adds three end-to-end path-resolution tests; ProtoToParsedPlanTestSuite uses sessionConf to isolate from shared session settings.
Cross-cutting notes
sessionConfconstructor injection vs. the thread-local conf. OnceexecuteAndCheck/executeSameContextwrap analysis inSQLConf.withExistingConf(sessionConf.get), theconfparameters threaded throughRelationResolution/FunctionResolution/Resolverand theRelationResolution.confoverride are reading the same valueSQLConf.getwould return inside that scope. This is essentially the question @vladimirg-db raised onHybridAnalyzer.scala:381; worth resolving that thread before the dual mechanism settles in. The duplicatedsessionConf match { case Some(c) => withExistingConf(c) { ... } }pattern inexecuteAndCheck(twice) andexecuteSameContextwould also collapse into one helper if the constructor params go away.- Test coverage gaps. The new SetPathSuite tests cover table and function path resolution and one not-found case. Missing: variable lookup gated by
system.session(VariableResolution.allowUnqualifiedSessionTempVariableLookup), procedure resolution via PATH (especially the not-found path — see inline atFunctionResolution.scala), and view-body resolution under PATH given the mid-state asymmetry below. - View body + PATH-enabled mid-state. Inline comment on
RelationResolution.scala— until the follow-up PR pins the frozen path, users with PATH set see view bodies resolve unqualified names against the session's current schema, not the view's defining schema. Worth either guarding or calling out explicitly.
Align procedure PATH iteration with table/function behavior, remove dead search-path mode, and clean up analyzer path/conf wiring to match the intended PR4 design.
|
Addressed the latest review threads in commit
Focused validation run (passed):
I also replied inline on each outstanding thread with the corresponding change details. |
Avoid overriding explicit nested SQLConf scopes while running with analyzer sessionConf so persisted view/UDF settings (including ANSI mode) remain effective during analysis.
Create an empty commit to retrigger GitHub Actions checks.
…h mode Adjust DescribeTableParserSuite to expect QueryLike mode for DESCRIBE TABLE unresolved relations and keep parser plan assertions aligned with current parser behavior.
… thread-local Per review feedback (vladimirg-db): SQLConf is a thread-local, and view/UDF resolution explicitly nests new SQLConfs onto that thread-local via SQLConf.withExistingConf(...). Capturing a conf as a constructor field on Resolver / FunctionResolution / RelationResolution makes those classes read a stale conf during nested view-body / SQL-function resolution. This removes the captured conf: - FunctionResolution: drop conf constructor param; mix in SQLConfHelper so conf.pathEnabled / conf.prioritizeSystemCatalog read SQLConf.get (live). - RelationResolution: drop sessionConf constructor param and the override def conf; SQLConfHelper.conf = SQLConf.get covers all reads. - Resolver: drop conf constructor param; replace conf.maxToStringFields call sites with SQLConf.get.maxToStringFields. - Analyzer: stop threading sessionConf / resolutionConf into the resolution objects' constructors. Keep Analyzer.sessionConf and the SQLConf.withExistingConf(c) wrap in executeAndCheck/executeSameContext -- that is the legitimate mechanism that establishes the active conf for the whole analyzer run; nested view/UDF blocks correctly push their own conf on top. - TimezoneAwareExpressionResolverSuite: drop the now-removed third arg to new FunctionResolution(...).
|
Addressed @vladimirg-db's review threads in commit
Agreed -- the captured-conf pattern is wrong. View/UDF resolution explicitly nests a new Changes in
Diff: 5 files, 22 lines removed, 10 added. Local compile/test was not possible in my dev environment due to a Maven proxy gap on the new |
cloud-fan
left a comment
There was a problem hiding this comment.
Re-review against 4edb448dbf8. 5 of 6 prior findings addressed (procedure PATH iteration, dead TempViewOnly enum, scaladoc references, catalogPathForError consolidation, plus the 99415fd4d5e follow-ups), 1 acknowledged with a code comment (view-body CurrentSchemaEntry expansion mid-state, deferred to follow-up PR per design). 2 new findings, both late catches — pre-existing in 4f300f7 and missed in my prior round; not introduced by the new commits. 1 remaining test gap from the prior round's summary (variable-resolution PATH coverage — DECLARE / SET VAR with unqualified name when system.session is/isn't on the path).
The latest 4edb448dbf8 (drop captured SQLConf, use thread-local) addresses @vladimirg-db's staleness concern: with Analyzer.executeAndCheck / executeSameContext wrapping analysis in SQLConf.withExistingConf(sessionConf) and executeSameContext's case Some(c) if SQLConf.get ne c => super.execute(plan) guard preserving nested overrides, view/UDF withExistingConf(viewConf) { ... } scopes are now correctly observed by RelationResolution / FunctionResolution / Resolver. Sound.
LGTM with the two minor follow-up notes inline.
…ntext Use AnalysisContext-aware catalog path in single-pass unresolved relation errors and reuse function lookup path entries for unresolved routine errors, keeping error search paths consistent with actual lookup semantics.
|
Could you please double-check the master status, @gengliangwang ? It's just a request because I hope this commit was not affected by the previous reverting commit. |
|
I saw the CI passed in https://github.com/srielau/spark/runs/73431382331. I will watch the master status. |
|
Thank you so much! |
Ur, @gengliangwang , it didn't passed.
|
|
The master branch was broken at |
|
Tests passed in https://github.com/apache/spark/actions/runs/25068992578 |



What changes were proposed in this pull request?
Switch the resolution engine from the legacy single-schema
resolutionSearchPathtosqlResolutionPathEntriesonCatalogManager, so thatSET PATHactually affects how unqualified table names, function names, and variables are resolved.CatalogManager (
CatalogManager.scala):sqlResolutionPathEntries: ordered path entries for resolving unqualified names. When PATH is enabled and set, uses the stored session path; otherwise falls back todefaultPathOrder(legacy).sessionScopeUnqualifiedAllowed: gates unqualified variable access whensystem.sessionis not on the PATH.Relation resolution (
RelationResolution.scala):relationResolutionEntries/relationResolutionStepsreplacerelationResolutionSearchPath.PersistentCatalogStepcarries the catalog/namespace prefix so each path entry qualifies the object name under that entry.Function resolution (
FunctionResolution.scala):sqlResolutionPathEntriesForAnalysisprovides candidates for unqualified function names.resolveProcedure) moved here fromAnalyzer.Error context (
CheckAnalysis.scala):catalogPathForErrornow consultsAnalysisContext.catalogAndNamespacewhen inside a view body, so error messages report the view's defining catalog/namespace.UnresolvedTableOrViewSearchPathModeenum controls DDL vs query-like error paths.CatalogManagerconstants instead of string literals.RelationChangeserrors now include the search path.Variable resolution (
VariableResolution.scala+ callers):allowUnqualifiedSessionTempVariableLookupgates unqualified variable access whensystem.sessionis not on the PATH.Analyzer (
Analyzer.scala):sessionConfconstructor parameter for isolated analysis (e.g. Connect).resolutionConffor path-based resolution.AnalysisContext.resolutionPathEntriesfield (initiallyNone; frozen path wiring comes in follow-up PR).Single-pass resolver:
Resolver,HybridAnalyzer,NameScope,ResolverGuardaligned with new resolution signatures.Frozen path analysis for views/SQL functions (using stored path during analysis) comes in a follow-up PR. This PR only wires the resolution engine to use the live session path.
Why are the changes needed?
SET PATH(merged in SPARK-56501) stores the session path but the resolvers still used the legacy single-schema path. This PR completes the wiring so PATH actually affects resolution.Part of SPARK-54810.
Does this PR introduce any user-facing change?
Yes. With
spark.sql.path.enabled = trueandSET PATH, unqualified table names, function names, and variable references now resolve according to the stored path order.How was this patch tested?
CI.
ProtoToParsedPlanTestSuiteupdated with analyzer isolation conf. Connect.explaingolden files updated.PlanResolutionSuite,NameScopeSuite,TimezoneAwareExpressionResolverSuiteupdated for new constructor signatures.Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.6