refactor for SPARK-34079-multi-column-scalar-subquery #4

Closed
attilapiros wants to merge 1 commit into
peter-toth:SPARK-34079-multi-column-scalar-subquery from
attilapiros:SPARK-34079-refactor

@attilapiros

No description provided.

@github-actions github-actions Bot added the SQL label Nov 16, 2021
peter-toth pushed a commit that referenced this pull request Jan 17, 2023
### What changes were proposed in this pull request?
This PR introduces sasl retry count in RetryingBlockTransferor.

### Why are the changes needed?
Previously a boolean variable, saslTimeoutSeen, was used. However, the boolean variable wouldn't cover the following scenario:

1. SaslTimeoutException
2. IOException
3. SaslTimeoutException
4. IOException

Even though IOException at #2 is retried (resulting in increment of retryCount), the retryCount would be cleared at step #4.
Since the intention of saslTimeoutSeen is to undo the increment due to retrying SaslTimeoutException, we should keep a counter for SaslTimeoutException retries and subtract the value of this counter from retryCount.
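
The scenario above can be replayed with a small model. This is a minimal sketch, not the Java code in `RetryingBlockTransferor`: the function `simulate` and the event strings are hypothetical, the retry limit is ignored, and only the bookkeeping of `retryCount` is modeled.

```python
def simulate(events, use_counter):
    """Replay failures and return the retry count that would be charged
    against the retry limit. `events` holds "sasl" or "io" strings."""
    retry_count = 0
    sasl_timeout_seen = False  # old bookkeeping: a single flag
    sasl_retry_count = 0       # new bookkeeping: number of SASL retries

    for event in events:
        is_sasl = event == "sasl"
        if not is_sasl:
            if use_counter and sasl_retry_count > 0:
                # Undo only the increments caused by SASL timeouts.
                retry_count -= sasl_retry_count
                sasl_retry_count = 0
            elif not use_counter and sasl_timeout_seen:
                # The flag cannot tell how many increments to undo,
                # so it wipes the whole count, IOException retries included.
                retry_count = 0
                sasl_timeout_seen = False
        # Every retried failure increments the count.
        retry_count += 1
        if is_sasl:
            sasl_timeout_seen = True
            sasl_retry_count += 1
    return retry_count


scenario = ["sasl", "io", "sasl", "io"]
print(simulate(scenario, use_counter=False))  # flag-based bookkeeping
print(simulate(scenario, use_counter=True))   # counter-based bookkeeping
```

In this model the flag-based variant ends with a count of 1 although two `IOException` retries happened, while the counter-based variant ends with 2.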

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
New test is added, courtesy of Mridul.

Closes apache#39611 from tedyu/sasl-cnt.

Authored-by: Ted Yu <yuzhihong@gmail.com>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
peter-toth pushed a commit that referenced this pull request Apr 11, 2023
…edExpression()

### What changes were proposed in this pull request?

In `EquivalentExpressions.addExpr()`, add a guard `supportedExpression()` to make it consistent with `addExprTree()` and `getExprState()`.

### Why are the changes needed?

This fixes a regression caused by apache#39010 which added the `supportedExpression()` to `addExprTree()` and `getExprState()` but not `addExpr()`.

One example of a use case affected by the inconsistency is the `PhysicalAggregation` pattern in physical planning. There, it calls `addExpr()` to deduplicate the aggregate expressions, and then calls `getExprState()` to deduplicate the result expressions. Guarding inconsistently causes the aggregate and result expressions to go out of sync, eventually resulting in a query execution error (or a whole-stage codegen error).
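
The out-of-sync effect can be illustrated with a toy model. This is a hedged sketch, not Catalyst code: `Expr`, `EquivalentExpressionsModel`, and `plan` are hypothetical names, and "unsupported" is reduced to a `has_lambda` flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    semantic_key: str        # canonical form that ignores the expression id
    expr_id: int
    has_lambda: bool = False


class EquivalentExpressionsModel:
    def __init__(self, guard_add_expr):
        self._by_key = {}
        self._guard_add_expr = guard_add_expr

    @staticmethod
    def supported(e):
        return not e.has_lambda

    def add_expr(self, e):
        """Return True if an equivalent expression was already registered."""
        if self._guard_add_expr and not self.supported(e):
            return False
        if e.semantic_key in self._by_key:
            return True
        self._by_key[e.semantic_key] = e
        return False

    def get_expr_state(self, e):
        # This lookup is guarded in both variants.
        if not self.supported(e):
            return None
        return self._by_key.get(e.semantic_key)


def plan(aggs, guard_add_expr):
    eq = EquivalentExpressionsModel(guard_add_expr)
    # Step 1: deduplicate the aggregate expressions via add_expr.
    kept = [a for a in aggs if not eq.add_expr(a)]
    # Step 2: rewrite result expressions via get_expr_state.
    results = [eq.get_expr_state(a) or a for a in aggs]
    # Result expressions that refer to an aggregate that was dropped.
    missing = [r for r in results if r not in kept]
    return kept, missing


aggs = [Expr("max(transform(...))", 3, True), Expr("max(transform(...))", 4, True)]
print(plan(aggs, guard_add_expr=False))  # unguarded add_expr
print(plan(aggs, guard_add_expr=True))   # guarded add_expr
```

With the unguarded `add_expr`, the second aggregate is dropped as a duplicate while the guarded lookup still refers to it, which mirrors the shape of the "Couldn't find ...#4 in [...#3]" error quoted in this message.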

### Does this PR introduce _any_ user-facing change?

This fixes a regression affecting Spark 3.3.2+, where it may manifest as an error running aggregate operators with higher-order functions.

Example running the SQL command:
```sql
select max(transform(array(id), x -> x)), max(transform(array(id), x -> x)) from range(2)
```
example error message before the fix:
```
java.lang.IllegalStateException: Couldn't find max(transform(array(id#0L), lambdafunction(lambda x#2L, lambda x#2L, false)))#4 in [max(transform(array(id#0L), lambdafunction(lambda x#1L, lambda x#1L, false)))#3]
```
after the fix this error is gone.

### How was this patch tested?

Added new test cases to `SubexpressionEliminationSuite` for the immediate issue, and to `DataFrameAggregateSuite` for an example of user-visible symptom.

Closes apache#40473 from rednaxelafx/spark-42851.

Authored-by: Kris Mok <kris.mok@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
peter-toth pushed a commit that referenced this pull request Aug 22, 2023
…edExpression()

### What changes were proposed in this pull request?

In `EquivalentExpressions.addExpr()`, add a guard `supportedExpression()` to make it consistent with `addExprTree()` and `getExprState()`.

### Why are the changes needed?

This fixes a regression caused by apache#39010 which added the `supportedExpression()` to `addExprTree()` and `getExprState()` but not `addExpr()`.

One example of a use case affected by the inconsistency is the `PhysicalAggregation` pattern in physical planning. There, it calls `addExpr()` to deduplicate the aggregate expressions, and then calls `getExprState()` to deduplicate the result expressions. Guarding inconsistently causes the aggregate and result expressions to go out of sync, eventually resulting in a query execution error (or a whole-stage codegen error).

### Does this PR introduce _any_ user-facing change?

This fixes a regression affecting Spark 3.3.2+, where it may manifest as an error running aggregate operators with higher-order functions.

Example running the SQL command:
```sql
select max(transform(array(id), x -> x)), max(transform(array(id), x -> x)) from range(2)
```
example error message before the fix:
```
java.lang.IllegalStateException: Couldn't find max(transform(array(id#0L), lambdafunction(lambda x#2L, lambda x#2L, false)))#4 in [max(transform(array(id#0L), lambdafunction(lambda x#1L, lambda x#1L, false)))#3]
```
after the fix this error is gone.

### How was this patch tested?

Added new test cases to `SubexpressionEliminationSuite` for the immediate issue, and to `DataFrameAggregateSuite` for an example of user-visible symptom.

Closes apache#40473 from rednaxelafx/spark-42851.

Authored-by: Kris Mok <kris.mok@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit ef0a76e)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
peter-toth pushed a commit that referenced this pull request Aug 22, 2023
### What changes were proposed in this pull request?
This PR introduces sasl retry count in RetryingBlockTransferor.

### Why are the changes needed?
Previously a boolean variable, saslTimeoutSeen, was used. However, the boolean variable wouldn't cover the following scenario:

1. SaslTimeoutException
2. IOException
3. SaslTimeoutException
4. IOException

Even though IOException at #2 is retried (resulting in increment of retryCount), the retryCount would be cleared at step #4. Since the intention of saslTimeoutSeen is to undo the increment due to retrying SaslTimeoutException, we should keep a counter for SaslTimeoutException retries and subtract the value of this counter from retryCount.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
New test is added, courtesy of Mridul.

Closes apache#39611 from tedyu/sasl-cnt.

Authored-by: Ted Yu <yuzhihong@gmail.com>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>

Closes apache#39709 from akpatnam25/SPARK-42090-backport-3.3.

Authored-by: Ted Yu <yuzhihong@gmail.com>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
peter-toth pushed a commit that referenced this pull request Apr 23, 2024
### What changes were proposed in this pull request?

In the `Window` node, both `partitionSpec` and `orderSpec` must be orderable, but the current type check only verifies `orderSpec` is orderable. This can cause an error in later optimizing phases.

Given a query:

```
with t as (select id, map(id, id) as m from range(0, 10))
select rank() over (partition by m order by id) from t
```

Before the PR, it fails with an `INTERNAL_ERROR`:

```
org.apache.spark.SparkException: [INTERNAL_ERROR] grouping/join/window partition keys cannot be map type. SQLSTATE: XX000
at org.apache.spark.SparkException$.internalError(SparkException.scala:92)
at org.apache.spark.SparkException$.internalError(SparkException.scala:96)
at org.apache.spark.sql.catalyst.optimizer.NormalizeFloatingNumbers$.needNormalize(NormalizeFloatingNumbers.scala:103)
at org.apache.spark.sql.catalyst.optimizer.NormalizeFloatingNumbers$.org$apache$spark$sql$catalyst$optimizer$NormalizeFloatingNumbers$$needNormalize(NormalizeFloatingNumbers.scala:94)
...
```

After the PR, it fails with an `EXPRESSION_TYPE_IS_NOT_ORDERABLE` error, which is expected:

```
  org.apache.spark.sql.catalyst.ExtendedAnalysisException: [EXPRESSION_TYPE_IS_NOT_ORDERABLE] Column expression "m" cannot be sorted because its type "MAP<BIGINT, BIGINT>" is not orderable. SQLSTATE: 42822; line 2 pos 53;
Project [RANK() OVER (PARTITION BY m ORDER BY id ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)#4]
+- Project [id#1L, m#0, RANK() OVER (PARTITION BY m ORDER BY id ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)#4, RANK() OVER (PARTITION BY m ORDER BY id ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)#4]
   +- Window [rank(id#1L) windowspecdefinition(m#0, id#1L ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS RANK() OVER (PARTITION BY m ORDER BY id ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)#4], [m#0], [id#1L ASC NULLS FIRST]
      +- Project [id#1L, m#0]
         +- SubqueryAlias t
            +- SubqueryAlias t
               +- Project [id#1L, map(id#1L, id#1L) AS m#0]
                  +- Range (0, 10, step=1, splits=None)
  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:52)
...
```
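
The idea of checking both specs during analysis can be sketched as follows. This is a simplified model, not the Catalyst type check: data types are encoded as hypothetical tuples, and the recursion rules for arrays and structs are an assumption made for illustration; the source only states that map-typed partition keys must be rejected.

```python
BIGINT = ("bigint",)
MAP_BIGINT_BIGINT = ("map", BIGINT, BIGINT)


def is_orderable(data_type):
    """Assumed rules: maps are never orderable; arrays and structs are
    orderable only if their element or field types are."""
    kind = data_type[0]
    if kind == "map":
        return False
    if kind == "array":
        return is_orderable(data_type[1])
    if kind == "struct":
        return all(is_orderable(field) for field in data_type[1:])
    return True  # atomic types


def check_window(partition_spec, order_spec):
    """Return the names of all non-orderable expressions in either spec.
    Checking only order_spec would miss a map-typed partition key."""
    columns = list(partition_spec) + list(order_spec)
    return [name for name, data_type in columns if not is_orderable(data_type)]


# Models: rank() over (partition by m order by id)
print(check_window([("m", MAP_BIGINT_BIGINT)], [("id", BIGINT)]))
```

An empty result means the window passes the check; any returned name corresponds to a column that would be reported as not orderable.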

### How was this patch tested?

Unit test.

Closes apache#45730 from chenhao-db/SPARK-47572.

Authored-by: Chenhao Li <chenhao.li@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
peter-toth pushed a commit that referenced this pull request Jun 28, 2024
… throw internal error

### What changes were proposed in this pull request?

This PR fixes the error messages and classes when Python UDFs are used in higher order functions.

### Why are the changes needed?

To show the proper user-facing exceptions with error classes.

### Does this PR introduce _any_ user-facing change?

Yes. Previously it threw an internal error for code such as:

```python
from pyspark.sql.functions import transform, udf, col, array
spark.range(1).select(transform(array("id"), lambda x: udf(lambda y: y)(x))).collect()
```

Before:

```
py4j.protocol.Py4JJavaError: An error occurred while calling o74.collectToPython.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 15 in stage 0.0 failed 1 times, most recent failure: Lost task 15.0 in stage 0.0 (TID 15) (ip-192-168-123-103.ap-northeast-2.compute.internal executor driver): org.apache.spark.SparkException: [INTERNAL_ERROR] Cannot evaluate expression: <lambda>(lambda x_0#3L)#2 SQLSTATE: XX000
	at org.apache.spark.SparkException$.internalError(SparkException.scala:92)
	at org.apache.spark.SparkException$.internalError(SparkException.scala:96)
```

After:

```
pyspark.errors.exceptions.captured.AnalysisException: [INVALID_LAMBDA_FUNCTION_CALL.UNEVALUABLE] Invalid lambda function call. Python UDFs should be used in a lambda function at a higher order function. However, "<lambda>(lambda x_0#3L)" was a Python UDF. SQLSTATE: 42K0D;
Project [transform(array(id#0L), lambdafunction(<lambda>(lambda x_0#3L)#2, lambda x_0#3L, false)) AS transform(array(id), lambdafunction(<lambda>(lambda x_0#3L), namedlambdavariable()))#4]
+- Range (0, 1, step=1, splits=Some(16))
```

### How was this patch tested?

A unit test was added.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47079 from HyukjinKwon/SPARK-48706.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Kent Yao <yao@apache.org>
peter-toth added a commit that referenced this pull request May 3, 2026
### What changes were proposed in this pull request?

Four bug fixes and two small cleanups in `PlanMerger`:

**Bug fixes** in `PlanMerger.scala`:

1. **Tagged `(Filter, Filter)` reuse preserves `mergedChild`'s appended columns.** When the reuse check finds an existing `propagatedFilter` alias, the branch now rebuilds the Filter over `mergedChild` (via `cp.withNewChildren(Seq(mergedChild))`) instead of returning `cp` unchanged. If the recursion extended `cp.child`'s output with new columns (e.g. a computed `d = a + b` from a user Project below the Filter), returning `cp` would drop those columns while `npMapping` still pointed into them, leaving the enclosing `Aggregate` with unresolved references.

2. **`(np: Filter, cp)` create-new does not re-append `cpFilter`.** `cpFilter`, when set, was produced by a deeper `(np, cp: Filter)` (or `(Join, Join)` pass-through) and is already part of `mergedChild`'s output. Appending it a second time via `++ cpFilter.toSeq` duplicated the attribute in the outer Project's projectList.

3. **`(np, cp: Filter)` create-new does not re-append `npFilter`.** Symmetric to #2 on the np side.

4. **`(np, cp: Filter)` with a `MERGED_FILTER_TAG`-tagged `cp` drops the tagged Filter.** cp's condition is `OR(pf_0, pf_1, ...)` and cp's aggregate expressions already carry individual `FILTER (WHERE pf_i)` clauses. Synthesising a new `propagatedFilter_X = OR(pf_0, pf_1, ...)` would just add `FILTER AND(OR(...), pf_i)` wrapping upstream (simplifying to `FILTER pf_i`) plus a redundant alias in the Project. The branch now drops cp's Filter and returns `cpFilter = None` so cp's aggregates are left untouched.
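
The simplification mentioned in fix 4, that `AND(OR(pf_0, pf_1, ...), pf_i)` reduces to `pf_i`, is the Boolean absorption law. A minimal brute-force check, assuming plain two-valued logic (SQL `NULL` handling is not modeled) and using a hypothetical helper name:

```python
from itertools import product


def and_of_or_equals_pf_i(n):
    """Check AND(OR(pf_0, ..., pf_{n-1}), pf_i) == pf_i for every
    assignment of n boolean propagated filters and every index i."""
    for pfs in product([False, True], repeat=n):
        disjunction = any(pfs)
        for pf_i in pfs:
            if (disjunction and pf_i) != pf_i:
                return False
    return True


print(and_of_or_equals_pf_i(2), and_of_or_equals_pf_i(3))
```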

**Cleanups** in `PlanMerger.merge`:

- Unify the local variable name to `newMergedPlan` across all three branches (was `newMergedPlan` in one and `newMergePlan` in the other two) -- matches the `MergedPlan` case class name.
- Replace `cache(i).merged` with `mp.merged`; `mp` and `cache(i)` are the same object inside the `collectFirst` pattern.

### Why are the changes needed?

Fix #1 is a correctness bug. Fixes #2-#4 are plan-shape bugs that produce duplicated attributes or redundant `OR`-of-propagated-filter aliases in the merged plan. The cleanups are minor readability improvements.

### Does this PR introduce _any_ user-facing change?

No. All changes are internal to the optimizer; they produce cleaner merged plans for queries that `MergeSubplans` already handled.

### How was this patch tested?

Four new tests in `MergeSubplansSuite`, one per fix:

- `(np: Filter, cp)` create-new must not re-append cpFilter into the Project -- exercises #2 via a `Join` with a `Filter` on the right child, routing a cpFilter up through `(Join, Join)` so that `mergedChild.output` already contains the attribute the branch used to re-append.
- `(np, cp: Filter)` create-new must not re-append npFilter into the Project -- exercises #3, mirror shape on the np side.
- tagged `(Filter, Filter)` reuse must keep mergedChild's appended columns -- exercises #1 with three subqueries (sq1/sq2 create the tagged structure; sq3's Filter sits above a user Project introducing `d = a + b`, so the `(Filter, Filter)` tagged recursion extends `mergedChild` with `d`).
- `(np, cp: Filter)` drops a tagged cp Filter without synthesising a redundant alias -- exercises #4 with three subqueries (sq1/sq2 create the tagged structure; sq3 has no filter).

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7
peter-toth added a commit that referenced this pull request May 4, 2026
### What changes were proposed in this pull request?

Four bug fixes and two small cleanups in `PlanMerger`:

**Bug fixes** in `PlanMerger.scala`:

1. **Tagged `(Filter, Filter)` reuse preserves `mergedChild`'s appended columns.** When the reuse check finds an existing `propagatedFilter` alias, the branch now rebuilds the Filter over `mergedChild` (via `cp.withNewChildren(Seq(mergedChild))`) instead of returning `cp` unchanged. If the recursion extended `cp.child`'s output with new columns (e.g. a computed `d = a + b` from a user Project below the Filter), returning `cp` would drop those columns while `npMapping` still pointed into them, leaving the enclosing `Aggregate` with unresolved references.

2. **`(np: Filter, cp)` does not duplicate a `cpFilter` already present in `mergedChild`.** `cpFilter`, when set, was produced by a deeper `(np, cp: Filter)` (or `(Join, Join)` pass-through) and is already part of `mergedChild`'s output. Appending it a second time via `++ cpFilter.toSeq` duplicated the attribute in the outer Project's projectList.

3. **`(np, cp: Filter)` does not duplicate an `npFilter` already present in `mergedChild`.** Symmetric to #2 on the np side.

4. **`(np, cp: Filter)` drops a `MERGED_FILTER_TAG`-tagged `cp` Filter without synthesising a redundant alias.** cp's condition is `OR(pf_0, pf_1, ...)` and cp's aggregate expressions already carry individual `FILTER (WHERE pf_i)` clauses. Synthesising a new `propagatedFilter_X = OR(pf_0, pf_1, ...)` would just add `FILTER AND(OR(...), pf_i)` wrapping upstream (simplifying to `FILTER pf_i`) plus a redundant alias in the Project. The branch now drops cp's Filter and returns `cpFilter = None` so cp's aggregates are left untouched.

**Cleanups** in `PlanMerger.merge`:

- Unify the local variable name to `newMergedPlan` across all three branches (was `newMergedPlan` in one and `newMergePlan` in the other two) -- matches the `MergedPlan` case class name.
- Replace `cache(i).merged` with `mp.merged`; `mp` and `cache(i)` are the same object inside the `collectFirst` pattern.

### Why are the changes needed?

Fix #1 is a correctness bug. Fixes #2-#4 are plan-shape bugs that produce duplicated attributes or redundant `OR`-of-propagated-filter aliases in the merged plan. The cleanups are minor readability improvements.

### Does this PR introduce _any_ user-facing change?

No. All changes are internal to the optimizer; they produce cleaner merged plans for queries that `MergeSubplans` already handled.

### How was this patch tested?

Four new tests in `MergeSubplansSuite`, one per fix:

- `(np: Filter, cp)` does not duplicate a cpFilter already present in mergedChild -- exercises #2 via a `Join` with a `Filter` on the right child, routing a cpFilter up through `(Join, Join)` so that `mergedChild.output` already contains the attribute the branch used to re-append.
- `(np, cp: Filter)` does not duplicate an npFilter already present in mergedChild -- exercises #3, mirror shape on the np side.
- tagged `(Filter, Filter)` reuse must keep mergedChild's appended columns -- exercises #1 with three subqueries (sq1/sq2 create the tagged structure; sq3's Filter sits above a user Project introducing `d = a + b`, so the `(Filter, Filter)` tagged recursion extends `mergedChild` with `d`).
- `(np, cp: Filter)` drops a tagged cp Filter without synthesising a redundant alias -- exercises #4 with three subqueries (sq1/sq2 create the tagged structure; sq3 has no filter).

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7
peter-toth pushed a commit that referenced this pull request May 14, 2026
### What changes were proposed in this pull request?

Address the open follow-ups from [SPARK-56681](https://issues.apache.org/jira/browse/SPARK-56681) (umbrella for PATH / SPARK-56605 cleanup) in a single cleanup PR. Items #1 and #2 were already wired by SPARK-56639; this PR covers the remainder.

| # | Item | Resolution |
|---|---|---|
| #1 | `FunctionResolution.resolveProcedure` was dead code | Already wired by SPARK-56639 (no action). |
| #2 | Frozen view / SQL-function PATH wiring unfinished | Already done by SPARK-56639 (no action). |
| #3 | `AnalysisContext.resolutionPathEntries` threadlocal | Audit only: confirmed `withNewAnalysisContext` / `reset()` correctly clear it. Full removal needs a coordinated refactor to plumb the path through `RelationResolution` / `FunctionResolution` method calls; flagged as a follow-up. |
| #4 | `Analyzer.executeAndCheck` clobbers outer `SQLConf.withExistingConf` | Extracted `runWithSessionConf` helper, added `SQLConf.getExistingConfIfSet`. `executeAndCheck` and `executeSameContext` now share one path that yields to any outer scope. |
| #5 | `VariableResolution.allowUnqualifiedSessionTempVariableLookup` force-loads default catalog | Replaced the hot-path catalog read with `CatalogManager.isSystemSessionOnPath`, which inspects stored session-path entries directly. No catalog load on column resolution. |
| #6 | `DROP VARIABLE` PATH gate asymmetric with `DECLARE` / `CREATE` | Removed the gate. DDL on session variables (`DECLARE` / `CREATE` / `DROP`) always targets `system.session` directly; only DML (`SET VAR`, `SELECT x`) goes through PATH. |
| #7 | `lookupFunctionType` exception swallow too broad | Narrowed from `NonFatal` to the explicit not-found list (`NoSuchFunctionException`, `NoSuchNamespaceException`, `CatalogNotFoundException`, `FORBIDDEN_OPERATION`). Other exceptions propagate. |
| #8 | `lookupFunctionType` fan-out had wasteful `system.*` candidates | Filtered them out; `system.session`, `system.builtin`, `system.ai` are already resolved earlier in the same method. |
| #9 | Three near-duplicate path-resolution helpers | Lifted into `CatalogManager.resolutionPathEntriesForAnalysis(pinnedEntries, viewCatalogAndNamespace)`. Relation, routine, and procedure resolution all route through it. |
| #10 | Tests for the new error paths and gates | Added a DECLARE / SET VAR / DROP cycle test under non-default PATH and a struct-variable field-vs-qualified ambiguity test in `sql-session-variables.sql`. |
| #11 | `ProtoToParsedPlanTestSuite.analyzerIsolationConf` was a bare `SQLConf` | Clone `spark.sessionState.conf` and only override `PATH_ENABLED=false`, so all `sparkConf` overrides (ANSI, alias config, ...) propagate automatically. |
| Bonus | `ResolveSetVariable` hardcoded `SYSTEM.SESSION` regardless of actual PATH | `unresolvedVariableError` now takes `Seq[Seq[String]]` path entries with **required** `Origin` (no overloads). DML lookup failures (`SET VAR`, `FETCH ... INTO`) report the full SQL path as a bracketed list, byte-for-byte consistent with `UNRESOLVED_ROUTINE` and `TABLE_OR_VIEW_NOT_FOUND`. DDL name validation in `ResolveCatalogs` continues to report `[system.session]` since PATH does not apply there. Origin is plumbed through `VariableManager.set` so all error sites carry a `queryContext` pointing at the offending variable identifier (parser opt-ins via `withOrigin(identifierReference)` so the highlight is the variable name, not the whole statement). |
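
The "yields to any outer scope" behaviour described for item #4 can be pictured with a small model. This is a hedged sketch only: the Python names below are hypothetical stand-ins loosely mirroring `SQLConf.withExistingConf`, `SQLConf.getExistingConfIfSet`, and `runWithSessionConf`, and the real Scala signatures may differ.

```python
import contextvars
from contextlib import contextmanager

# Holds the conf pinned by the innermost active scope, if any.
_existing_conf = contextvars.ContextVar("existing_conf", default=None)


@contextmanager
def with_existing_conf(conf):
    token = _existing_conf.set(conf)
    try:
        yield
    finally:
        _existing_conf.reset(token)


def get_existing_conf_if_set():
    return _existing_conf.get()


def run_with_session_conf(session_conf, body):
    """Pin session_conf only when no outer scope has pinned a conf."""
    if get_existing_conf_if_set() is not None:
        return body()  # yield to the outer scope instead of clobbering it
    with with_existing_conf(session_conf):
        return body()


def read_ansi():
    return get_existing_conf_if_set()["ansi"]


pinned, session = {"ansi": True}, {"ansi": False}
with with_existing_conf(pinned):
    print(run_with_session_conf(session, read_ansi))  # outer conf is kept
print(run_with_session_conf(session, read_ansi))      # session conf is used
```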

### Why are the changes needed?

These are the cleanup items called out on SPARK-56681 from the post-merge source review of SPARK-56605. They eliminate dead code paths, plug user-visible bugs (force-loading a misconfigured default catalog on column resolution; clobbering pinned session configs; swallowing real catalog errors as `UNRESOLVED_ROUTINE`), remove the asymmetry between DDL and DML on session variables, and make `UNRESOLVED_VARIABLE` self-consistent with the other "not found" errors.

### Does this PR introduce _any_ user-facing change?

Yes.

- **`UNRESOLVED_VARIABLE.searchPath`** is now rendered as a bracketed list. For DML lookups (`SET VAR`, `FETCH ... INTO`), the list reflects the actual SQL PATH that was consulted instead of a hardcoded `SYSTEM.SESSION`. For DDL name validation (`DECLARE` / `DROP` with a non-session namespace), the list is `[`` `system`.`session` ``]` since PATH does not apply.
- **`UNRESOLVED_VARIABLE`** now always carries a `queryContext` that highlights just the offending variable identifier (e.g. `"builtin.var1"`, `"ses.var1"`), not the whole `DECLARE` / `SET VAR` statement.
- **`DROP TEMPORARY VARIABLE`** no longer raises `UNRESOLVED_VARIABLE` when the SQL PATH does not contain `system.session`. DDL on session variables ignores PATH, matching the existing behaviour of `DECLARE OR REPLACE VARIABLE`.
- **`lookupFunctionType`** no longer swallows non–`NotFound` errors. A catalog reporting `PERMISSION_DENIED` (or similar) for a function lookup now propagates instead of silently producing `UNRESOLVED_ROUTINE`.
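
The narrowed exception handling in the last bullet can be sketched as follows. This is an illustrative model, not the Scala implementation: the exception classes and `lookup_function_type` are hypothetical Python stand-ins for the explicit not-found list named above.

```python
class NoSuchFunctionError(Exception): pass
class NoSuchNamespaceError(Exception): pass
class CatalogNotFoundError(Exception): pass
class ForbiddenOperationError(Exception): pass
class PermissionDeniedError(Exception): pass

# Only these mean "try the next candidate"; anything else is a real error.
NOT_FOUND = (
    NoSuchFunctionError,
    NoSuchNamespaceError,
    CatalogNotFoundError,
    ForbiddenOperationError,
)


def lookup_function_type(candidates, lookup):
    """Return the first successful lookup result, or None if every
    candidate is 'not found'. Other exceptions propagate to the caller."""
    for name in candidates:
        try:
            return lookup(name)
        except NOT_FOUND:
            continue
    return None


def not_found(name):
    raise NoSuchFunctionError(name)


def denied(name):
    raise PermissionDeniedError(name)


print(lookup_function_type(["cat.ns.f"], not_found))  # falls through to None
try:
    lookup_function_type(["cat.ns.f"], denied)
except PermissionDeniedError as e:
    print("propagated:", e)
```

Catching a broad class of errors here would turn the second case into an unresolved-routine result, hiding the underlying catalog failure.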

### How was this patch tested?

- Added `sql-session-variables.sql` regression test for the struct-variable field-vs-qualified ambiguity (`DECLARE VARIABLE session STRUCT<a INT>` → `SELECT session.a` succeeds → `DROP` → `SELECT session.a` falls through to `UNRESOLVED_COLUMN`).
- Updated `SetPathSuite`: DECLARE / SET VAR / DROP cycle under a non-default PATH; bonus test asserts the actual rendered search path and the variable-identifier `queryContext`.
- Updated `SqlScriptingExecutionSuite` for the new bracketed `searchPath` and identifier-pinned `queryContext`.
- Regenerated `sql-session-variables.sql.out` for the new error shape.
- Added `resolutionPathEntriesForAnalysis` stubs to mocked `CatalogManager` instances in `PlanResolutionSuite`, `AlignAssignmentsSuiteBase`, and `TableLookupCacheSuite`.
- Ran focused suites locally; all pass:
  - `build/sbt 'sql/testOnly *SetPathSuite *SqlScriptingExecutionSuite *ExecuteImmediateEndToEndSuite'`
  - `build/sbt 'sql/testOnly *SimpleSQLViewSuite *SQLFunctionSuite'`
  - `build/sbt 'sql/testOnly *PlanResolutionSuite *UpdateTableAlignAssignmentsSuite *MergeIntoTableAlignAssignmentsSuite'`
  - `build/sbt 'catalyst/testOnly *TableLookupCacheSuite *AnalysisSuite *AnalysisErrorSuite *LookupFunctionsSuite'`
  - `build/sbt 'sql/testOnly *FunctionQualificationSuite *RelationQualificationSuite *DataSourceV2FunctionSuite'`
  - `build/sbt 'sql/testOnly *SQLQuerySuite'`
  - `build/sbt 'connect/testOnly *ProtoToParsedPlanTestSuite'`
  - `build/sbt 'sql/testOnly *SQLQueryTestSuite -- -z sql-session-variables.sql'`
  - Full `org.apache.spark.sql.catalyst.analysis.*`, `org.apache.spark.sql.catalyst.parser.*`, and `org.apache.spark.sql.analysis.resolver.*` suites.
- `scalastyle` and `scalafmt` clean across catalyst, sql, and connect modules.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor Claude Opus 4.7

Closes apache#55647 from srielau/SPARK-56681-patch-clean-up.

Authored-by: Serge Rielau <serge@rielau.com>
Signed-off-by: Daniel Tenedorio <daniel.tenedorio@databricks.com>