remove ml.LabeledPoint from PySpark and annotate ml.LabeledPoint #2
Merged
ghost pushed a commit that referenced this pull request on Jan 18, 2018
## What changes were proposed in this pull request?

There were two related fixes regarding `from_json`, `get_json_object` and `json_tuple` ([Fix #1](apache@c8803c0), [Fix #2](apache@86174ea)), but they do not appear to have been comprehensive. I wanted to extend those fixes to all the parsers and add tests for each case.

## How was this patch tested?

Regression tests.

Author: Burak Yavuz <brkyvz@gmail.com>

Closes apache#20302 from brkyvz/json-invfix.
dbtsai pushed a commit that referenced this pull request on Nov 9, 2019
### What changes were proposed in this pull request?

`org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite` has been failing recently. The logs show only the following, without any further details:

```
Caused by: sbt.ForkMain$ForkError: sun.security.krb5.KrbException: Server not found in Kerberos database (7) - Server not found in Kerberos database
```

Since the issue is intermittent and cannot be reproduced, we should add more debug information and wait for a reproduction with the extended logs.

### Why are the changes needed?

The failing test doesn't give enough debug information.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

I ran the test manually and checked that the additional debug messages show up:

```
>>> KrbApReq: APOptions are 00000000 00000000 00000000 00000000
>>> EType: sun.security.krb5.internal.crypto.Aes128CtsHmacSha1EType
Looking for keys for: kafka/localhostEXAMPLE.COM
Added key: 17version: 0
Added key: 23version: 0
Added key: 16version: 0
Found unsupported keytype (3) for kafka/localhostEXAMPLE.COM
>>> EType: sun.security.krb5.internal.crypto.Aes128CtsHmacSha1EType
Using builtin default etypes for permitted_enctypes
default etypes for permitted_enctypes: 17 16 23.
>>> EType: sun.security.krb5.internal.crypto.Aes128CtsHmacSha1EType
MemoryCache: add 1571936500/174770/16C565221B70AAB2BEFE31A83D13A2F4/client/localhostEXAMPLE.COM to client/localhostEXAMPLE.COM|kafka/localhostEXAMPLE.COM
MemoryCache: Existing AuthList:
#3: 1571936493/200803/8CD70D280B0862C5DA1FF901ECAD39FE/client/localhostEXAMPLE.COM
#2: 1571936499/985009/BAD33290D079DD4E3579A8686EC326B7/client/localhostEXAMPLE.COM
#1: 1571936499/995208/B76B9D78A9BE283AC78340157107FD40/client/localhostEXAMPLE.COM
```

Closes apache#26252 from gaborgsomogyi/SPARK-29580.

Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dbtsai pushed a commit that referenced this pull request on Mar 11, 2020
### What changes were proposed in this pull request?

Fix the error caused by interval output in `ExtractBenchmark`.

### Why are the changes needed?

It fixes a bug in the benchmark, which fails with:

```
[info] Running case: cast to interval
[error] Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot use interval type in the table schema.;;
[error] OverwriteByExpression RelationV2[] noop-table, true, true
[error] +- Project [(subtractdates(cast(cast(id#0L as timestamp) as date), -719162) + subtracttimestamps(cast(id#0L as timestamp), -30610249419876544)) AS ((CAST(CAST(id AS TIMESTAMP) AS DATE) - DATE '0001-01-01') + (CAST(id AS TIMESTAMP) - TIMESTAMP '1000-01-01 01:02:03.123456'))#2]
[error] +- Range (1262304000, 1272304000, step=1, splits=Some(1))
[error]
[error] at org.apache.spark.sql.catalyst.util.TypeUtils$.failWithIntervalType(TypeUtils.scala:106)
[error] at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$25(CheckAnalysis.scala:389)
[error] at org.a
```

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Re-ran the benchmark.

Closes apache#27867 from yaooqinn/SPARK-31111.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
dbtsai pushed a commit that referenced this pull request on Jun 16, 2020
…chmarks

### What changes were proposed in this pull request?

Replace `CAST(... AS TIMESTAMP)` with `TIMESTAMP_SECONDS` in the following benchmarks:
- ExtractBenchmark
- DateTimeBenchmark
- FilterPushdownBenchmark
- InExpressionBenchmark

### Why are the changes needed?

The benchmarks fail without the changes:

```
[info] Running benchmark: datetime +/- interval
[info] Running case: date + interval(m)
[error] Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve 'CAST(`id` AS TIMESTAMP)' due to data type mismatch: cannot cast bigint to timestamp,you can enable the casting by setting spark.sql.legacy.allowCastNumericToTimestamp to true,but we strongly recommend using function TIMESTAMP_SECONDS/TIMESTAMP_MILLIS/TIMESTAMP_MICROS instead.; line 1 pos 5;
[error] 'Project [(cast(cast(id#0L as timestamp) as date) + 1 months) AS (CAST(CAST(id AS TIMESTAMP) AS DATE) + INTERVAL '1 months')#2]
[error] +- Range (0, 10000000, step=1, splits=Some(1))
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

By running the affected benchmarks.

Closes apache#28843 from MaxGekk/GuoPhilipse-31710-fix-compatibility-followup.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
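The replacement works because `TIMESTAMP_SECONDS` explicitly interprets a `bigint` as seconds since the Unix epoch, instead of relying on the numeric-to-timestamp cast that the error above says is disabled by default (`spark.sql.legacy.allowCastNumericToTimestamp`). As a rough, non-authoritative illustration of that arithmetic, here is a plain-Python sketch; `timestamp_seconds` is a hypothetical helper, not a Spark API, and it assumes UTC, whereas Spark renders timestamps in the session time zone.

```python
from datetime import datetime, timezone


def timestamp_seconds(seconds: int) -> datetime:
    """Hypothetical stand-in for Spark SQL's TIMESTAMP_SECONDS.

    Interprets an integer as seconds since 1970-01-01 00:00:00 UTC and returns
    a timezone-aware datetime. This sketch always uses UTC; Spark would display
    the value in the session time zone.
    """
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Range lower bounds that appear in the benchmark error logs.
for seconds in (0, 1262304000):
    print(seconds, "->", timestamp_seconds(seconds).isoformat())
```

In this sketch, the lower bound of the `ExtractBenchmark` range (1262304000) corresponds to the start of 2010 in UTC.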
dbtsai pushed a commit that referenced this pull request on May 21, 2026
…mands

### What changes were proposed in this pull request?

Follow-up to [SPARK-52729](https://issues.apache.org/jira/browse/SPARK-52729) (which added `MetadataOnlyTable` and CREATE / ALTER VIEW … AS support for DS v2 catalogs). This PR closes out the *Remaining work* section of that PR's description, plus a few API/test cleanups. The branch is structured as **three commits**:

**1. `[SPARK-56655][SQL] Rename v2 metadata-table API: MetadataOnlyTable, RelationCatalog, loadRelation`**: a mechanical rename of the v2 view-API surface introduced by the parent PR:

| Before | After |
|---------------------|--------------------|
| `MetadataOnlyTable` | `MetadataTable` |
| `RelationCatalog` | `TableViewCatalog` |
| `loadRelation()` | `loadTableOrView()` |
| `DataSourceV2MetadataOnly{Table,View}Suite` | `DataSourceV2Metadata{Table,View}Suite` |

The unrelated v2 helpers `CatalogV2Util.loadRelation` and `RelationResolution`'s private `loadRelation(V2TableReference)` predate the parent PR and are intentionally not renamed.

**2. `[SPARK-56655][SQL] Implement remaining v2 view DDL and inspection commands`**: the production-side work.

*New view DDL execs* (`AlterV2ViewExec.scala`):
- `AlterV2ViewSetPropertiesExec`, `AlterV2ViewUnsetPropertiesExec`: merge / drop user TBLPROPERTIES on the existing view; dispatch to `ViewCatalog#replaceView`.
- `AlterV2ViewSchemaBindingExec`: rewrites the schema-binding mode field; dispatches to `replaceView`.
- `RenameV2ViewExec`: dispatches to a new abstract `ViewCatalog#renameView(Identifier, Identifier)` (added to `ViewCatalog`, contracted on `TableViewCatalog`).
- A shared `V2ViewMetadataMutation.builderFrom(existing)` helper seeds a `ViewInfo.Builder` from the existing view, so that mutating execs override only the field they are changing.

*New view inspection execs* (`V2ViewInspectionExecs.scala`):
- `ShowCreateV2ViewExec`: reconstructs the `CREATE VIEW …` DDL from `ViewInfo`.
- `ShowV2ViewPropertiesExec`, `ShowV2ViewColumnsExec`, `DescribeV2ViewExec`, `DescribeV2ViewColumnExec`: produce output rows directly from `ViewInfo`. `DESCRIBE TABLE EXTENDED` emits a v2-native `# Detailed View Information` block.

*v1 parity for `SHOW TABLES` on a `TableViewCatalog`* (`ShowTablesExec.scala`): when the catalog is a `TableViewCatalog`, route through `listRelationSummaries` so that views appear alongside tables. Pure `TableCatalog` catalogs continue to use `listTables` and return tables only.

*All `UNSUPPORTED_FEATURE.TABLE_OPERATION` pins* from the parent PR for these commands are replaced with real strategy cases.

*Architecture: a single typed payload for resolved views.* To match the v2-table convention (where `ResolvedTable.table: Table` carries the resolved Table, and v1 tables appear as `V1Table` wrapping a `CatalogTable`), `ResolvedPersistentView` now carries `info: ViewInfo`:

```scala
case class ResolvedPersistentView(
    catalog: CatalogPlugin,
    identifier: Identifier,
    info: ViewInfo) // v2 ViewInfo for v2 views; V1ViewInfo wrapping
                    // CatalogTable for session-catalog (v1) views
```

A new `private[sql] V1ViewInfo(v1Table: CatalogTable) extends ViewInfo` exposes a v1 `CatalogTable` through the v2 `ViewInfo` surface, mirroring `V1Table` for `Table`. `ViewInfo`'s constructor is relaxed from `private` to `protected` so that the subclass can call it.

v1-only paths (DescribeTableCommand via `ResolvedChildHelper`, the v1 DescribeRelation JSON path, ApplyDefaultCollation's AlterViewAs rewrite, the CreateTableLike strategy case) recover the original `CatalogTable` via `info.asInstanceOf[V1ViewInfo].v1Table`. v2 strategy cases consume `rpv.info` directly.

`ResolvedPersistentView.output` now exposes the view's schema attributes (with char/varchar normalization), so `DescribeColumn` against a v2 view survives `CheckAnalysis`: the column resolves naturally through `ResolveReferences`, and both the strategy and the v1 rewrite extract `nameParts` from the resolved attribute.
No new logical plan is needed for DESCRIBE COLUMN; the existing one flows to the planner intact.

Also folded into this commit:
- `ALTER TABLE <view> RENAME TO …` is rejected with `EXPECT_TABLE_NOT_VIEW.USE_ALTER_VIEW` on the v2 path (mirroring v1 `DDLUtils.verifyAlterTableType`).

**3. `[SPARK-56655][SQL][TESTS] Add per-catalog view command test triplets and fold in late prod tweaks`**: test scaffolding.

Mirror the DROP TABLE test layout from `sql/core/test/.../command/{,v1/,v2/}` for every v2 view DDL / inspection command. Each command lands as:
- `command/<Cmd>SuiteBase.scala`: unified tests parameterized by `$catalog`
- `command/v1/<Cmd>Suite.scala`: extends Base + v1 `ViewCommandSuiteBase` (pins `$catalog` to `spark_catalog`, so the unified tests target the session catalog)
- `command/v2/<Cmd>Suite.scala`: extends Base + v2 `ViewCommandSuiteBase` (pins `$catalog` to a fresh `test_view_catalog` backed by a new general-purpose `InMemoryTableViewCatalog` test fixture), plus catalog-state assertions specific to the v2 fixture

Triplets cover: CREATE VIEW, ALTER VIEW … AS, ALTER VIEW SET / UNSET TBLPROPERTIES, ALTER VIEW RENAME TO, ALTER VIEW WITH SCHEMA, SHOW CREATE TABLE, SHOW TBLPROPERTIES, SHOW COLUMNS, DESCRIBE TABLE, DESCRIBE TABLE … COLUMN, DROP VIEW, SHOW VIEWS.

Each Base test runs against both `spark_catalog` (v1, hits the existing v1 commands) and `test_view_catalog` (v2, hits the new execs from commit #2), giving a single source of truth for cross-catalog behavioral parity.

The pre-existing `DataSourceV2MetadataViewSuite` is trimmed: CREATE / ALTER / DROP / SHOW VIEW DDL coverage moves to the per-catalog triplets. What remains in the leaf suite is genuinely v2-specific structural coverage (view read path, `V1Table.toCatalogTable` round-trip, pure-ViewCatalog read + ALTER, multi-level-namespace cyclic detection / error rendering, REFRESH / ANALYZE rejection, SHOW TABLES on a `TableViewCatalog` returning both kinds).

### Why are the changes needed?
The parent PR shipped CREATE VIEW + ALTER VIEW … AS through the v2 surface but pinned the rest of the view DDL/inspection family with `UNSUPPORTED_FEATURE.TABLE_OPERATION`. Third-party v2 catalogs that host views still couldn't run `ALTER VIEW SET TBLPROPERTIES`, `ALTER VIEW … RENAME TO`, `ALTER VIEW … WITH SCHEMA <mode>`, `DESCRIBE`, `SHOW CREATE TABLE`, `SHOW TBLPROPERTIES`, or `SHOW COLUMNS` against their views; full v1 parity for non-session view catalogs requires this set.

The rename to `MetadataTable` / `TableViewCatalog` / `loadTableOrView` was discussed during the parent PR's review as a clarity improvement; doing it now (before the API ships in a release) avoids deprecation churn later.

`ResolvedPersistentView.info: ViewInfo` (with `V1ViewInfo` for v1 views) brings the v2 view command flow in line with the existing v2 table command convention (`ResolvedTable.table: Table`, with `V1Table` for v1 tables): a single typed payload, resolved at analysis time and consumed at exec time. Looking up database objects at runtime is the anti-pattern this removes.

### Does this PR introduce _any_ user-facing change?

Yes, for **connector developers**:
- The rename is a source-incompatible change on the still-`Evolving` v2 view API surface. Connectors implementing the parent PR's `RelationCatalog`, overriding `loadRelation`, or referencing `MetadataOnlyTable` need to update to `TableViewCatalog` / `loadTableOrView` / `MetadataTable`.
- `ViewCatalog` gains a new abstract `renameView(oldIdent, newIdent)` method. Existing `ViewCatalog` implementations need to add it (the parent PR has not been released yet, so this is still pre-release breakage).
- `ViewInfo`'s constructor is now `protected` (was `private`). Existing call sites use the `ViewInfo.Builder`; only internal subclasses need to call the constructor directly.
Yes, for **end users on a non-session v2 view catalog**: the listed DDL / inspection commands now succeed against a `ViewCatalog` instead of erroring with `UNSUPPORTED_FEATURE.TABLE_OPERATION`. `SHOW TABLES` on a `TableViewCatalog` now returns both tables and views, matching v1 SHOW TABLES output. `ALTER TABLE <view> RENAME TO …` (wrong syntax) now returns `EXPECT_TABLE_NOT_VIEW.USE_ALTER_VIEW` instead of silently succeeding.

There is no user-facing change on the session catalog path: those plans are still rewritten to the v1 commands by `ResolveSessionCatalog` and behave exactly as before.

### How was this patch tested?

- 13 new per-catalog `*SuiteBase` triplets under `sql/core/src/test/.../command/{,v1/,v2/}` exercise each command against both a v1 (session) catalog and a fresh v2 `InMemoryTableViewCatalog` fixture.
- The pre-existing `DataSourceV2MetadataViewSuite` is trimmed to v2-specific structural tests (read path, `V1Table.toCatalogTable` round-trip, pure-ViewCatalog read + ALTER, multi-level-namespace cyclic / error rendering, REFRESH/ANALYZE rejection, SHOW TABLES on a TableViewCatalog).
- 242 view-related tests pass locally across 30 suites, plus 54/54 in SimpleSQLViewSuite, plus 171/171 in table-side inspection suites (verifying no regression in `ResolvedChildHelper.getTableMetadata` after the metadata→info refactor).

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude (Anthropic)

Closes apache#55593 from cloud-fan/v2-view-followup.

Lead-authored-by: Wenchen Fan <wenchen@databricks.com>
Co-authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
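The `V2ViewMetadataMutation.builderFrom(existing)` idea described in commit 2 (seed a builder from the existing view so that each mutating exec overrides only the field it changes) can be sketched in Python with a frozen dataclass. Everything below is hypothetical: `ViewInfoSketch`, its fields, and the helper functions are illustrative stand-ins rather than Spark's `ViewInfo` / `ViewInfo.Builder` API, and `dataclasses.replace` plays the role of the seeded builder.

```python
from dataclasses import dataclass, field, replace
from typing import AbstractSet, Dict


@dataclass(frozen=True)
class ViewInfoSketch:
    """Hypothetical, simplified view metadata; not Spark's ViewInfo."""
    query_text: str
    schema_binding: str
    properties: Dict[str, str] = field(default_factory=dict)


def builder_from(existing: ViewInfoSketch, **overrides) -> ViewInfoSketch:
    # Analogue of "seed a builder from the existing view": copy every field,
    # then override only the ones the caller names.
    return replace(existing, **overrides)


def set_properties(existing: ViewInfoSketch, props: Dict[str, str]) -> ViewInfoSketch:
    # ALTER VIEW ... SET TBLPROPERTIES: merge new keys over the existing ones.
    return builder_from(existing, properties={**existing.properties, **props})


def unset_properties(existing: ViewInfoSketch, keys: AbstractSet[str]) -> ViewInfoSketch:
    # ALTER VIEW ... UNSET TBLPROPERTIES: drop the named keys, keep the rest.
    kept = {k: v for k, v in existing.properties.items() if k not in keys}
    return builder_from(existing, properties=kept)


def set_schema_binding(existing: ViewInfoSketch, mode: str) -> ViewInfoSketch:
    # ALTER VIEW ... WITH SCHEMA <mode>: only the binding mode changes.
    return builder_from(existing, schema_binding=mode)
```

Because every mutation in this sketch starts from the existing metadata, changing a property cannot accidentally reset the query text or the schema binding, and the original object is left untouched.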
dbtsai pushed a commit that referenced this pull request on May 21, 2026
… metadata

### What changes were proposed in this pull request?

Follow-up to apache#55636 addressing post-merge review comments from zikangh:

1. **Deduplicate `isCarryoverPair`.** The carry-over predicate (`_del_cnt = 1 AND _ins_cnt = 1 AND _rv_cnt = 2 AND _min_rv = _max_rv`) was duplicated between the batch path's `addCarryOverPairFilter` and the streaming path's inline filter. Extracted a shared `buildCarryOverPairPredicate` helper and call it from both.
2. **Mark the streaming row-level rewrite via attribute metadata rather than helper column name.** `UnsupportedOperationChecker` previously detected the rewrite by string-matching the `__spark_cdc_events` aggregate alias name. Switched to a metadata marker (`ResolveChangelogTable.streamingPostProcessingMarker`) attached to the alias's output attribute, mirroring the existing `EventTimeWatermark.delayKey` and `SessionWindow.marker` patterns. The marker travels with the attribute through optimization.
3. **Expand streaming E2E coverage.** New tests in `ChangelogEndToEndSuite`:
   - composite rowId carry-over removal,
   - composite rowId update detection (different tuples kept raw),
   - carry-over + update detection across multiple commits,
   - DELETE-all-rows and UPDATE-all-rows fixtures,
   - append-only workload pass-through,
   - no-op UPDATE labeled as update (rcv differs on pre/post),
   - large carry-over removal (9 carry-over pairs + 1 real delete).

### Why are the changes needed?

zikangh raised these on the merged PR. They are bundled together so they can be reviewed and shipped as one follow-up.

### Does this PR introduce _any_ user-facing change?

No. Internal refactor (#1, #2) and additional test coverage (#3). The behavior of streaming CDC reads is unchanged.

### How was this patch tested?
All 157 tests pass across the four CDC suites:
- `ChangelogResolutionSuite`
- `ResolveChangelogTablePostProcessingSuite`
- `ResolveChangelogTableStreamingPostProcessingSuite`
- `ChangelogEndToEndSuite`

Also confirmed:
- `UnsupportedOperationsSuite` (216 tests) still passes after the marker-based detection switch.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (claude-opus-4-7)

Closes apache#55653 from gengliangwang/streamingCDC-followup-zikangh.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
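The deduplication in item 1 amounts to defining the carry-over test once and reusing it from both the batch and streaming code paths. Below is a minimal Python sketch of that shape, assuming each grouped change record exposes the four aggregate columns named in the predicate and that carry-over pairs are dropped. `build_carry_over_pair_predicate`, `batch_filter`, and `streaming_filter` are hypothetical names; the PR's Scala helper builds a Catalyst expression, whereas this sketch filters plain Python dicts.

```python
from typing import Callable, Iterable, Iterator, List, Mapping

Row = Mapping[str, int]


def build_carry_over_pair_predicate() -> Callable[[Row], bool]:
    """Single definition of the carry-over test; a hypothetical Python analogue of
    _del_cnt = 1 AND _ins_cnt = 1 AND _rv_cnt = 2 AND _min_rv = _max_rv."""
    def is_carry_over_pair(row: Row) -> bool:
        return (
            row["_del_cnt"] == 1
            and row["_ins_cnt"] == 1
            and row["_rv_cnt"] == 2
            and row["_min_rv"] == row["_max_rv"]
        )
    return is_carry_over_pair


def batch_filter(rows: Iterable[Row]) -> List[Row]:
    # Batch path: drop carry-over pairs from one complete result.
    is_pair = build_carry_over_pair_predicate()
    return [r for r in rows if not is_pair(r)]


def streaming_filter(micro_batches: Iterable[List[Row]]) -> Iterator[List[Row]]:
    # Streaming path: apply the very same predicate to each micro-batch.
    is_pair = build_carry_over_pair_predicate()
    for batch in micro_batches:
        yield [r for r in batch if not is_pair(r)]
```

With a single predicate factory, a future change to the carry-over condition only has to be made in one place for both paths to stay consistent.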
dbtsai pushed a commit that referenced this pull request on May 21, 2026
### What changes were proposed in this pull request?

Address the open follow-ups from [SPARK-56681](https://issues.apache.org/jira/browse/SPARK-56681) (umbrella for PATH / SPARK-56605 cleanup) in a single cleanup PR. Items #1 and #2 were already wired by SPARK-56639; this PR covers the remainder.

| # | Item | Resolution |
|---|---|---|
| #1 | `FunctionResolution.resolveProcedure` was dead code | Already wired by SPARK-56639 (no action). |
| #2 | Frozen view / SQL-function PATH wiring unfinished | Already done by SPARK-56639 (no action). |
| #3 | `AnalysisContext.resolutionPathEntries` threadlocal | Audit only: confirmed `withNewAnalysisContext` / `reset()` correctly clear it. Full removal needs a coordinated refactor to plumb the path through `RelationResolution` / `FunctionResolution` method calls; flagged as a follow-up. |
| #4 | `Analyzer.executeAndCheck` clobbers outer `SQLConf.withExistingConf` | Extracted a `runWithSessionConf` helper and added `SQLConf.getExistingConfIfSet`. `executeAndCheck` and `executeSameContext` now share one path that yields to any outer scope. |
| #5 | `VariableResolution.allowUnqualifiedSessionTempVariableLookup` force-loads default catalog | Replaced the hot-path catalog read with `CatalogManager.isSystemSessionOnPath`, which inspects stored session-path entries directly. No catalog load on column resolution. |
| #6 | `DROP VARIABLE` PATH gate asymmetric with `DECLARE` / `CREATE` | Removed the gate. DDL on session variables (`DECLARE` / `CREATE` / `DROP`) always targets `system.session` directly; only DML (`SET VAR`, `SELECT x`) goes through PATH. |
| #7 | `lookupFunctionType` exception swallow too broad | Narrowed from `NonFatal` to the explicit not-found list (`NoSuchFunctionException`, `NoSuchNamespaceException`, `CatalogNotFoundException`, `FORBIDDEN_OPERATION`). Other exceptions propagate. |
| #8 | `lookupFunctionType` fan-out had wasteful `system.*` candidates | Filtered them out: `system.session`, `system.builtin`, `system.ai` are already resolved earlier in the same method. |
| #9 | Three near-duplicate path-resolution helpers | Lifted into `CatalogManager.resolutionPathEntriesForAnalysis(pinnedEntries, viewCatalogAndNamespace)`. Relation, routine, and procedure resolution all route through it. |
| #10 | Tests for the new error paths and gates | Added a DECLARE / SET VAR / DROP cycle test under a non-default PATH and a struct-variable field-vs-qualified ambiguity test in `sql-session-variables.sql`. |
| #11 | `ProtoToParsedPlanTestSuite.analyzerIsolationConf` was a bare `SQLConf` | Clone `spark.sessionState.conf` and override only `PATH_ENABLED=false`, so all `sparkConf` overrides (ANSI, alias config, ...) propagate automatically. |
| Bonus | `ResolveSetVariable` hardcoded `SYSTEM.SESSION` regardless of actual PATH | `unresolvedVariableError` now takes `Seq[Seq[String]]` path entries with a **required** `Origin` (no overloads). DML lookup failures (`SET VAR`, `FETCH ... INTO`) report the full SQL path as a bracketed list, byte-for-byte consistent with `UNRESOLVED_ROUTINE` and `TABLE_OR_VIEW_NOT_FOUND`. DDL name validation in `ResolveCatalogs` continues to report `[system.session]` since PATH does not apply there. Origin is plumbed through `VariableManager.set` so that all error sites carry a `queryContext` pointing at the offending variable identifier (the parser opts in via `withOrigin(identifierReference)` so that the highlight is the variable name, not the whole statement). |

### Why are the changes needed?

These are the cleanup items called out on SPARK-56681 from the post-merge source review of SPARK-56605.
They eliminate dead code paths, plug user-visible bugs (force-loading a misconfigured default catalog on column resolution; clobbering pinned session configs; swallowing real catalog errors as `UNRESOLVED_ROUTINE`), remove the asymmetry between DDL and DML on session variables, and make `UNRESOLVED_VARIABLE` self-consistent with the other "not found" errors.

### Does this PR introduce _any_ user-facing change?

Yes.
- **`UNRESOLVED_VARIABLE.searchPath`** is now rendered as a bracketed list. For DML lookups (`SET VAR`, `FETCH ... INTO`), the list reflects the actual SQL PATH that was consulted instead of a hardcoded `SYSTEM.SESSION`. For DDL name validation (`DECLARE` / `DROP` with a non-session namespace), the list is `[`` `system`.`session` ``]` since PATH does not apply.
- **`UNRESOLVED_VARIABLE`** now always carries a `queryContext` that highlights just the offending variable identifier (e.g. `"builtin.var1"`, `"ses.var1"`), not the whole `DECLARE` / `SET VAR` statement.
- **`DROP TEMPORARY VARIABLE`** no longer raises `UNRESOLVED_VARIABLE` when the SQL PATH does not contain `system.session`. DDL on session variables ignores PATH, matching the existing behaviour of `DECLARE OR REPLACE VARIABLE`.
- **`lookupFunctionType`** no longer swallows non-`NotFound` errors. A `PERMISSION_DENIED` (or similar) error raised by a catalog during a function lookup now propagates instead of silently producing `UNRESOLVED_ROUTINE`.

### How was this patch tested?

- Added a `sql-session-variables.sql` regression test for the struct-variable field-vs-qualified ambiguity (`DECLARE VARIABLE session STRUCT<a INT>` → `SELECT session.a` succeeds → `DROP` → `SELECT session.a` falls through to `UNRESOLVED_COLUMN`).
- Updated `SetPathSuite`: DECLARE / SET VAR / DROP cycle under a non-default PATH; the bonus test asserts the actual rendered search path and the variable-identifier `queryContext`.
- Updated `SqlScriptingExecutionSuite` for the new bracketed `searchPath` and identifier-pinned `queryContext`.
- Regenerated `sql-session-variables.sql.out` for the new error shape.
- Added `resolutionPathEntriesForAnalysis` stubs to mocked `CatalogManager` instances in `PlanResolutionSuite`, `AlignAssignmentsSuiteBase`, and `TableLookupCacheSuite`.
- Ran focused suites locally; all pass:
  - `build/sbt 'sql/testOnly *SetPathSuite *SqlScriptingExecutionSuite *ExecuteImmediateEndToEndSuite'`
  - `build/sbt 'sql/testOnly *SimpleSQLViewSuite *SQLFunctionSuite'`
  - `build/sbt 'sql/testOnly *PlanResolutionSuite *UpdateTableAlignAssignmentsSuite *MergeIntoTableAlignAssignmentsSuite'`
  - `build/sbt 'catalyst/testOnly *TableLookupCacheSuite *AnalysisSuite *AnalysisErrorSuite *LookupFunctionsSuite'`
  - `build/sbt 'sql/testOnly *FunctionQualificationSuite *RelationQualificationSuite *DataSourceV2FunctionSuite'`
  - `build/sbt 'sql/testOnly *SQLQuerySuite'`
  - `build/sbt 'connect/testOnly *ProtoToParsedPlanTestSuite'`
  - `build/sbt 'sql/testOnly *SQLQueryTestSuite -- -z sql-session-variables.sql'`
  - Full `org.apache.spark.sql.catalyst.analysis.*`, `org.apache.spark.sql.catalyst.parser.*`, and `org.apache.spark.sql.analysis.resolver.*` suites.
- `scalastyle` and `scalafmt` are clean across the catalyst, sql, and connect modules.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor Claude Opus 4.7

Closes apache#55647 from srielau/SPARK-56681-patch-clean-up.

Authored-by: Serge Rielau <serge@rielau.com>
Signed-off-by: Daniel Tenedorio <daniel.tenedorio@databricks.com>
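Item #7 turns a catch-all into an allow-list of "not found" failures while walking the PATH candidates. The Python sketch below shows that pattern in isolation; the exception classes, `NOT_FOUND_ERRORS`, and `lookup_function_type` are hypothetical stand-ins for the Scala code (`NoSuchFunctionException` and friends), and the `FORBIDDEN_OPERATION` error condition is modeled as its own exception class purely for simplicity.

```python
from typing import Callable, Iterable, Optional


class NoSuchFunctionError(Exception):
    pass


class NoSuchNamespaceError(Exception):
    pass


class CatalogNotFoundError(Exception):
    pass


class ForbiddenOperationError(Exception):
    pass


# Allow-list of failures that mean "not found under this PATH entry".
NOT_FOUND_ERRORS = (
    NoSuchFunctionError,
    NoSuchNamespaceError,
    CatalogNotFoundError,
    ForbiddenOperationError,
)


def lookup_function_type(
    candidates: Iterable[str], load: Callable[[str], str]
) -> Optional[str]:
    """Try each PATH candidate in order and return the first function type found.

    Only NOT_FOUND_ERRORS are swallowed; any other exception (for example a
    permission error) propagates, unlike a catch-all `except Exception`.
    """
    for candidate in candidates:
        try:
            return load(candidate)
        except NOT_FOUND_ERRORS:
            continue  # not here: try the next PATH entry
    return None  # nothing matched: the caller would report UNRESOLVED_ROUTINE
```

The design choice is that an unexpected failure surfaces as itself, rather than being reported as a misleading "routine not found".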
dbtsai added a commit that referenced this pull request on May 28, 2026
…ntext, fix HiveContext bypass, reorganize tests

- Remove public `getOrCreate` from Connect SQLContext; internal dispatch uses `_get_or_create_from_session` only (fixes Finding #1 / #4)
- Fix HiveContext bypass in classic dispatch: route `getOrCreate` to the Connect counterpart by class name so that `ConnectHiveContext._from_session` raises as expected (fixes Finding #2)
- Fix `newSession()` docstring to accurately describe `cloneSession()` semantics (fixes Finding #3)
- Fix docstring nits: missing article, list/tuple, inferring the schema, table names as strings, streams wording
- Add a comment explaining why `catalog.listTables()` is used instead of SHOW TABLES
- Reorganize tests: add `test_sql_context.py` with a mixin + classic runner and `test_parity_sql_context.py` for Connect parity, and slim `test_connect_context.py` down to Connect-specific tests only

Co-authored-by: DB Tsai <db.tsai@databricks.com>
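The HiveContext fix dispatches on the calling class's name rather than always using the base Connect class, so a subclass reaches its own Connect counterpart and that counterpart's failure mode. The self-contained sketch below illustrates the idea only: every class, the `_CONNECT_COUNTERPARTS` table, and the `is_connect` flag are hypothetical simplifications, not PySpark's actual code.

```python
class ConnectSQLContext:
    """Hypothetical Connect-side SQLContext (no public getOrCreate)."""

    def __init__(self, session):
        self.session = session

    @classmethod
    def _from_session(cls, session):
        return cls(session)

    @classmethod
    def _get_or_create_from_session(cls, session):
        # Internal entry point used by the classic dispatcher.
        return cls._from_session(session)


class ConnectHiveContext(ConnectSQLContext):
    @classmethod
    def _from_session(cls, session):
        raise NotImplementedError("HiveContext is not supported with Spark Connect")


# Keyed by the *classic* class name, so each subclass maps to its own counterpart.
_CONNECT_COUNTERPARTS = {
    "SQLContext": ConnectSQLContext,
    "HiveContext": ConnectHiveContext,
}


class SQLContext:
    """Hypothetical classic SQLContext."""

    @classmethod
    def getOrCreate(cls, session, is_connect: bool):
        if is_connect:
            # Dispatch on cls.__name__: HiveContext.getOrCreate can no longer
            # fall through to the plain ConnectSQLContext (the "bypass").
            counterpart = _CONNECT_COUNTERPARTS[cls.__name__]
            return counterpart._get_or_create_from_session(session)
        return cls()  # classic path, greatly simplified


class HiveContext(SQLContext):
    pass
```

In this sketch, `SQLContext.getOrCreate(s, is_connect=True)` yields a `ConnectSQLContext`, while `HiveContext.getOrCreate(s, is_connect=True)` raises from `ConnectHiveContext._from_session`.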
dbtsai added a commit that referenced this pull request on Jun 10, 2026
dbtsai added a commit that referenced this pull request on Jun 10, 2026
dbtsai added a commit that referenced this pull request on Jun 11, 2026
`LabeledPoint` is not used in `pyspark.ml`. This PR removes it and its pickler as well.