remove ml.LabeledPoint from PySpark and annotate ml.LabeledPoint by mengxr · Pull Request #2 · dbtsai/spark

mengxr · 2016-05-17T17:45:46Z

LabeledPoint is not used in pyspark.ml. This PR removes it and its pickler as well.


## What changes were proposed in this pull request?

There were two related fixes regarding `from_json`, `get_json_object` and `json_tuple` ([Fix #1](apache@c8803c0), [Fix #2](apache@86174ea)), but they weren't comprehensive it seems. I wanted to extend those fixes to all the parsers, and add tests for each case.

## How was this patch tested?

Regression tests

Author: Burak Yavuz <brkyvz@gmail.com>

Closes apache#20302 from brkyvz/json-invfix.

### What changes were proposed in this pull request?

`org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite` failed lately. After having a look at the logs, they just show the following fact without any details:
```
Caused by: sbt.ForkMain$ForkError: sun.security.krb5.KrbException: Server not found in Kerberos database (7) - Server not found in Kerberos database
```
Since the issue is intermittent and we are not able to reproduce it, we should add more debug information and wait for a reproduction with the extended logs.

### Why are the changes needed?

The failing test doesn't give enough debug information.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

I've started the test manually and checked that such additional debug messages show up:
```
>>> KrbApReq: APOptions are 00000000 00000000 00000000 00000000
>>> EType: sun.security.krb5.internal.crypto.Aes128CtsHmacSha1EType
Looking for keys for: kafka/localhostEXAMPLE.COM
Added key: 17version: 0
Added key: 23version: 0
Added key: 16version: 0
Found unsupported keytype (3) for kafka/localhostEXAMPLE.COM
>>> EType: sun.security.krb5.internal.crypto.Aes128CtsHmacSha1EType
Using builtin default etypes for permitted_enctypes
default etypes for permitted_enctypes: 17 16 23.
>>> EType: sun.security.krb5.internal.crypto.Aes128CtsHmacSha1EType
MemoryCache: add 1571936500/174770/16C565221B70AAB2BEFE31A83D13A2F4/client/localhostEXAMPLE.COM to client/localhostEXAMPLE.COM|kafka/localhostEXAMPLE.COM
MemoryCache: Existing AuthList:
#3: 1571936493/200803/8CD70D280B0862C5DA1FF901ECAD39FE/client/localhostEXAMPLE.COM
#2: 1571936499/985009/BAD33290D079DD4E3579A8686EC326B7/client/localhostEXAMPLE.COM
#1: 1571936499/995208/B76B9D78A9BE283AC78340157107FD40/client/localhostEXAMPLE.COM
```

Closes apache#26252 from gaborgsomogyi/SPARK-29580.

Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>

### What changes were proposed in this pull request?

fix the error caused by interval output in ExtractBenchmark

### Why are the changes needed?

fix a bug in the test

```scala
[info] Running case: cast to interval
[error] Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot use interval type in the table schema.;;
[error] OverwriteByExpression RelationV2[] noop-table, true, true
[error] +- Project [(subtractdates(cast(cast(id#0L as timestamp) as date), -719162) + subtracttimestamps(cast(id#0L as timestamp), -30610249419876544)) AS ((CAST(CAST(id AS TIMESTAMP) AS DATE) - DATE '0001-01-01') + (CAST(id AS TIMESTAMP) - TIMESTAMP '1000-01-01 01:02:03.123456'))#2]
[error] +- Range (1262304000, 1272304000, step=1, splits=Some(1))
[error]
[error] at org.apache.spark.sql.catalyst.util.TypeUtils$.failWithIntervalType(TypeUtils.scala:106)
[error] at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$25(CheckAnalysis.scala:389)
[error] at org.a
```

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

re-run benchmark

Closes apache#27867 from yaooqinn/SPARK-31111.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

…chmarks

### What changes were proposed in this pull request?

Replace `CAST(... AS TIMESTAMP)` by `TIMESTAMP_SECONDS` in the following benchmarks:
- ExtractBenchmark
- DateTimeBenchmark
- FilterPushdownBenchmark
- InExpressionBenchmark

### Why are the changes needed?

The benchmarks fail w/o the changes:
```
[info] Running benchmark: datetime +/- interval
[info] Running case: date + interval(m)
[error] Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve 'CAST(`id` AS TIMESTAMP)' due to data type mismatch: cannot cast bigint to timestamp,you can enable the casting by setting spark.sql.legacy.allowCastNumericToTimestamp to true,but we strongly recommend using function TIMESTAMP_SECONDS/TIMESTAMP_MILLIS/TIMESTAMP_MICROS instead.; line 1 pos 5;
[error] 'Project [(cast(cast(id#0L as timestamp) as date) + 1 months) AS (CAST(CAST(id AS TIMESTAMP) AS DATE) + INTERVAL '1 months')#2]
[error] +- Range (0, 10000000, step=1, splits=Some(1))
```

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

By running the affected benchmarks.

Closes apache#28843 from MaxGekk/GuoPhilipse-31710-fix-compatibility-followup.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
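The replacement described in the commit message above can be sketched in plain Python. This is only an illustration, not the benchmark code itself: `rewrite_cast_to_timestamp_seconds` and `timestamp_seconds` are hypothetical helper names, the regular expression only handles a bare column name inside the cast, and the second helper assumes `TIMESTAMP_SECONDS` interprets its argument as seconds since the UTC epoch.

```python
import re
from datetime import datetime, timezone

# Matches CAST(<identifier> AS TIMESTAMP); nested or complex operands are
# deliberately not handled in this sketch.
_CAST_TO_TS = re.compile(r"CAST\(\s*(\w+)\s+AS\s+TIMESTAMP\s*\)", re.IGNORECASE)


def rewrite_cast_to_timestamp_seconds(sql: str) -> str:
    """Rewrite numeric-to-timestamp casts into TIMESTAMP_SECONDS calls (hypothetical helper)."""
    return _CAST_TO_TS.sub(r"TIMESTAMP_SECONDS(\1)", sql)


def timestamp_seconds(seconds: int) -> datetime:
    """Model of the assumed semantics: seconds since 1970-01-01T00:00:00Z."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    print(rewrite_cast_to_timestamp_seconds("CAST(CAST(id AS TIMESTAMP) AS DATE)"))
    print(timestamp_seconds(1262304000).isoformat())
```

Only the inner cast is rewritten; the outer `CAST(... AS DATE)` is left alone because its operand is not a plain identifier followed by `AS TIMESTAMP`.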

…mands ### What changes were proposed in this pull request? Follow-up to [SPARK-52729](https://issues.apache.org/jira/browse/SPARK-52729) (which added `MetadataOnlyTable` and CREATE / ALTER VIEW … AS support for DS v2 catalogs). This PR closes out the *Remaining work* section of that PR's description, plus a few API/test cleanups. The branch is structured as **three commits**: **1. `[SPARK-56655][SQL] Rename v2 metadata-table API: MetadataOnlyTable, RelationCatalog, loadRelation`** — mechanical rename of the v2 view-API surface introduced by the parent PR: | Before | After | |---------------------|--------------------| | `MetadataOnlyTable` | `MetadataTable` | | `RelationCatalog` | `TableViewCatalog` | | `loadRelation()` | `loadTableOrView()` | | `DataSourceV2MetadataOnly{Table,View}Suite` | `DataSourceV2Metadata{Table,View}Suite` | The unrelated v2 helpers `CatalogV2Util.loadRelation` and `RelationResolution`'s private `loadRelation(V2TableReference)` predate the parent PR and are intentionally not renamed. **2. `[SPARK-56655][SQL] Implement remaining v2 view DDL and inspection commands`** — the production-side work: *New view DDL execs* (`AlterV2ViewExec.scala`): - `AlterV2ViewSetPropertiesExec`, `AlterV2ViewUnsetPropertiesExec` — merge / drop user TBLPROPERTIES on the existing view; dispatch to `ViewCatalog#replaceView`. - `AlterV2ViewSchemaBindingExec` — rewrites the schema-binding mode field; dispatch to `replaceView`. - `RenameV2ViewExec` — dispatches to a new abstract `ViewCatalog#renameView(Identifier, Identifier)` (added to `ViewCatalog`, contracted on `TableViewCatalog`). - A shared `V2ViewMetadataMutation.builderFrom(existing)` helper seeds a `ViewInfo.Builder` from the existing view so mutating execs override only the field they're changing. *New view inspection execs* (`V2ViewInspectionExecs.scala`): - `ShowCreateV2ViewExec` — reconstructs the `CREATE VIEW …` DDL from `ViewInfo`. 
- `ShowV2ViewPropertiesExec`, `ShowV2ViewColumnsExec`, `DescribeV2ViewExec`, `DescribeV2ViewColumnExec` — produce output rows directly from `ViewInfo`. `DESCRIBE TABLE EXTENDED` emits a v2-native `# Detailed View Information` block. *v1-parity for `SHOW TABLES` on a `TableViewCatalog`* (`ShowTablesExec.scala`): when the catalog is a `TableViewCatalog`, route through `listRelationSummaries` so views appear alongside tables. Pure `TableCatalog` catalogs continue to use `listTables` and return tables only. *All `UNSUPPORTED_FEATURE.TABLE_OPERATION` pins* from the parent PR for these commands are replaced with real strategy cases. *Architecture — single typed payload for resolved views*. To match the v2-table convention (where `ResolvedTable.table: Table` carries the resolved Table — and v1 tables appear as `V1Table` wrapping a `CatalogTable`), `ResolvedPersistentView` now carries `info: ViewInfo`: ```scala case class ResolvedPersistentView( catalog: CatalogPlugin, identifier: Identifier, info: ViewInfo) // v2 ViewInfo for v2 views; V1ViewInfo wrapping // CatalogTable for session-catalog (v1) views ``` A new `private[sql] V1ViewInfo(v1Table: CatalogTable) extends ViewInfo` exposes a v1 `CatalogTable` through the v2 `ViewInfo` surface, mirroring `V1Table` for `Table`. `ViewInfo`'s constructor relaxed from `private` to `protected` so the subclass can call it. v1-only paths (DescribeTableCommand via `ResolvedChildHelper`, the v1 DescribeRelation JSON path, ApplyDefaultCollation's AlterViewAs rewrite, CreateTableLike strategy case) recover the original `CatalogTable` via `info.asInstanceOf[V1ViewInfo].v1Table`. v2 strategy cases consume `rpv.info` directly. `ResolvedPersistentView.output` now exposes the view's schema attributes (with char/varchar normalization), so `DescribeColumn` against a v2 view survives `CheckAnalysis` — the column resolves naturally through `ResolveReferences`, and the strategy / v1 rewrite both extract `nameParts` from the resolved attribute. 
No new logical plan needed for DESCRIBE COLUMN; the existing one flows to the planner intact. Also folded into this commit: - `ALTER TABLE <view> RENAME TO …` is rejected with `EXPECT_TABLE_NOT_VIEW.USE_ALTER_VIEW` on the v2 path (mirrors v1 `DDLUtils.verifyAlterTableType`). **3. `[SPARK-56655][SQL][TESTS] Add per-catalog view command test triplets and fold in late prod tweaks`** — test scaffolding: Mirror the DROP TABLE test layout from `sql/core/test/.../command/{,v1/,v2/}` for every v2 view DDL / inspection command. Each command lands as: - `command/<Cmd>SuiteBase.scala` — unified tests parameterized by `$catalog` - `command/v1/<Cmd>Suite.scala` — extends Base + v1 `ViewCommandSuiteBase` (pins `$catalog` to `spark_catalog`, so the unified tests target the session catalog) - `command/v2/<Cmd>Suite.scala` — extends Base + v2 `ViewCommandSuiteBase` (pins `$catalog` to a fresh `test_view_catalog` backed by a new general-purpose `InMemoryTableViewCatalog` test fixture), plus catalog-state assertions specific to the v2 fixture Triplets cover: CREATE VIEW, ALTER VIEW … AS, ALTER VIEW SET / UNSET TBLPROPERTIES, ALTER VIEW RENAME TO, ALTER VIEW WITH SCHEMA, SHOW CREATE TABLE, SHOW TBLPROPERTIES, SHOW COLUMNS, DESCRIBE TABLE, DESCRIBE TABLE … COLUMN, DROP VIEW, SHOW VIEWS. Each Base test runs against both `spark_catalog` (v1, hits the existing v1 commands) and `test_view_catalog` (v2, hits the new execs from commit #2), giving a single source of cross-catalog behavioral parity. The pre-existing `DataSourceV2MetadataViewSuite` is trimmed: CREATE / ALTER / DROP / SHOW VIEW DDL coverage moves to the per-catalog triplets. What remains in the leaf suite is genuinely v2-specific structural coverage (view read-path, V1Table.toCatalogTable round-trip, pure-ViewCatalog read + ALTER, multi-level-namespace cyclic detection / error rendering, REFRESH / ANALYZE rejection, SHOW TABLES on a `TableViewCatalog` returning both kinds). ### Why are the changes needed? 
The parent PR shipped CREATE VIEW + ALTER VIEW … AS through the v2 surface but pinned the rest of the view DDL/inspection family with `UNSUPPORTED_FEATURE.TABLE_OPERATION`. Third-party v2 catalogs that host views still couldn't run `ALTER VIEW SET TBLPROPERTIES`, `ALTER VIEW … RENAME TO`, `ALTER VIEW … WITH SCHEMA <mode>`, `DESCRIBE`, `SHOW CREATE TABLE`, `SHOW TBLPROPERTIES`, `SHOW COLUMNS` against their views — full v1 parity for non-session view catalogs requires this set. The rename to `MetadataTable` / `TableViewCatalog` / `loadTableOrView` was discussed during the parent PR's review as a clarity improvement; doing it now (before the API ships in a release) avoids deprecation churn later. `ResolvedPersistentView.info: ViewInfo` (with `V1ViewInfo` for v1 views) brings v2 view command flow in line with the existing v2 table command convention (`ResolvedTable.table: Table`, with `V1Table` for v1 tables) — single typed payload, resolved at analysis time, consumed at exec time. Looking up database objects at runtime is the anti-pattern this removes. ### Does this PR introduce _any_ user-facing change? Yes for **connector developers**: - The rename is a source-incompatible change on the still-`Evolving` v2 view API surface. Connectors implementing the parent PR's `RelationCatalog` / overriding `loadRelation` / referencing `MetadataOnlyTable` need to update to `TableViewCatalog` / `loadTableOrView` / `MetadataTable`. - `ViewCatalog` gains a new abstract `renameView(oldIdent, newIdent)` method. Existing `ViewCatalog` implementations need to add it (the parent PR has not yet released, so this is still pre-release breakage). - `ViewInfo`'s constructor is now `protected` (was `private`). Existing call sites use the `ViewInfo.Builder`; only internal subclasses need to call the constructor directly. 
Yes for **end users on a non-session v2 view catalog**: the listed DDL / inspection commands now succeed against a `ViewCatalog` instead of erroring with `UNSUPPORTED_FEATURE.TABLE_OPERATION`. `SHOW TABLES` on a `TableViewCatalog` now returns both tables and views, matching v1 SHOW TABLES output. `ALTER TABLE <view> RENAME TO …` (wrong syntax) now returns `EXPECT_TABLE_NOT_VIEW.USE_ALTER_VIEW` instead of silently succeeding. No user-facing change on the session catalog path — those plans are still rewritten to the v1 commands by `ResolveSessionCatalog` and behave exactly as before. ### How was this patch tested? - 13 new per-catalog `*SuiteBase` triplets under `sql/core/src/test/.../command/{,v1/,v2/}` exercise each command against both a v1 (session) catalog and a fresh v2 `InMemoryTableViewCatalog` fixture. - The pre-existing `DataSourceV2MetadataViewSuite` is trimmed to v2-specific structural tests (read path, `V1Table.toCatalogTable` round-trip, pure-ViewCatalog read+ALTER, multi-level-namespace cyclic / error rendering, REFRESH/ANALYZE rejection, SHOW TABLES TableViewCatalog). - 242 view-related tests pass locally across 30 suites, plus 54/54 SimpleSQLViewSuite, plus 171/171 table-side inspection suites (verifying no regression in `ResolvedChildHelper.getTableMetadata` after the metadata→info refactor). ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Claude (Anthropic) Closes apache#55593 from cloud-fan/v2-view-followup. Lead-authored-by: Wenchen Fan <wenchen@databricks.com> Co-authored-by: Gengliang Wang <gengliang@apache.org> Signed-off-by: Gengliang Wang <gengliang@apache.org>
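The "seed a builder from the existing view, then override only the changed field" idea behind `V2ViewMetadataMutation.builderFrom(existing)` can be sketched as follows. This is a simplified Python model under stated assumptions: the `ViewInfo` fields, the `ViewInfoBuilder` class, and the three mutation helpers are hypothetical stand-ins and do not reflect the real Scala/Java API.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass(frozen=True)
class ViewInfo:
    # Hypothetical, reduced set of view metadata fields.
    query_text: str
    schema_mode: str
    properties: Dict[str, str] = field(default_factory=dict)


class ViewInfoBuilder:
    """Hypothetical builder; builder_from copies every field of an existing view."""

    def __init__(self) -> None:
        self._fields: dict = {}

    @classmethod
    def builder_from(cls, existing: ViewInfo) -> "ViewInfoBuilder":
        builder = cls()
        builder._fields = {
            "query_text": existing.query_text,
            "schema_mode": existing.schema_mode,
            "properties": dict(existing.properties),
        }
        return builder

    def with_properties(self, properties: Dict[str, str]) -> "ViewInfoBuilder":
        self._fields["properties"] = dict(properties)
        return self

    def with_schema_mode(self, mode: str) -> "ViewInfoBuilder":
        self._fields["schema_mode"] = mode
        return self

    def build(self) -> ViewInfo:
        return ViewInfo(**self._fields)


def set_properties(existing: ViewInfo, new_props: Dict[str, str]) -> ViewInfo:
    # Merge user properties over the existing ones; other fields are untouched.
    merged = {**existing.properties, **new_props}
    return ViewInfoBuilder.builder_from(existing).with_properties(merged).build()


def unset_properties(existing: ViewInfo, keys: Iterable[str]) -> ViewInfo:
    # Drop the named properties; other fields are untouched.
    drop = set(keys)
    kept = {k: v for k, v in existing.properties.items() if k not in drop}
    return ViewInfoBuilder.builder_from(existing).with_properties(kept).build()


def set_schema_binding(existing: ViewInfo, mode: str) -> ViewInfo:
    # Rewrite only the schema-binding mode field.
    return ViewInfoBuilder.builder_from(existing).with_schema_mode(mode).build()
```

In this model each mutation returns a new `ViewInfo` that would then be handed to something like `replaceView`; the original object is never modified.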

… metadata

### What changes were proposed in this pull request?

Follow-up to apache#55636 addressing post-merge review comments from zikangh:

1. **Deduplicate `isCarryoverPair`.** The carry-over predicate (`_del_cnt = 1 AND _ins_cnt = 1 AND _rv_cnt = 2 AND _min_rv = _max_rv`) was duplicated between the batch path's `addCarryOverPairFilter` and the streaming path's inline filter. Extracted a shared `buildCarryOverPairPredicate` helper and call it from both.
2. **Mark the streaming row-level rewrite via attribute metadata rather than helper column name.** `UnsupportedOperationChecker` previously detected the rewrite by string-matching the `__spark_cdc_events` aggregate alias name. Switched to a metadata marker (`ResolveChangelogTable.streamingPostProcessingMarker`) attached to the alias's output attribute -- mirroring the existing `EventTimeWatermark.delayKey` and `SessionWindow.marker` patterns. The marker travels with the attribute through optimization.
3. **Expand streaming E2E coverage.** New tests in `ChangelogEndToEndSuite`:
   - composite rowId carry-over removal,
   - composite rowId update detection (different tuples kept raw),
   - carry-over + update detection across multiple commits,
   - DELETE-all-rows and UPDATE-all-rows fixtures,
   - append-only workload pass-through,
   - no-op UPDATE labeled as update (rcv differs on pre/post),
   - large carry-over removal (9 carry-over pairs + 1 real delete).

### Why are the changes needed?

zikangh raised these on the merged PR. Bundled together so they can be reviewed and shipped as one follow-up.

### Does this PR introduce _any_ user-facing change?

No. Internal refactor (#1, #2) and additional test coverage (#3). The behavior of streaming CDC reads is unchanged.

### How was this patch tested?

All 157 tests pass across the four CDC suites:
- `ChangelogResolutionSuite`
- `ResolveChangelogTablePostProcessingSuite`
- `ResolveChangelogTableStreamingPostProcessingSuite`
- `ChangelogEndToEndSuite`

Also confirmed:
- `UnsupportedOperationsSuite` (216 tests) still passes after the marker-based detection switch.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (claude-opus-4-7)

Closes apache#55653 from gengliangwang/streamingCDC-followup-zikangh.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
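The deduplication in item 1 of this commit message amounts to building the carry-over predicate once and reusing it from both paths. The following Python sketch models that with plain dictionaries in place of Spark rows. The column names come from the commit message; the function names, the row representation, and the assumption that both paths drop matching pairs are illustrative only.

```python
from typing import Callable, Iterable, Iterator, List, Mapping

Row = Mapping[str, int]


def build_carry_over_pair_predicate() -> Callable[[Row], bool]:
    """Single definition of the carry-over test, shared by both paths."""

    def is_carry_over_pair(row: Row) -> bool:
        return (
            row["_del_cnt"] == 1
            and row["_ins_cnt"] == 1
            and row["_rv_cnt"] == 2
            and row["_min_rv"] == row["_max_rv"]
        )

    return is_carry_over_pair


def drop_carry_over_batch(groups: Iterable[Row]) -> List[Row]:
    # Batch path (assumed behavior): materialize the groups that are not carry-over pairs.
    is_pair = build_carry_over_pair_predicate()
    return [g for g in groups if not is_pair(g)]


def drop_carry_over_streaming(groups: Iterable[Row]) -> Iterator[Row]:
    # Streaming path (assumed behavior): same predicate, applied lazily.
    is_pair = build_carry_over_pair_predicate()
    for g in groups:
        if not is_pair(g):
            yield g
```

Because both paths call the same builder, a change to the predicate cannot make batch and streaming results drift apart, which is the point of the refactor as described.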

### What changes were proposed in this pull request? Address the open follow-ups from [SPARK-56681](https://issues.apache.org/jira/browse/SPARK-56681) (umbrella for PATH / SPARK-56605 cleanup) in a single cleanup PR. Items #1 and #2 were already wired by SPARK-56639; this PR covers the remainder. | # | Item | Resolution | |---|---|---| | #1 | `FunctionResolution.resolveProcedure` was dead code | Already wired by SPARK-56639 (no action). | | #2 | Frozen view / SQL-function PATH wiring unfinished | Already done by SPARK-56639 (no action). | | #3 | `AnalysisContext.resolutionPathEntries` threadlocal | Audit only: confirmed `withNewAnalysisContext` / `reset()` correctly clear it. Full removal needs a coordinated refactor to plumb the path through `RelationResolution` / `FunctionResolution` method calls; flagged as a follow-up. | | #4 | `Analyzer.executeAndCheck` clobbers outer `SQLConf.withExistingConf` | Extracted `runWithSessionConf` helper, added `SQLConf.getExistingConfIfSet`. `executeAndCheck` and `executeSameContext` now share one path that yields to any outer scope. | | #5 | `VariableResolution.allowUnqualifiedSessionTempVariableLookup` force-loads default catalog | Replaced the hot-path catalog read with `CatalogManager.isSystemSessionOnPath`, which inspects stored session-path entries directly. No catalog load on column resolution. | | #6 | `DROP VARIABLE` PATH gate asymmetric with `DECLARE` / `CREATE` | Removed the gate. DDL on session variables (`DECLARE` / `CREATE` / `DROP`) always targets `system.session` directly; only DML (`SET VAR`, `SELECT x`) goes through PATH. | | #7 | `lookupFunctionType` exception swallow too broad | Narrowed from `NonFatal` to the explicit not-found list (`NoSuchFunctionException`, `NoSuchNamespaceException`, `CatalogNotFoundException`, `FORBIDDEN_OPERATION`). Other exceptions propagate. 
| | #8 | `lookupFunctionType` fan-out had wasteful `system.*` candidates | Filtered them out — `system.session`, `system.builtin`, `system.ai` are already resolved earlier in the same method. | | #9 | Three near-duplicate path-resolution helpers | Lifted into `CatalogManager.resolutionPathEntriesForAnalysis(pinnedEntries, viewCatalogAndNamespace)`. Relation, routine, and procedure resolution all route through it. | | #10 | Tests for the new error paths and gates | Added a DECLARE / SET VAR / DROP cycle test under non-default PATH and a struct-variable field-vs-qualified ambiguity test in `sql-session-variables.sql`. | | #11 | `ProtoToParsedPlanTestSuite.analyzerIsolationConf` was a bare `SQLConf` | Clone `spark.sessionState.conf` and only override `PATH_ENABLED=false`, so all `sparkConf` overrides (ANSI, alias config, ...) propagate automatically. | | Bonus | `ResolveSetVariable` hardcoded `SYSTEM.SESSION` regardless of actual PATH | `unresolvedVariableError` now takes `Seq[Seq[String]]` path entries with **required** `Origin` (no overloads). DML lookup failures (`SET VAR`, `FETCH ... INTO`) report the full SQL path as a bracketed list, byte-for-byte consistent with `UNRESOLVED_ROUTINE` and `TABLE_OR_VIEW_NOT_FOUND`. DDL name validation in `ResolveCatalogs` continues to report `[system.session]` since PATH does not apply there. Origin is plumbed through `VariableManager.set` so all error sites carry a `queryContext` pointing at the offending variable identifier (parser opt-ins via `withOrigin(identifierReference)` so the highlight is the variable name, not the whole statement). | ### Why are the changes needed? These are the cleanup items called out on SPARK-56681 from the post-merge source review of SPARK-56605. 
They eliminate dead code paths, plug user-visible bugs (force-loading a misconfigured default catalog on column resolution; clobbering pinned session configs; swallowing real catalog errors as `UNRESOLVED_ROUTINE`), remove the asymmetry between DDL and DML on session variables, and make `UNRESOLVED_VARIABLE` self-consistent with the other "not found" errors. ### Does this PR introduce _any_ user-facing change? Yes. - **`UNRESOLVED_VARIABLE.searchPath`** is now rendered as a bracketed list. For DML lookups (`SET VAR`, `FETCH ... INTO`), the list reflects the actual SQL PATH that was consulted instead of a hardcoded `SYSTEM.SESSION`. For DDL name validation (`DECLARE` / `DROP` with a non-session namespace), the list is `[`` `system`.`session` ``]` since PATH does not apply. - **`UNRESOLVED_VARIABLE`** now always carries a `queryContext` that highlights just the offending variable identifier (e.g. `"builtin.var1"`, `"ses.var1"`), not the whole `DECLARE` / `SET VAR` statement. - **`DROP TEMPORARY VARIABLE`** no longer raises `UNRESOLVED_VARIABLE` when the SQL PATH does not contain `system.session`. DDL on session variables ignores PATH, matching the existing behaviour of `DECLARE OR REPLACE VARIABLE`. - **`lookupFunctionType`** no longer swallows non–`NotFound` errors. A catalog reporting `PERMISSION_DENIED` (or similar) for a function lookup now propagates instead of silently producing `UNRESOLVED_ROUTINE`. ### How was this patch tested? - Added `sql-session-variables.sql` regression test for the struct-variable field-vs-qualified ambiguity (`DECLARE VARIABLE session STRUCT<a INT>` → `SELECT session.a` succeeds → `DROP` → `SELECT session.a` falls through to `UNRESOLVED_COLUMN`). - Updated `SetPathSuite`: DECLARE / SET VAR / DROP cycle under a non-default PATH; bonus test asserts the actual rendered search path and the variable-identifier `queryContext`. - Updated `SqlScriptingExecutionSuite` for the new bracketed `searchPath` and identifier-pinned `queryContext`. 
- Regenerated `sql-session-variables.sql.out` for the new error shape. - Added `resolutionPathEntriesForAnalysis` stubs to mocked `CatalogManager` instances in `PlanResolutionSuite`, `AlignAssignmentsSuiteBase`, and `TableLookupCacheSuite`. - Ran focused suites locally; all pass: - `build/sbt 'sql/testOnly *SetPathSuite *SqlScriptingExecutionSuite *ExecuteImmediateEndToEndSuite'` - `build/sbt 'sql/testOnly *SimpleSQLViewSuite *SQLFunctionSuite'` - `build/sbt 'sql/testOnly *PlanResolutionSuite *UpdateTableAlignAssignmentsSuite *MergeIntoTableAlignAssignmentsSuite'` - `build/sbt 'catalyst/testOnly *TableLookupCacheSuite *AnalysisSuite *AnalysisErrorSuite *LookupFunctionsSuite'` - `build/sbt 'sql/testOnly *FunctionQualificationSuite *RelationQualificationSuite *DataSourceV2FunctionSuite'` - `build/sbt 'sql/testOnly *SQLQuerySuite'` - `build/sbt 'connect/testOnly *ProtoToParsedPlanTestSuite'` - `build/sbt 'sql/testOnly *SQLQueryTestSuite -- -z sql-session-variables.sql'` - Full `org.apache.spark.sql.catalyst.analysis.*`, `org.apache.spark.sql.catalyst.parser.*`, and `org.apache.spark.sql.analysis.resolver.*` suites. - `scalastyle` and `scalafmt` clean across catalyst, sql, and connect modules. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Cursor Claude Opus 4.7 Closes apache#55647 from srielau/SPARK-56681-patch-clean-up. Authored-by: Serge Rielau <serge@rielau.com> Signed-off-by: Daniel Tenedorio <daniel.tenedorio@databricks.com>
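Item #7 above, narrowing the `lookupFunctionType` exception swallow from a catch-all to an explicit not-found list, can be illustrated with a small Python model. Everything here is a hypothetical stand-in: the exception classes only mimic the names listed in the commit message, catalogs are modeled as plain callables, and `PermissionDeniedError` represents "any other catalog error".

```python
from typing import Callable, Iterable, Optional


class NoSuchFunctionError(Exception):
    pass


class NoSuchNamespaceError(Exception):
    pass


class CatalogNotFoundError(Exception):
    pass


class ForbiddenOperationError(Exception):
    pass


class PermissionDeniedError(Exception):
    """Stand-in for a real catalog failure that must not be swallowed."""


# Only these mean "not found here, try the next candidate".
NOT_FOUND_ERRORS = (
    NoSuchFunctionError,
    NoSuchNamespaceError,
    CatalogNotFoundError,
    ForbiddenOperationError,
)


def lookup_function_type(
    catalogs: Iterable[Callable[[str], str]], name: str
) -> Optional[str]:
    """Return the first catalog's answer; None if every candidate reports not-found."""
    for lookup in catalogs:
        try:
            return lookup(name)
        except NOT_FOUND_ERRORS:
            # Narrow catch: anything outside the list propagates to the caller.
            continue
    return None
```

With a catch-all in place of `NOT_FOUND_ERRORS`, a permission failure would be indistinguishable from a missing function, which is the behavior the commit message says was removed.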

…ntext, fix HiveContext bypass, reorganize tests

- Remove public `getOrCreate` from Connect SQLContext; internal dispatch uses `_get_or_create_from_session` only (fixes Finding #1 / #4)
- Fix HiveContext bypass in classic dispatch: route getOrCreate to the Connect counterpart by class name so ConnectHiveContext._from_session raises as expected (fixes Finding #2)
- Fix newSession() docstring to accurately describe cloneSession() semantics (fixes Finding #3)
- Fix docstring nits: missing article, list/tuple, inferring the schema, table names as strings, streams wording
- Add comment explaining catalog.listTables() over SHOW TABLES
- Reorganize tests: add test_sql_context.py with mixin + classic runner, test_parity_sql_context.py for Connect parity, slim test_connect_context.py to Connect-specific tests only

Co-authored-by: DB Tsai <db.tsai@databricks.com>

remove ml.LabeledPoint from PySpark and annotate ml.LabeledPoint in P…

f385367

…ython

mengxr merged commit 953eea7 into dbtsai:SPARK-14615-NewML May 17, 2016

mengxr merged 1 commit into dbtsai:SPARK-14615-NewML from mengxr:SPARK-14615
