Support metric view creation and querying with V2 catalogs (OSS UC)#1
Open
chenwang-databricks wants to merge 8 commits into
Open
Support metric view creation and querying with V2 catalogs (OSS UC)#1chenwang-databricks wants to merge 8 commits into
chenwang-databricks wants to merge 8 commits into
Conversation
…alog) - Route metric view creation to V2 catalog when the target is not the session catalog, extracting source table dependency from YAML and passing it as a table property for the connector to forward to UC - Use parseMultipartIdentifier instead of parseTableIdentifier in Source.apply to support three-part names (catalog.schema.table) in metric view YAML source references - Replace misleading invalidLiteralForWindowDurationError with accurate SparkException.internalError messages for YAML parsing failures
… metric view ID propagation - Add CatalogTableType.METRIC_VIEW and METRIC_VIEW_TABLE_TYPE constant - Extend TableInfo with tableType, viewDefinition, and viewDependencies fields - Add Dependency, DependencyList, TableDependency, FunctionDependency classes - Add metricViewId field to AnalysisContext for definer's rights credential vending - Update V1Table to map METRIC_VIEW to the V2 table type - Update isMetricView() for backward compatibility with legacy property - Use structured TableInfo in CreateMetricViewCommand instead of flat properties
Open
4 tasks
The metricViewId field and setMetricViewId method were only needed for definer's rights credential vending. With invoker's rights, the connector no longer propagates metric view identity through the AnalysisContext. Revert to upstream state.
DelegatingCatalogExtension forwards the older createTable overloads (StructType and Column[] variants) but was missing the TableInfo overload added in 4.1.0. Without this, catalog extensions like DeltaCatalog silently drop tableType, viewDefinition, and viewDependencies by falling through to the default implementation on TableCatalog which only passes columns/partitions/properties.
chenwang-databricks
left a comment
Owner
Author
There was a problem hiding this comment.
Is it possible to add unit tests to test the integration of v2 catalog for metric views?
Review comment fixes:
- Drop the "This mirrors the Databricks Unity Catalog..." sentence
from Dependency.java and DependencyList.java javadocs; these types
are Spark V2 catalog APIs and the description should be generic.
- Simplify CatalogTable.isMetricView to just check tableType ==
METRIC_VIEW, and remove the unused VIEW_WITH_METRICS constant.
This PR stopped writing the legacy property, so the fallback is
dead code.
- In CreateMetricViewCommand.createMetricViewInV2Catalog, drop the
extractSourceTables helper that branched on AssetSource/SQLSource.
The analyzed plan has already resolved both source types, so
MetricViewHelper.collectTableDependencies(analyzed) alone produces
fully-qualified 3-part names for the dependency list.
- On the V2 path, additionally capture:
- metric_view.from.type / metric_view.from.name /
metric_view.from.sql / metric_view.where via a new
MetricView.getProperties method (mirrors the canonical metric
view properties used by other platforms, e.g. Databricks).
- view.catalogAndNamespace.*, view.sqlConfig.*, view.schemaMode,
and query-column-name props via ViewHelper.generateViewProperties
so the V2 metric view can be reloaded and re-parsed with the
same catalog/namespace and SQL configs as the V1 path.
- Use TableCatalog.PROP_COMMENT instead of the raw "comment"
string for consistency.
New tests:
- MetricViewV2CatalogSuite exercises createMetricViewInV2Catalog
via a non-session catalog (MetricViewRecordingCatalog built on
InMemoryTableCatalog that captures the full TableInfo passed to
createTable). Covers asset source, SQL source, and the new
metric_view.* / view.* properties plus comment round-trip.
Aligns DROP semantics for metric views with how Databricks treats them: metric views are conceptually views, so DROP VIEW must succeed and DROP TABLE must fail with WRONG_COMMAND_FOR_OBJECT_TYPE pointing the caller at DROP VIEW. V1 (session catalog) path: - DropTableCommand: allow DROP VIEW on tables whose tableType is METRIC_VIEW (in addition to VIEW), and reject DROP TABLE on either VIEW or METRIC_VIEW. The required-type message now reads "VIEW or METRIC_VIEW" so the rejection is self-explanatory. V2 path (non-session catalogs): - V2SessionCatalog.dropTableInternal: also reject when the loaded V1 table is a METRIC_VIEW, mirroring the existing VIEW rejection. - DropTableExec: load the table and throw WRONG_COMMAND_FOR_OBJECT_TYPE when its table_type property is METRIC_VIEW; this catches DROP TABLE catalog.ns.metric_view on arbitrary V2 catalogs. - New DropMetricViewExec handles DROP VIEW catalog.ns.metric_view on non-session V2 catalogs by calling catalog.dropTable when the target is a metric view, and throwing WRONG_COMMAND_FOR_OBJECT_TYPE (DROP TABLE) when the target is a regular table. - ResolveSessionCatalog: stop throwing "catalog doesn't support views" for V2 DropView; leave the plan in place so DataSourceV2Strategy can pick it up and route to DropMetricViewExec. Tests: - MetricViewV2CatalogSuite gets three new cases covering DROP TABLE rejection on a metric view, DROP VIEW success on a metric view, DROP VIEW IF EXISTS no-op on a missing metric view, and DROP VIEW rejection on a regular V2 table. The recording catalog also propagates table_type into the stored table's properties so the routing/checks can find it. - The existing V1 metric-view tests (SimpleMetricViewSuite) now pass cleanly since `withView(...)`'s DROP VIEW cleanup works on metric views. - DDLSuite "drop view" and HiveDDLSuite "drop table using drop view" updated to expect "VIEW or METRIC_VIEW" in the requiredType field.
| case V1Table(v1Table) | ||
| if (v1Table.tableType == CatalogTableType.VIEW && | ||
| !SQLConf.get.getConf(SQLConf.DROP_TABLE_VIEW_ENABLED)) || | ||
| v1Table.tableType == CatalogTableType.METRIC_VIEW => |
Owner
Author
There was a problem hiding this comment.
move this to be alongside the VIEW type check so that SQLConf.DROP_TABLE_VIEW_ENABLED can control the same behavior
Comment on lines
+38
to
+52
| val table = catalog.loadTable(ident) | ||
| val tableType = Option(table.properties().get(TableCatalog.PROP_TABLE_TYPE)).orNull | ||
| // Metric views are stored as tables in V2 catalogs but must be dropped via DROP VIEW. | ||
| if (TableSummary.METRIC_VIEW_TABLE_TYPE.equals(tableType)) { | ||
| val qualified = | ||
| (catalog.name() +: ident.namespace() :+ ident.name()).mkString(".") | ||
| throw QueryCompilationErrors.wrongCommandForObjectTypeError( | ||
| operation = "DROP TABLE", | ||
| requiredType = | ||
| s"${TableSummary.EXTERNAL_TABLE_TYPE} or ${TableSummary.MANAGED_TABLE_TYPE}", | ||
| objectName = qualified, | ||
| foundType = TableSummary.METRIC_VIEW_TABLE_TYPE, | ||
| alternative = "DROP VIEW" | ||
| ) | ||
| } |
Owner
Author
There was a problem hiding this comment.
Do you really have to check this? It doesn't check for view type as well. I mean, as long as we can route the command based on table type correctly, you don't have to check it here, right?
| val invalidateFunc = () => CommandUtils.uncacheTableOrView(session, r) | ||
| DropTableExec(r.catalog.asTableCatalog, r.identifier, ifExists, purge, invalidateFunc) :: Nil | ||
|
|
||
| case DropView(r: ResolvedIdentifier, ifExists) => |
Owner
Author
There was a problem hiding this comment.
Can you route based on the table type? Basically, we only route metric view type to DropMetricViewExec.
Addresses review feedback on the previous commit:
- V2SessionCatalog.dropTableInternal: place METRIC_VIEW alongside VIEW
inside the same `SQLConf.DROP_TABLE_VIEW_ENABLED` predicate so the
conf controls both (a user opting in to "drop view via DROP TABLE"
for plain views also gets it for metric views).
- DropTableExec: remove the metric-view rejection. The exec stays
type-agnostic; routing/validation belongs to the strategy.
- DropMetricViewExec: drop the type check too. The strategy guarantees
this exec is only chosen for actual metric views (or for IF EXISTS
no-ops on missing identifiers).
- DataSourceV2Strategy: load the resolved table once and route based
on its `table_type` property:
* DROP TABLE on a metric view -> WRONG_COMMAND_FOR_OBJECT_TYPE
pointing at DROP VIEW.
* DROP VIEW on a metric view -> DropMetricViewExec.
* DROP VIEW on a non-existent identifier -> DropMetricViewExec
(so IF EXISTS is honored uniformly).
* DROP VIEW on a regular V2 table -> WRONG_COMMAND_FOR_OBJECT_TYPE
pointing at DROP TABLE.
The existing MetricViewV2CatalogSuite cases still cover all of these
branches and pass unchanged; the V1 SimpleMetricViewSuite and the
InMemoryCatalogedDDLSuite continue to pass.
chenwang-databricks
added a commit
that referenced
this pull request
May 1, 2026
…pache#55487 Address the four new findings from cloud-fan's round 2 review of PR apache#55487 plus the corresponding CI fixes. Combines the revert of the StructField/Column public-API commit (per cloud-fan's split request) with the three remaining metric-view fixes into one commit. Cloud-fan finding #1 (`verifyAlterTableType` audit miss): * `ddl.scala` `verifyAlterTableType` (lines 1086-1103) was missed in the `tableType == VIEW` -> `VIEW || METRIC_VIEW` audit. Without this fix, `ALTER TABLE <metric_view> RENAME TO ...` silently succeeds (renames as if a regular table), and `ALTER VIEW <metric_view> RENAME TO ...` throws `cannotAlterTableWithAlterViewError` (wrong category for a view- kind). Widen both arms to also accept `METRIC_VIEW`. Cloud-fan finding apache#2/apache#3 (StructField commit split-out): * Revert d55143f ("Public StructField.json / .fromJson + Column.fromStructField"). None of the three new public APIs are called by code in this PR -- they exist to support the downstream Unity Catalog connector. Per cloud-fan's request, split them out into a separate PR with its own JIRA ticket so they can be reviewed independently. Reverting the commit also removes the `Column.fromStructField` divergence from `structFieldToV2Column` (which would have duplicated the comment field in `metadataInJSON`); the split-out PR fixes that inline. The revert also fixes two CI failures introduced by the same commit: - `mimaReportBinaryIssues` losing `StructField.tupled` / `StructField.curried` / `AbstractFunction4` parent on the `StructField` companion object (sql-api 4.0.0 binary surface). - `Javaunidoc` failing on the `{@link StructField.json}` / `{@link StructField.fromJson}` references that javadoc rejects (member separator must be `#`, not `.`). Cloud-fan finding apache#4 (multi-level namespace lift): * `MetricViewHelper.analyzeMetricViewText` previously required a `TableIdentifier`, which can only carry up to 3 name parts. v2 metric views with multi-level-namespace targets (e.g. `cat.db1.db2.mv`) failed at `ident.asTableIdentifier` with `requiresSinglePartNamespaceError`. Lift the helper to take `nameParts: Seq[String]` directly. The synthetic `CatalogTable` used as analysis context still needs a TableIdentifier (it's a v1 type); we collapse the intermediate namespace components into the synthetic `database` slot via dot-join. Since the synthetic identifier is never used to resolve the view body itself, this collapse is observationally invisible to the analyzed plan, and `verifyTemporaryObjectsNotExists` still receives the full `nameParts` so error rendering preserves the multi-part form. Update both call sites: - `CreateMetricViewCommand.createMetricViewInSessionCatalog` (v1) passes `name.nameParts`. - `DataSourceV2Strategy` (v2) passes `(catalog.name() +: ident.namespace() :+ ident.name())`. Add a new test in `MetricViewV2CatalogSuite` that creates a metric view at `testcat.ns_a.ns_b.mv_deep` and asserts the create succeeds + the captured `ViewInfo` carries `PROP_TABLE_TYPE = METRIC_VIEW`. Test realignment to SPARK-56655's new contract: * SPARK-56655 (master) added several v2 view DDL behaviors that changed the error categories surfaced for cross-type collisions and added view-aware listing on `TableViewCatalog`. Realign 5 existing test assertions in `MetricViewV2CatalogSuite`: - "CREATE VIEW ... WITH METRICS over a v2 table at the ident": `EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE` -> `TABLE_OR_VIEW_ ALREADY_EXISTS` (master's analyzer-time pre-check fires before the exec-time decoding in `CreateV2MetricViewExec.run`). - "DROP TABLE on a v2 metric view" + "DROP TABLE IF EXISTS on a v2 metric view": `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE` -> `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead", raised by master's new `DropTableExec.run`). - "SHOW TABLES does not list v2 metric views" -> "SHOW TABLES on a v2 TableViewCatalog lists both tables and metric views": SPARK-56655 routes SHOW TABLES on a `TableViewCatalog` through `listRelationSummaries` so views appear alongside tables in the output (matches v1 SHOW TABLES on a session catalog). - DESCRIBE TABLE EXTENDED `View Text` row: trim before comparing to the pre-substitution YAML body (the row carries the raw text between the SQL `$$ ... $$` markers, including the surrounding `\n`). * Add an explicit `SHOW CREATE TABLE` rejection guard for v2 metric views in `DataSourceV2Strategy`. Master's new `ShowCreateV2ViewExec` would emit a plain `CREATE VIEW <ident> AS <yaml>`, which is not round-trippable as metric-view DDL (the right form is `CREATE VIEW <ident> WITH METRICS LANGUAGE YAML AS $$ <yaml> $$`). Reject up front with `unsupportedTableOperationError("SHOW CREATE TABLE")` rather than emit lossy DDL; the existing test already expects this exact behavior. Signed-off-by: chen.wang <chen.wang@databricks.com>
chenwang-databricks
added a commit
that referenced
this pull request
May 1, 2026
…pache#55487 Address the four new findings from cloud-fan's round 2 review of PR apache#55487 plus the corresponding CI fixes. Combines the revert of the StructField/Column public-API commit (per cloud-fan's split request) with the three remaining metric-view fixes into one commit. Cloud-fan finding #1 (`verifyAlterTableType` audit miss): * `ddl.scala` `verifyAlterTableType` (lines 1086-1103) was missed in the `tableType == VIEW` -> `VIEW || METRIC_VIEW` audit. Without this fix, `ALTER TABLE <metric_view> RENAME TO ...` silently succeeds (renames as if a regular table), and `ALTER VIEW <metric_view> RENAME TO ...` throws `cannotAlterTableWithAlterViewError` (wrong category for a view- kind). Widen both arms to also accept `METRIC_VIEW`. Cloud-fan finding apache#2/apache#3 (StructField commit split-out): * Revert d55143f ("Public StructField.json / .fromJson + Column.fromStructField"). None of the three new public APIs are called by code in this PR -- they exist to support the downstream Unity Catalog connector. Per cloud-fan's request, split them out into a separate PR with its own JIRA ticket so they can be reviewed independently. Reverting the commit also removes the `Column.fromStructField` divergence from `structFieldToV2Column` (which would have duplicated the comment field in `metadataInJSON`); the split-out PR fixes that inline. The revert also fixes two CI failures introduced by the same commit: - `mimaReportBinaryIssues` losing `StructField.tupled` / `StructField.curried` / `AbstractFunction4` parent on the `StructField` companion object (sql-api 4.0.0 binary surface). - `Javaunidoc` failing on the `{@link StructField.json}` / `{@link StructField.fromJson}` references that javadoc rejects (member separator must be `#`, not `.`). Cloud-fan finding apache#4 (multi-level namespace lift): * `MetricViewHelper.analyzeMetricViewText` previously required a `TableIdentifier`, which can only carry up to 3 name parts. v2 metric views with multi-level-namespace targets (e.g. `cat.db1.db2.mv`) failed at `ident.asTableIdentifier` with `requiresSinglePartNamespaceError`. Lift the helper to take `nameParts: Seq[String]` directly. The synthetic `CatalogTable` used as analysis context still needs a TableIdentifier (it's a v1 type); we collapse the intermediate namespace components into the synthetic `database` slot via dot-join. Since the synthetic identifier is never used to resolve the view body itself, this collapse is observationally invisible to the analyzed plan, and `verifyTemporaryObjectsNotExists` still receives the full `nameParts` so error rendering preserves the multi-part form. Update both call sites: - `CreateMetricViewCommand.createMetricViewInSessionCatalog` (v1) passes `name.nameParts`. - `DataSourceV2Strategy` (v2) passes `(catalog.name() +: ident.namespace() :+ ident.name())`. Add a new test in `MetricViewV2CatalogSuite` that creates a metric view at `testcat.ns_a.ns_b.mv_deep` and asserts the create succeeds + the captured `ViewInfo` carries `PROP_TABLE_TYPE = METRIC_VIEW`. Test realignment to SPARK-56655's new contract: * SPARK-56655 (master) added several v2 view DDL behaviors that changed the error categories surfaced for cross-type collisions and added view-aware listing on `TableViewCatalog`. Realign 5 existing test assertions in `MetricViewV2CatalogSuite`: - "CREATE VIEW ... WITH METRICS over a v2 table at the ident": `EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE` -> `TABLE_OR_VIEW_ ALREADY_EXISTS` (master's analyzer-time pre-check fires before the exec-time decoding in `CreateV2MetricViewExec.run`). - "DROP TABLE on a v2 metric view" + "DROP TABLE IF EXISTS on a v2 metric view": `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE` -> `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead", raised by master's new `DropTableExec.run`). - "SHOW TABLES does not list v2 metric views" -> "SHOW TABLES on a v2 TableViewCatalog lists both tables and metric views": SPARK-56655 routes SHOW TABLES on a `TableViewCatalog` through `listRelationSummaries` so views appear alongside tables in the output (matches v1 SHOW TABLES on a session catalog). - DESCRIBE TABLE EXTENDED `View Text` row: trim before comparing to the pre-substitution YAML body (the row carries the raw text between the SQL `$$ ... $$` markers, including the surrounding `\n`). * Add an explicit `SHOW CREATE TABLE` rejection guard for v2 metric views in `DataSourceV2Strategy`. Master's new `ShowCreateV2ViewExec` would emit a plain `CREATE VIEW <ident> AS <yaml>`, which is not round-trippable as metric-view DDL (the right form is `CREATE VIEW <ident> WITH METRICS LANGUAGE YAML AS $$ <yaml> $$`). Reject up front with `unsupportedTableOperationError("SHOW CREATE TABLE")` rather than emit lossy DDL; the existing test already expects this exact behavior. Signed-off-by: chen.wang <chen.wang@databricks.com>
chenwang-databricks
added a commit
that referenced
this pull request
May 1, 2026
…pache#55487 Address the four new findings from cloud-fan's round 2 review of PR apache#55487 plus the corresponding CI fixes. Combines the revert of the StructField/Column public-API commit (per cloud-fan's split request) with the three remaining metric-view fixes into one commit. Cloud-fan finding #1 (`verifyAlterTableType` audit miss): * `ddl.scala` `verifyAlterTableType` (lines 1086-1103) was missed in the `tableType == VIEW` -> `VIEW || METRIC_VIEW` audit. Without this fix, `ALTER TABLE <metric_view> RENAME TO ...` silently succeeds (renames as if a regular table), and `ALTER VIEW <metric_view> RENAME TO ...` throws `cannotAlterTableWithAlterViewError` (wrong category for a view- kind). Widen both arms to also accept `METRIC_VIEW`. Cloud-fan finding apache#2/apache#3 (StructField commit split-out): * Revert d55143f ("Public StructField.json / .fromJson + Column.fromStructField"). None of the three new public APIs are called by code in this PR -- they exist to support the downstream Unity Catalog connector. Per cloud-fan's request, split them out into a separate PR with its own JIRA ticket so they can be reviewed independently. Reverting the commit also removes the `Column.fromStructField` divergence from `structFieldToV2Column` (which would have duplicated the comment field in `metadataInJSON`); the split-out PR fixes that inline. The revert also fixes two CI failures introduced by the same commit: - `mimaReportBinaryIssues` losing `StructField.tupled` / `StructField.curried` / `AbstractFunction4` parent on the `StructField` companion object (sql-api 4.0.0 binary surface). - `Javaunidoc` failing on the `{@link StructField.json}` / `{@link StructField.fromJson}` references that javadoc rejects (member separator must be `#`, not `.`). Cloud-fan finding apache#4 (multi-level namespace lift): * `MetricViewHelper.analyzeMetricViewText` previously required a `TableIdentifier`, which can only carry up to 3 name parts. v2 metric views with multi-level-namespace targets (e.g. `cat.db1.db2.mv`) failed at `ident.asTableIdentifier` with `requiresSinglePartNamespaceError`. Lift the helper to take `nameParts: Seq[String]` directly. The synthetic `CatalogTable` used as analysis context still needs a TableIdentifier (it's a v1 type); we collapse the intermediate namespace components into the synthetic `database` slot via dot-join. Since the synthetic identifier is never used to resolve the view body itself, this collapse is observationally invisible to the analyzed plan, and `verifyTemporaryObjectsNotExists` still receives the full `nameParts` so error rendering preserves the multi-part form. Update both call sites: - `CreateMetricViewCommand.createMetricViewInSessionCatalog` (v1) passes `name.nameParts`. - `DataSourceV2Strategy` (v2) passes `(catalog.name() +: ident.namespace() :+ ident.name())`. Add a new test in `MetricViewV2CatalogSuite` that creates a metric view at `testcat.ns_a.ns_b.mv_deep` and asserts the create succeeds + the captured `ViewInfo` carries `PROP_TABLE_TYPE = METRIC_VIEW`. Test realignment to SPARK-56655's new contract: * SPARK-56655 (master) added several v2 view DDL behaviors that changed the error categories surfaced for cross-type collisions and added view-aware listing on `TableViewCatalog`. Realign 5 existing test assertions in `MetricViewV2CatalogSuite`: - "CREATE VIEW ... WITH METRICS over a v2 table at the ident": `EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE` -> `TABLE_OR_VIEW_ ALREADY_EXISTS` (master's analyzer-time pre-check fires before the exec-time decoding in `CreateV2MetricViewExec.run`). - "DROP TABLE on a v2 metric view" + "DROP TABLE IF EXISTS on a v2 metric view": `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE` -> `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead", raised by master's new `DropTableExec.run`). - "SHOW TABLES does not list v2 metric views" -> "SHOW TABLES on a v2 TableViewCatalog lists both tables and metric views": SPARK-56655 routes SHOW TABLES on a `TableViewCatalog` through `listRelationSummaries` so views appear alongside tables in the output (matches v1 SHOW TABLES on a session catalog). - DESCRIBE TABLE EXTENDED `View Text` row: trim before comparing to the pre-substitution YAML body (the row carries the raw text between the SQL `$$ ... $$` markers, including the surrounding `\n`). * Add an explicit `SHOW CREATE TABLE` rejection guard for v2 metric views in `DataSourceV2Strategy`. Master's new `ShowCreateV2ViewExec` would emit a plain `CREATE VIEW <ident> AS <yaml>`, which is not round-trippable as metric-view DDL (the right form is `CREATE VIEW <ident> WITH METRICS LANGUAGE YAML AS $$ <yaml> $$`). Reject up front with `unsupportedTableOperationError("SHOW CREATE TABLE")` rather than emit lossy DDL; the existing test already expects this exact behavior. Signed-off-by: chen.wang <chen.wang@databricks.com>
chenwang-databricks
added a commit
that referenced
this pull request
May 1, 2026
…pache#55487 Address the four new findings from cloud-fan's round 2 review of PR apache#55487 plus the corresponding CI fixes. Combines the revert of the StructField/Column public-API commit (per cloud-fan's split request) with the three remaining metric-view fixes into one commit. Cloud-fan finding #1 (`verifyAlterTableType` audit miss): * `ddl.scala` `verifyAlterTableType` (lines 1086-1103) was missed in the `tableType == VIEW` -> `VIEW || METRIC_VIEW` audit. Without this fix, `ALTER TABLE <metric_view> RENAME TO ...` silently succeeds (renames as if a regular table), and `ALTER VIEW <metric_view> RENAME TO ...` throws `cannotAlterTableWithAlterViewError` (wrong category for a view- kind). Widen both arms to also accept `METRIC_VIEW`. Cloud-fan finding apache#2/apache#3 (StructField commit split-out): * Revert d55143f ("Public StructField.json / .fromJson + Column.fromStructField"). None of the three new public APIs are called by code in this PR -- they exist to support the downstream Unity Catalog connector. Per cloud-fan's request, split them out into a separate PR with its own JIRA ticket so they can be reviewed independently. Reverting the commit also removes the `Column.fromStructField` divergence from `structFieldToV2Column` (which would have duplicated the comment field in `metadataInJSON`); the split-out PR fixes that inline. The revert also fixes two CI failures introduced by the same commit: - `mimaReportBinaryIssues` losing `StructField.tupled` / `StructField.curried` / `AbstractFunction4` parent on the `StructField` companion object (sql-api 4.0.0 binary surface). - `Javaunidoc` failing on the `{@link StructField.json}` / `{@link StructField.fromJson}` references that javadoc rejects (member separator must be `#`, not `.`). Cloud-fan finding apache#4 (multi-level namespace lift): * `MetricViewHelper.analyzeMetricViewText` previously required a `TableIdentifier`, which can only carry up to 3 name parts. v2 metric views with multi-level-namespace targets (e.g. `cat.db1.db2.mv`) failed at `ident.asTableIdentifier` with `requiresSinglePartNamespaceError`. Lift the helper to take `nameParts: Seq[String]` directly. The synthetic `CatalogTable` used as analysis context still needs a TableIdentifier (it's a v1 type); we collapse the intermediate namespace components into the synthetic `database` slot via dot-join. Since the synthetic identifier is never used to resolve the view body itself, this collapse is observationally invisible to the analyzed plan, and `verifyTemporaryObjectsNotExists` still receives the full `nameParts` so error rendering preserves the multi-part form. Update both call sites: - `CreateMetricViewCommand.createMetricViewInSessionCatalog` (v1) passes `name.nameParts`. - `DataSourceV2Strategy` (v2) passes `(catalog.name() +: ident.namespace() :+ ident.name())`. Add a new test in `MetricViewV2CatalogSuite` that creates a metric view at `testcat.ns_a.ns_b.mv_deep` and asserts the create succeeds + the captured `ViewInfo` carries `PROP_TABLE_TYPE = METRIC_VIEW`. Test realignment to SPARK-56655's new contract: * SPARK-56655 (master) added several v2 view DDL behaviors that changed the error categories surfaced for cross-type collisions and added view-aware listing on `TableViewCatalog`. Realign 5 existing test assertions in `MetricViewV2CatalogSuite`: - "CREATE VIEW ... WITH METRICS over a v2 table at the ident": `EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE` -> `TABLE_OR_VIEW_ ALREADY_EXISTS` (master's analyzer-time pre-check fires before the exec-time decoding in `CreateV2MetricViewExec.run`). - "DROP TABLE on a v2 metric view" + "DROP TABLE IF EXISTS on a v2 metric view": `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE` -> `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead", raised by master's new `DropTableExec.run`). - "SHOW TABLES does not list v2 metric views" -> "SHOW TABLES on a v2 TableViewCatalog lists both tables and metric views": SPARK-56655 routes SHOW TABLES on a `TableViewCatalog` through `listRelationSummaries` so views appear alongside tables in the output (matches v1 SHOW TABLES on a session catalog). - DESCRIBE TABLE EXTENDED `View Text` row: trim before comparing to the pre-substitution YAML body (the row carries the raw text between the SQL `$$ ... $$` markers, including the surrounding `\n`). * Add an explicit `SHOW CREATE TABLE` rejection guard for v2 metric views in `DataSourceV2Strategy`. Master's new `ShowCreateV2ViewExec` would emit a plain `CREATE VIEW <ident> AS <yaml>`, which is not round-trippable as metric-view DDL (the right form is `CREATE VIEW <ident> WITH METRICS LANGUAGE YAML AS $$ <yaml> $$`). Reject up front with `unsupportedTableOperationError("SHOW CREATE TABLE")` rather than emit lossy DDL; the existing test already expects this exact behavior. Signed-off-by: chen.wang <chen.wang@databricks.com>
chenwang-databricks
added a commit
that referenced
this pull request
May 1, 2026
…pache#55487 Address the four new findings from cloud-fan's round 2 review of PR apache#55487 plus the corresponding CI fixes. Combines the revert of the StructField/Column public-API commit (per cloud-fan's split request) with the three remaining metric-view fixes into one commit. Cloud-fan finding #1 (`verifyAlterTableType` audit miss): * `ddl.scala` `verifyAlterTableType` (lines 1086-1103) was missed in the `tableType == VIEW` -> `VIEW || METRIC_VIEW` audit. Without this fix, `ALTER TABLE <metric_view> RENAME TO ...` silently succeeds (renames as if a regular table), and `ALTER VIEW <metric_view> RENAME TO ...` throws `cannotAlterTableWithAlterViewError` (wrong category for a view- kind). Widen both arms to also accept `METRIC_VIEW`. Cloud-fan finding apache#2/apache#3 (StructField commit split-out): * Revert d55143f ("Public StructField.json / .fromJson + Column.fromStructField"). None of the three new public APIs are called by code in this PR -- they exist to support the downstream Unity Catalog connector. Per cloud-fan's request, split them out into a separate PR with its own JIRA ticket so they can be reviewed independently. Reverting the commit also removes the `Column.fromStructField` divergence from `structFieldToV2Column` (which would have duplicated the comment field in `metadataInJSON`); the split-out PR fixes that inline. The revert also fixes two CI failures introduced by the same commit: - `mimaReportBinaryIssues` losing `StructField.tupled` / `StructField.curried` / `AbstractFunction4` parent on the `StructField` companion object (sql-api 4.0.0 binary surface). - `Javaunidoc` failing on the `{@link StructField.json}` / `{@link StructField.fromJson}` references that javadoc rejects (member separator must be `#`, not `.`). Cloud-fan finding apache#4 (multi-level namespace lift): * `MetricViewHelper.analyzeMetricViewText` previously required a `TableIdentifier`, which can only carry up to 3 name parts. v2 metric views with multi-level-namespace targets (e.g. `cat.db1.db2.mv`) failed at `ident.asTableIdentifier` with `requiresSinglePartNamespaceError`. Lift the helper to take `nameParts: Seq[String]` directly. The synthetic `CatalogTable` used as analysis context still needs a TableIdentifier (it's a v1 type); we collapse the intermediate namespace components into the synthetic `database` slot via dot-join. Since the synthetic identifier is never used to resolve the view body itself, this collapse is observationally invisible to the analyzed plan, and `verifyTemporaryObjectsNotExists` still receives the full `nameParts` so error rendering preserves the multi-part form. Update both call sites: - `CreateMetricViewCommand.createMetricViewInSessionCatalog` (v1) passes `name.nameParts`. - `DataSourceV2Strategy` (v2) passes `(catalog.name() +: ident.namespace() :+ ident.name())`. Add a new test in `MetricViewV2CatalogSuite` that creates a metric view at `testcat.ns_a.ns_b.mv_deep` and asserts the create succeeds + the captured `ViewInfo` carries `PROP_TABLE_TYPE = METRIC_VIEW`. Test realignment to SPARK-56655's new contract: * SPARK-56655 (master) added several v2 view DDL behaviors that changed the error categories surfaced for cross-type collisions and added view-aware listing on `TableViewCatalog`. Realign 5 existing test assertions in `MetricViewV2CatalogSuite`: - "CREATE VIEW ... WITH METRICS over a v2 table at the ident": `EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE` -> `TABLE_OR_VIEW_ ALREADY_EXISTS` (master's analyzer-time pre-check fires before the exec-time decoding in `CreateV2MetricViewExec.run`). - "DROP TABLE on a v2 metric view" + "DROP TABLE IF EXISTS on a v2 metric view": `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE` -> `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead", raised by master's new `DropTableExec.run`). - "SHOW TABLES does not list v2 metric views" -> "SHOW TABLES on a v2 TableViewCatalog lists both tables and metric views": SPARK-56655 routes SHOW TABLES on a `TableViewCatalog` through `listRelationSummaries` so views appear alongside tables in the output (matches v1 SHOW TABLES on a session catalog). - DESCRIBE TABLE EXTENDED `View Text` row: trim before comparing to the pre-substitution YAML body (the row carries the raw text between the SQL `$$ ... $$` markers, including the surrounding `\n`). * Add an explicit `SHOW CREATE TABLE` rejection guard for v2 metric views in `DataSourceV2Strategy`. Master's new `ShowCreateV2ViewExec` would emit a plain `CREATE VIEW <ident> AS <yaml>`, which is not round-trippable as metric-view DDL (the right form is `CREATE VIEW <ident> WITH METRICS LANGUAGE YAML AS $$ <yaml> $$`). Reject up front with `unsupportedTableOperationError("SHOW CREATE TABLE")` rather than emit lossy DDL; the existing test already expects this exact behavior. Signed-off-by: chen.wang <chen.wang@databricks.com>
chenwang-databricks
pushed a commit
that referenced
this pull request
May 7, 2026
… metadata ### What changes were proposed in this pull request? Follow-up to apache#55636 addressing post-merge review comments from zikangh: 1. **Deduplicate `isCarryoverPair`.** The carry-over predicate (`_del_cnt = 1 AND _ins_cnt = 1 AND _rv_cnt = 2 AND _min_rv = _max_rv`) was duplicated between the batch path's `addCarryOverPairFilter` and the streaming path's inline filter. Extracted a shared `buildCarryOverPairPredicate` helper and call it from both. 2. **Mark the streaming row-level rewrite via attribute metadata rather than helper column name.** `UnsupportedOperationChecker` previously detected the rewrite by string-matching the `__spark_cdc_events` aggregate alias name. Switched to a metadata marker (`ResolveChangelogTable.streamingPostProcessingMarker`) attached to the alias's output attribute -- mirroring the existing `EventTimeWatermark.delayKey` and `SessionWindow.marker` patterns. The marker travels with the attribute through optimization. 3. **Expand streaming E2E coverage.** New tests in `ChangelogEndToEndSuite`: - composite rowId carry-over removal, - composite rowId update detection (different tuples kept raw), - carry-over + update detection across multiple commits, - DELETE-all-rows and UPDATE-all-rows fixtures, - append-only workload pass-through, - no-op UPDATE labeled as update (rcv differs on pre/post), - large carry-over removal (9 carry-over pairs + 1 real delete). ### Why are the changes needed? zikangh raised these on the merged PR. Bundled together so they can be reviewed and shipped as one follow-up. ### Does this PR introduce _any_ user-facing change? No. Internal refactor (#1, apache#2) and additional test coverage (apache#3). The behavior of streaming CDC reads is unchanged. ### How was this patch tested? All 157 tests pass across the four CDC suites: - `ChangelogResolutionSuite` - `ResolveChangelogTablePostProcessingSuite` - `ResolveChangelogTableStreamingPostProcessingSuite` - `ChangelogEndToEndSuite` Also confirmed: - `UnsupportedOperationsSuite` (216 tests) still passes after the marker-based detection switch. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Claude Code (claude-opus-4-7) Closes apache#55653 from gengliangwang/streamingCDC-followup-zikangh. Authored-by: Gengliang Wang <gengliang@apache.org> Signed-off-by: Gengliang Wang <gengliang@apache.org>
chenwang-databricks
pushed a commit
that referenced
this pull request
May 7, 2026
### What changes were proposed in this pull request? Address the open follow-ups from [SPARK-56681](https://issues.apache.org/jira/browse/SPARK-56681) (umbrella for PATH / SPARK-56605 cleanup) in a single cleanup PR. Items #1 and apache#2 were already wired by SPARK-56639; this PR covers the remainder. | # | Item | Resolution | |---|---|---| | #1 | `FunctionResolution.resolveProcedure` was dead code | Already wired by SPARK-56639 (no action). | | apache#2 | Frozen view / SQL-function PATH wiring unfinished | Already done by SPARK-56639 (no action). | | apache#3 | `AnalysisContext.resolutionPathEntries` threadlocal | Audit only: confirmed `withNewAnalysisContext` / `reset()` correctly clear it. Full removal needs a coordinated refactor to plumb the path through `RelationResolution` / `FunctionResolution` method calls; flagged as a follow-up. | | apache#4 | `Analyzer.executeAndCheck` clobbers outer `SQLConf.withExistingConf` | Extracted `runWithSessionConf` helper, added `SQLConf.getExistingConfIfSet`. `executeAndCheck` and `executeSameContext` now share one path that yields to any outer scope. | | apache#5 | `VariableResolution.allowUnqualifiedSessionTempVariableLookup` force-loads default catalog | Replaced the hot-path catalog read with `CatalogManager.isSystemSessionOnPath`, which inspects stored session-path entries directly. No catalog load on column resolution. | | apache#6 | `DROP VARIABLE` PATH gate asymmetric with `DECLARE` / `CREATE` | Removed the gate. DDL on session variables (`DECLARE` / `CREATE` / `DROP`) always targets `system.session` directly; only DML (`SET VAR`, `SELECT x`) goes through PATH. | | apache#7 | `lookupFunctionType` exception swallow too broad | Narrowed from `NonFatal` to the explicit not-found list (`NoSuchFunctionException`, `NoSuchNamespaceException`, `CatalogNotFoundException`, `FORBIDDEN_OPERATION`). Other exceptions propagate. | | apache#8 | `lookupFunctionType` fan-out had wasteful `system.*` candidates | Filtered them out — `system.session`, `system.builtin`, `system.ai` are already resolved earlier in the same method. | | apache#9 | Three near-duplicate path-resolution helpers | Lifted into `CatalogManager.resolutionPathEntriesForAnalysis(pinnedEntries, viewCatalogAndNamespace)`. Relation, routine, and procedure resolution all route through it. | | apache#10 | Tests for the new error paths and gates | Added a DECLARE / SET VAR / DROP cycle test under non-default PATH and a struct-variable field-vs-qualified ambiguity test in `sql-session-variables.sql`. | | apache#11 | `ProtoToParsedPlanTestSuite.analyzerIsolationConf` was a bare `SQLConf` | Clone `spark.sessionState.conf` and only override `PATH_ENABLED=false`, so all `sparkConf` overrides (ANSI, alias config, ...) propagate automatically. | | Bonus | `ResolveSetVariable` hardcoded `SYSTEM.SESSION` regardless of actual PATH | `unresolvedVariableError` now takes `Seq[Seq[String]]` path entries with **required** `Origin` (no overloads). DML lookup failures (`SET VAR`, `FETCH ... INTO`) report the full SQL path as a bracketed list, byte-for-byte consistent with `UNRESOLVED_ROUTINE` and `TABLE_OR_VIEW_NOT_FOUND`. DDL name validation in `ResolveCatalogs` continues to report `[system.session]` since PATH does not apply there. Origin is plumbed through `VariableManager.set` so all error sites carry a `queryContext` pointing at the offending variable identifier (parser opt-ins via `withOrigin(identifierReference)` so the highlight is the variable name, not the whole statement). | ### Why are the changes needed? These are the cleanup items called out on SPARK-56681 from the post-merge source review of SPARK-56605. They eliminate dead code paths, plug user-visible bugs (force-loading a misconfigured default catalog on column resolution; clobbering pinned session configs; swallowing real catalog errors as `UNRESOLVED_ROUTINE`), remove the asymmetry between DDL and DML on session variables, and make `UNRESOLVED_VARIABLE` self-consistent with the other "not found" errors. ### Does this PR introduce _any_ user-facing change? Yes. - **`UNRESOLVED_VARIABLE.searchPath`** is now rendered as a bracketed list. For DML lookups (`SET VAR`, `FETCH ... INTO`), the list reflects the actual SQL PATH that was consulted instead of a hardcoded `SYSTEM.SESSION`. For DDL name validation (`DECLARE` / `DROP` with a non-session namespace), the list is `[`` `system`.`session` ``]` since PATH does not apply. - **`UNRESOLVED_VARIABLE`** now always carries a `queryContext` that highlights just the offending variable identifier (e.g. `"builtin.var1"`, `"ses.var1"`), not the whole `DECLARE` / `SET VAR` statement. - **`DROP TEMPORARY VARIABLE`** no longer raises `UNRESOLVED_VARIABLE` when the SQL PATH does not contain `system.session`. DDL on session variables ignores PATH, matching the existing behaviour of `DECLARE OR REPLACE VARIABLE`. - **`lookupFunctionType`** no longer swallows non–`NotFound` errors. A catalog reporting `PERMISSION_DENIED` (or similar) for a function lookup now propagates instead of silently producing `UNRESOLVED_ROUTINE`. ### How was this patch tested? - Added `sql-session-variables.sql` regression test for the struct-variable field-vs-qualified ambiguity (`DECLARE VARIABLE session STRUCT<a INT>` → `SELECT session.a` succeeds → `DROP` → `SELECT session.a` falls through to `UNRESOLVED_COLUMN`). - Updated `SetPathSuite`: DECLARE / SET VAR / DROP cycle under a non-default PATH; bonus test asserts the actual rendered search path and the variable-identifier `queryContext`. - Updated `SqlScriptingExecutionSuite` for the new bracketed `searchPath` and identifier-pinned `queryContext`. - Regenerated `sql-session-variables.sql.out` for the new error shape. - Added `resolutionPathEntriesForAnalysis` stubs to mocked `CatalogManager` instances in `PlanResolutionSuite`, `AlignAssignmentsSuiteBase`, and `TableLookupCacheSuite`. - Ran focused suites locally; all pass: - `build/sbt 'sql/testOnly *SetPathSuite *SqlScriptingExecutionSuite *ExecuteImmediateEndToEndSuite'` - `build/sbt 'sql/testOnly *SimpleSQLViewSuite *SQLFunctionSuite'` - `build/sbt 'sql/testOnly *PlanResolutionSuite *UpdateTableAlignAssignmentsSuite *MergeIntoTableAlignAssignmentsSuite'` - `build/sbt 'catalyst/testOnly *TableLookupCacheSuite *AnalysisSuite *AnalysisErrorSuite *LookupFunctionsSuite'` - `build/sbt 'sql/testOnly *FunctionQualificationSuite *RelationQualificationSuite *DataSourceV2FunctionSuite'` - `build/sbt 'sql/testOnly *SQLQuerySuite'` - `build/sbt 'connect/testOnly *ProtoToParsedPlanTestSuite'` - `build/sbt 'sql/testOnly *SQLQueryTestSuite -- -z sql-session-variables.sql'` - Full `org.apache.spark.sql.catalyst.analysis.*`, `org.apache.spark.sql.catalyst.parser.*`, and `org.apache.spark.sql.analysis.resolver.*` suites. - `scalastyle` and `scalafmt` clean across catalyst, sql, and connect modules. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Cursor Claude Opus 4.7 Closes apache#55647 from srielau/SPARK-56681-patch-clean-up. Authored-by: Serge Rielau <serge@rielau.com> Signed-off-by: Daniel Tenedorio <daniel.tenedorio@databricks.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
Why are the changes needed?
Does this PR introduce any user-facing change?
How was this patch tested?
Was this patch authored or co-authored using generative AI tooling?