Skip to content

Support metric view creation and querying with V2 catalogs (OSS UC)#1

Open
chenwang-databricks wants to merge 8 commits into
masterfrom
metric-view-uc-integration
Open

Support metric view creation and querying with V2 catalogs (OSS UC)#1
chenwang-databricks wants to merge 8 commits into
masterfrom
metric-view-uc-integration

Conversation

@chenwang-databricks

@chenwang-databricks chenwang-databricks commented Mar 20, 2026

Copy link
Copy Markdown
Owner
  • Route metric view creation to V2 catalog when the target is not the session catalog, extracting source table dependency from YAML and passing it as a table property for the connector to forward to UC
  • Use parseMultipartIdentifier instead of parseTableIdentifier in Source.apply to support three-part names (catalog.schema.table) in metric view YAML source references
  • Replace misleading invalidLiteralForWindowDurationError with accurate SparkException.internalError messages for YAML parsing failures

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

…alog)

- Route metric view creation to V2 catalog when the target is not the
  session catalog, extracting source table dependency from YAML and
  passing it as a table property for the connector to forward to UC
- Use parseMultipartIdentifier instead of parseTableIdentifier in
  Source.apply to support three-part names (catalog.schema.table) in
  metric view YAML source references
- Replace misleading invalidLiteralForWindowDurationError with accurate
  SparkException.internalError messages for YAML parsing failures
… metric view ID propagation

- Add CatalogTableType.METRIC_VIEW and METRIC_VIEW_TABLE_TYPE constant
- Extend TableInfo with tableType, viewDefinition, and viewDependencies fields
- Add Dependency, DependencyList, TableDependency, FunctionDependency classes
- Add metricViewId field to AnalysisContext for definer's rights credential vending
- Update V1Table to map METRIC_VIEW to the V2 table type
- Update isMetricView() for backward compatibility with legacy property
- Use structured TableInfo in CreateMetricViewCommand instead of flat properties
The metricViewId field and setMetricViewId method were only needed for
definer's rights credential vending. With invoker's rights, the
connector no longer propagates metric view identity through the
AnalysisContext. Revert to upstream state.
DelegatingCatalogExtension forwards the older createTable overloads
(StructType and Column[] variants) but was missing the TableInfo
overload added in 4.1.0. Without this, catalog extensions like
DeltaCatalog silently drop tableType, viewDefinition, and
viewDependencies by falling through to the default implementation
on TableCatalog which only passes columns/partitions/properties.

@chenwang-databricks chenwang-databricks left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to add unit tests to test the integration of v2 catalog for metric views?

Comment thread sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/Dependency.java Outdated
Comment thread sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala Outdated
Review comment fixes:

- Drop the "This mirrors the Databricks Unity Catalog..." sentence
  from Dependency.java and DependencyList.java javadocs; these types
  are Spark V2 catalog APIs and the description should be generic.

- Simplify CatalogTable.isMetricView to just check tableType ==
  METRIC_VIEW, and remove the unused VIEW_WITH_METRICS constant.
  This PR stopped writing the legacy property, so the fallback is
  dead code.

- In CreateMetricViewCommand.createMetricViewInV2Catalog, drop the
  extractSourceTables helper that branched on AssetSource/SQLSource.
  The analyzed plan has already resolved both source types, so
  MetricViewHelper.collectTableDependencies(analyzed) alone produces
  fully-qualified 3-part names for the dependency list.

- On the V2 path, additionally capture:
  - metric_view.from.type / metric_view.from.name /
    metric_view.from.sql / metric_view.where via a new
    MetricView.getProperties method (mirrors the canonical metric
    view properties used by other platforms, e.g. Databricks).
  - view.catalogAndNamespace.*, view.sqlConfig.*, view.schemaMode,
    and query-column-name props via ViewHelper.generateViewProperties
    so the V2 metric view can be reloaded and re-parsed with the
    same catalog/namespace and SQL configs as the V1 path.
  - Use TableCatalog.PROP_COMMENT instead of the raw "comment"
    string for consistency.

New tests:

- MetricViewV2CatalogSuite exercises createMetricViewInV2Catalog
  via a non-session catalog (MetricViewRecordingCatalog built on
  InMemoryTableCatalog that captures the full TableInfo passed to
  createTable). Covers asset source, SQL source, and the new
  metric_view.* / view.* properties plus comment round-trip.
Aligns DROP semantics for metric views with how Databricks treats them:
metric views are conceptually views, so DROP VIEW must succeed and
DROP TABLE must fail with WRONG_COMMAND_FOR_OBJECT_TYPE pointing the
caller at DROP VIEW.

V1 (session catalog) path:
- DropTableCommand: allow DROP VIEW on tables whose tableType is
  METRIC_VIEW (in addition to VIEW), and reject DROP TABLE on either
  VIEW or METRIC_VIEW. The required-type message now reads
  "VIEW or METRIC_VIEW" so the rejection is self-explanatory.

V2 path (non-session catalogs):
- V2SessionCatalog.dropTableInternal: also reject when the loaded V1
  table is a METRIC_VIEW, mirroring the existing VIEW rejection.
- DropTableExec: load the table and throw WRONG_COMMAND_FOR_OBJECT_TYPE
  when its table_type property is METRIC_VIEW; this catches
  DROP TABLE catalog.ns.metric_view on arbitrary V2 catalogs.
- New DropMetricViewExec handles DROP VIEW catalog.ns.metric_view on
  non-session V2 catalogs by calling catalog.dropTable when the target
  is a metric view, and throwing WRONG_COMMAND_FOR_OBJECT_TYPE
  (DROP TABLE) when the target is a regular table.
- ResolveSessionCatalog: stop throwing "catalog doesn't support views"
  for V2 DropView; leave the plan in place so DataSourceV2Strategy can
  pick it up and route to DropMetricViewExec.

Tests:
- MetricViewV2CatalogSuite gets three new cases covering
  DROP TABLE rejection on a metric view, DROP VIEW success on a metric
  view, DROP VIEW IF EXISTS no-op on a missing metric view, and
  DROP VIEW rejection on a regular V2 table. The recording catalog
  also propagates table_type into the stored table's properties so the
  routing/checks can find it.
- The existing V1 metric-view tests (SimpleMetricViewSuite) now pass
  cleanly since `withView(...)`'s DROP VIEW cleanup works on metric
  views.
- DDLSuite "drop view" and HiveDDLSuite "drop table using drop view"
  updated to expect "VIEW or METRIC_VIEW" in the requiredType field.
case V1Table(v1Table)
if (v1Table.tableType == CatalogTableType.VIEW &&
!SQLConf.get.getConf(SQLConf.DROP_TABLE_VIEW_ENABLED)) ||
v1Table.tableType == CatalogTableType.METRIC_VIEW =>

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this to be alongside the VIEW type check so that SQLConf.DROP_TABLE_VIEW_ENABLED can control the same behavior

Comment on lines +38 to +52
val table = catalog.loadTable(ident)
val tableType = Option(table.properties().get(TableCatalog.PROP_TABLE_TYPE)).orNull
// Metric views are stored as tables in V2 catalogs but must be dropped via DROP VIEW.
if (TableSummary.METRIC_VIEW_TABLE_TYPE.equals(tableType)) {
val qualified =
(catalog.name() +: ident.namespace() :+ ident.name()).mkString(".")
throw QueryCompilationErrors.wrongCommandForObjectTypeError(
operation = "DROP TABLE",
requiredType =
s"${TableSummary.EXTERNAL_TABLE_TYPE} or ${TableSummary.MANAGED_TABLE_TYPE}",
objectName = qualified,
foundType = TableSummary.METRIC_VIEW_TABLE_TYPE,
alternative = "DROP VIEW"
)
}

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really have to check this? It doesn't check for view type as well. I mean, as long as we can route the command based on table type correctly, you don't have to check it here, right?

val invalidateFunc = () => CommandUtils.uncacheTableOrView(session, r)
DropTableExec(r.catalog.asTableCatalog, r.identifier, ifExists, purge, invalidateFunc) :: Nil

case DropView(r: ResolvedIdentifier, ifExists) =>

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you route based on the table type? Basically, we only route metric view type to DropMetricViewExec.

Addresses review feedback on the previous commit:

- V2SessionCatalog.dropTableInternal: place METRIC_VIEW alongside VIEW
  inside the same `SQLConf.DROP_TABLE_VIEW_ENABLED` predicate so the
  conf controls both (a user opting in to "drop view via DROP TABLE"
  for plain views also gets it for metric views).
- DropTableExec: remove the metric-view rejection. The exec stays
  type-agnostic; routing/validation belongs to the strategy.
- DropMetricViewExec: drop the type check too. The strategy guarantees
  this exec is only chosen for actual metric views (or for IF EXISTS
  no-ops on missing identifiers).
- DataSourceV2Strategy: load the resolved table once and route based
  on its `table_type` property:
    * DROP TABLE on a metric view -> WRONG_COMMAND_FOR_OBJECT_TYPE
      pointing at DROP VIEW.
    * DROP VIEW on a metric view -> DropMetricViewExec.
    * DROP VIEW on a non-existent identifier -> DropMetricViewExec
      (so IF EXISTS is honored uniformly).
    * DROP VIEW on a regular V2 table -> WRONG_COMMAND_FOR_OBJECT_TYPE
      pointing at DROP TABLE.

The existing MetricViewV2CatalogSuite cases still cover all of these
branches and pass unchanged; the V1 SimpleMetricViewSuite and the
InMemoryCatalogedDDLSuite continue to pass.
chenwang-databricks added a commit that referenced this pull request May 1, 2026
…pache#55487

Address the four new findings from cloud-fan's round 2 review of
PR apache#55487 plus the corresponding CI fixes. Combines the revert of
the StructField/Column public-API commit (per cloud-fan's split
request) with the three remaining metric-view fixes into one commit.

Cloud-fan finding #1 (`verifyAlterTableType` audit miss):

* `ddl.scala` `verifyAlterTableType` (lines 1086-1103) was missed in
  the `tableType == VIEW` -> `VIEW || METRIC_VIEW` audit. Without
  this fix, `ALTER TABLE <metric_view> RENAME TO ...` silently
  succeeds (renames as if a regular table), and
  `ALTER VIEW <metric_view> RENAME TO ...` throws
  `cannotAlterTableWithAlterViewError` (wrong category for a view-
  kind). Widen both arms to also accept `METRIC_VIEW`.

Cloud-fan finding apache#2/apache#3 (StructField commit split-out):

* Revert d55143f ("Public StructField.json / .fromJson +
  Column.fromStructField"). None of the three new public APIs are
  called by code in this PR -- they exist to support the downstream
  Unity Catalog connector. Per cloud-fan's request, split them out
  into a separate PR with its own JIRA ticket so they can be
  reviewed independently. Reverting the commit also removes the
  `Column.fromStructField` divergence from `structFieldToV2Column`
  (which would have duplicated the comment field in
  `metadataInJSON`); the split-out PR fixes that inline.

  The revert also fixes two CI failures introduced by the same
  commit:
  - `mimaReportBinaryIssues` losing `StructField.tupled` /
    `StructField.curried` / `AbstractFunction4` parent on the
    `StructField` companion object (sql-api 4.0.0 binary surface).
  - `Javaunidoc` failing on the `{@link StructField.json}` /
    `{@link StructField.fromJson}` references that javadoc rejects
    (member separator must be `#`, not `.`).

Cloud-fan finding apache#4 (multi-level namespace lift):

* `MetricViewHelper.analyzeMetricViewText` previously required a
  `TableIdentifier`, which can only carry up to 3 name parts. v2
  metric views with multi-level-namespace targets (e.g.
  `cat.db1.db2.mv`) failed at `ident.asTableIdentifier` with
  `requiresSinglePartNamespaceError`. Lift the helper to take
  `nameParts: Seq[String]` directly. The synthetic `CatalogTable`
  used as analysis context still needs a TableIdentifier (it's a
  v1 type); we collapse the intermediate namespace components into
  the synthetic `database` slot via dot-join. Since the synthetic
  identifier is never used to resolve the view body itself, this
  collapse is observationally invisible to the analyzed plan, and
  `verifyTemporaryObjectsNotExists` still receives the full
  `nameParts` so error rendering preserves the multi-part form.

  Update both call sites:
  - `CreateMetricViewCommand.createMetricViewInSessionCatalog` (v1)
    passes `name.nameParts`.
  - `DataSourceV2Strategy` (v2) passes
    `(catalog.name() +: ident.namespace() :+ ident.name())`.

  Add a new test in `MetricViewV2CatalogSuite` that creates a
  metric view at `testcat.ns_a.ns_b.mv_deep` and asserts the create
  succeeds + the captured `ViewInfo` carries
  `PROP_TABLE_TYPE = METRIC_VIEW`.

Test realignment to SPARK-56655's new contract:

* SPARK-56655 (master) added several v2 view DDL behaviors that
  changed the error categories surfaced for cross-type collisions
  and added view-aware listing on `TableViewCatalog`. Realign 5
  existing test assertions in `MetricViewV2CatalogSuite`:
  - "CREATE VIEW ... WITH METRICS over a v2 table at the ident":
    `EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE` -> `TABLE_OR_VIEW_
    ALREADY_EXISTS` (master's analyzer-time pre-check fires before
    the exec-time decoding in `CreateV2MetricViewExec.run`).
  - "DROP TABLE on a v2 metric view" + "DROP TABLE IF EXISTS on
    a v2 metric view": `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE` ->
    `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead",
    raised by master's new `DropTableExec.run`).
  - "SHOW TABLES does not list v2 metric views" -> "SHOW TABLES
    on a v2 TableViewCatalog lists both tables and metric views":
    SPARK-56655 routes SHOW TABLES on a `TableViewCatalog` through
    `listRelationSummaries` so views appear alongside tables in
    the output (matches v1 SHOW TABLES on a session catalog).
  - DESCRIBE TABLE EXTENDED `View Text` row: trim before comparing
    to the pre-substitution YAML body (the row carries the raw
    text between the SQL `$$ ... $$` markers, including the
    surrounding `\n`).

* Add an explicit `SHOW CREATE TABLE` rejection guard for v2
  metric views in `DataSourceV2Strategy`. Master's new
  `ShowCreateV2ViewExec` would emit a plain `CREATE VIEW <ident>
  AS <yaml>`, which is not round-trippable as metric-view DDL
  (the right form is `CREATE VIEW <ident> WITH METRICS LANGUAGE
  YAML AS $$ <yaml> $$`). Reject up front with
  `unsupportedTableOperationError("SHOW CREATE TABLE")` rather
  than emit lossy DDL; the existing test already expects this
  exact behavior.

Signed-off-by: chen.wang <chen.wang@databricks.com>
chenwang-databricks added a commit that referenced this pull request May 1, 2026
…pache#55487

Address the four new findings from cloud-fan's round 2 review of
PR apache#55487 plus the corresponding CI fixes. Combines the revert of
the StructField/Column public-API commit (per cloud-fan's split
request) with the three remaining metric-view fixes into one commit.

Cloud-fan finding #1 (`verifyAlterTableType` audit miss):

* `ddl.scala` `verifyAlterTableType` (lines 1086-1103) was missed in
  the `tableType == VIEW` -> `VIEW || METRIC_VIEW` audit. Without
  this fix, `ALTER TABLE <metric_view> RENAME TO ...` silently
  succeeds (renames as if a regular table), and
  `ALTER VIEW <metric_view> RENAME TO ...` throws
  `cannotAlterTableWithAlterViewError` (wrong category for a view-
  kind). Widen both arms to also accept `METRIC_VIEW`.

Cloud-fan finding apache#2/apache#3 (StructField commit split-out):

* Revert d55143f ("Public StructField.json / .fromJson +
  Column.fromStructField"). None of the three new public APIs are
  called by code in this PR -- they exist to support the downstream
  Unity Catalog connector. Per cloud-fan's request, split them out
  into a separate PR with its own JIRA ticket so they can be
  reviewed independently. Reverting the commit also removes the
  `Column.fromStructField` divergence from `structFieldToV2Column`
  (which would have duplicated the comment field in
  `metadataInJSON`); the split-out PR fixes that inline.

  The revert also fixes two CI failures introduced by the same
  commit:
  - `mimaReportBinaryIssues` losing `StructField.tupled` /
    `StructField.curried` / `AbstractFunction4` parent on the
    `StructField` companion object (sql-api 4.0.0 binary surface).
  - `Javaunidoc` failing on the `{@link StructField.json}` /
    `{@link StructField.fromJson}` references that javadoc rejects
    (member separator must be `#`, not `.`).

Cloud-fan finding apache#4 (multi-level namespace lift):

* `MetricViewHelper.analyzeMetricViewText` previously required a
  `TableIdentifier`, which can only carry up to 3 name parts. v2
  metric views with multi-level-namespace targets (e.g.
  `cat.db1.db2.mv`) failed at `ident.asTableIdentifier` with
  `requiresSinglePartNamespaceError`. Lift the helper to take
  `nameParts: Seq[String]` directly. The synthetic `CatalogTable`
  used as analysis context still needs a TableIdentifier (it's a
  v1 type); we collapse the intermediate namespace components into
  the synthetic `database` slot via dot-join. Since the synthetic
  identifier is never used to resolve the view body itself, this
  collapse is observationally invisible to the analyzed plan, and
  `verifyTemporaryObjectsNotExists` still receives the full
  `nameParts` so error rendering preserves the multi-part form.

  Update both call sites:
  - `CreateMetricViewCommand.createMetricViewInSessionCatalog` (v1)
    passes `name.nameParts`.
  - `DataSourceV2Strategy` (v2) passes
    `(catalog.name() +: ident.namespace() :+ ident.name())`.

  Add a new test in `MetricViewV2CatalogSuite` that creates a
  metric view at `testcat.ns_a.ns_b.mv_deep` and asserts the create
  succeeds + the captured `ViewInfo` carries
  `PROP_TABLE_TYPE = METRIC_VIEW`.

Test realignment to SPARK-56655's new contract:

* SPARK-56655 (master) added several v2 view DDL behaviors that
  changed the error categories surfaced for cross-type collisions
  and added view-aware listing on `TableViewCatalog`. Realign 5
  existing test assertions in `MetricViewV2CatalogSuite`:
  - "CREATE VIEW ... WITH METRICS over a v2 table at the ident":
    `EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE` -> `TABLE_OR_VIEW_
    ALREADY_EXISTS` (master's analyzer-time pre-check fires before
    the exec-time decoding in `CreateV2MetricViewExec.run`).
  - "DROP TABLE on a v2 metric view" + "DROP TABLE IF EXISTS on
    a v2 metric view": `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE` ->
    `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead",
    raised by master's new `DropTableExec.run`).
  - "SHOW TABLES does not list v2 metric views" -> "SHOW TABLES
    on a v2 TableViewCatalog lists both tables and metric views":
    SPARK-56655 routes SHOW TABLES on a `TableViewCatalog` through
    `listRelationSummaries` so views appear alongside tables in
    the output (matches v1 SHOW TABLES on a session catalog).
  - DESCRIBE TABLE EXTENDED `View Text` row: trim before comparing
    to the pre-substitution YAML body (the row carries the raw
    text between the SQL `$$ ... $$` markers, including the
    surrounding `\n`).

* Add an explicit `SHOW CREATE TABLE` rejection guard for v2
  metric views in `DataSourceV2Strategy`. Master's new
  `ShowCreateV2ViewExec` would emit a plain `CREATE VIEW <ident>
  AS <yaml>`, which is not round-trippable as metric-view DDL
  (the right form is `CREATE VIEW <ident> WITH METRICS LANGUAGE
  YAML AS $$ <yaml> $$`). Reject up front with
  `unsupportedTableOperationError("SHOW CREATE TABLE")` rather
  than emit lossy DDL; the existing test already expects this
  exact behavior.

Signed-off-by: chen.wang <chen.wang@databricks.com>
chenwang-databricks added a commit that referenced this pull request May 1, 2026
…pache#55487

Address the four new findings from cloud-fan's round 2 review of
PR apache#55487 plus the corresponding CI fixes. Combines the revert of
the StructField/Column public-API commit (per cloud-fan's split
request) with the three remaining metric-view fixes into one commit.

Cloud-fan finding #1 (`verifyAlterTableType` audit miss):

* `ddl.scala` `verifyAlterTableType` (lines 1086-1103) was missed in
  the `tableType == VIEW` -> `VIEW || METRIC_VIEW` audit. Without
  this fix, `ALTER TABLE <metric_view> RENAME TO ...` silently
  succeeds (renames as if a regular table), and
  `ALTER VIEW <metric_view> RENAME TO ...` throws
  `cannotAlterTableWithAlterViewError` (wrong category for a view-
  kind). Widen both arms to also accept `METRIC_VIEW`.

Cloud-fan finding apache#2/apache#3 (StructField commit split-out):

* Revert d55143f ("Public StructField.json / .fromJson +
  Column.fromStructField"). None of the three new public APIs are
  called by code in this PR -- they exist to support the downstream
  Unity Catalog connector. Per cloud-fan's request, split them out
  into a separate PR with its own JIRA ticket so they can be
  reviewed independently. Reverting the commit also removes the
  `Column.fromStructField` divergence from `structFieldToV2Column`
  (which would have duplicated the comment field in
  `metadataInJSON`); the split-out PR fixes that inline.

  The revert also fixes two CI failures introduced by the same
  commit:
  - `mimaReportBinaryIssues` losing `StructField.tupled` /
    `StructField.curried` / `AbstractFunction4` parent on the
    `StructField` companion object (sql-api 4.0.0 binary surface).
  - `Javaunidoc` failing on the `{@link StructField.json}` /
    `{@link StructField.fromJson}` references that javadoc rejects
    (member separator must be `#`, not `.`).

Cloud-fan finding apache#4 (multi-level namespace lift):

* `MetricViewHelper.analyzeMetricViewText` previously required a
  `TableIdentifier`, which can only carry up to 3 name parts. v2
  metric views with multi-level-namespace targets (e.g.
  `cat.db1.db2.mv`) failed at `ident.asTableIdentifier` with
  `requiresSinglePartNamespaceError`. Lift the helper to take
  `nameParts: Seq[String]` directly. The synthetic `CatalogTable`
  used as analysis context still needs a TableIdentifier (it's a
  v1 type); we collapse the intermediate namespace components into
  the synthetic `database` slot via dot-join. Since the synthetic
  identifier is never used to resolve the view body itself, this
  collapse is observationally invisible to the analyzed plan, and
  `verifyTemporaryObjectsNotExists` still receives the full
  `nameParts` so error rendering preserves the multi-part form.

  Update both call sites:
  - `CreateMetricViewCommand.createMetricViewInSessionCatalog` (v1)
    passes `name.nameParts`.
  - `DataSourceV2Strategy` (v2) passes
    `(catalog.name() +: ident.namespace() :+ ident.name())`.

  Add a new test in `MetricViewV2CatalogSuite` that creates a
  metric view at `testcat.ns_a.ns_b.mv_deep` and asserts the create
  succeeds + the captured `ViewInfo` carries
  `PROP_TABLE_TYPE = METRIC_VIEW`.

Test realignment to SPARK-56655's new contract:

* SPARK-56655 (master) added several v2 view DDL behaviors that
  changed the error categories surfaced for cross-type collisions
  and added view-aware listing on `TableViewCatalog`. Realign 5
  existing test assertions in `MetricViewV2CatalogSuite`:
  - "CREATE VIEW ... WITH METRICS over a v2 table at the ident":
    `EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE` -> `TABLE_OR_VIEW_
    ALREADY_EXISTS` (master's analyzer-time pre-check fires before
    the exec-time decoding in `CreateV2MetricViewExec.run`).
  - "DROP TABLE on a v2 metric view" + "DROP TABLE IF EXISTS on
    a v2 metric view": `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE` ->
    `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead",
    raised by master's new `DropTableExec.run`).
  - "SHOW TABLES does not list v2 metric views" -> "SHOW TABLES
    on a v2 TableViewCatalog lists both tables and metric views":
    SPARK-56655 routes SHOW TABLES on a `TableViewCatalog` through
    `listRelationSummaries` so views appear alongside tables in
    the output (matches v1 SHOW TABLES on a session catalog).
  - DESCRIBE TABLE EXTENDED `View Text` row: trim before comparing
    to the pre-substitution YAML body (the row carries the raw
    text between the SQL `$$ ... $$` markers, including the
    surrounding `\n`).

* Add an explicit `SHOW CREATE TABLE` rejection guard for v2
  metric views in `DataSourceV2Strategy`. Master's new
  `ShowCreateV2ViewExec` would emit a plain `CREATE VIEW <ident>
  AS <yaml>`, which is not round-trippable as metric-view DDL
  (the right form is `CREATE VIEW <ident> WITH METRICS LANGUAGE
  YAML AS $$ <yaml> $$`). Reject up front with
  `unsupportedTableOperationError("SHOW CREATE TABLE")` rather
  than emit lossy DDL; the existing test already expects this
  exact behavior.

Signed-off-by: chen.wang <chen.wang@databricks.com>
chenwang-databricks added a commit that referenced this pull request May 1, 2026
…pache#55487

Address the four new findings from cloud-fan's round 2 review of
PR apache#55487 plus the corresponding CI fixes. Combines the revert of
the StructField/Column public-API commit (per cloud-fan's split
request) with the three remaining metric-view fixes into one commit.

Cloud-fan finding #1 (`verifyAlterTableType` audit miss):

* `ddl.scala` `verifyAlterTableType` (lines 1086-1103) was missed in
  the `tableType == VIEW` -> `VIEW || METRIC_VIEW` audit. Without
  this fix, `ALTER TABLE <metric_view> RENAME TO ...` silently
  succeeds (renames as if a regular table), and
  `ALTER VIEW <metric_view> RENAME TO ...` throws
  `cannotAlterTableWithAlterViewError` (wrong category for a view-
  kind). Widen both arms to also accept `METRIC_VIEW`.

Cloud-fan finding apache#2/apache#3 (StructField commit split-out):

* Revert d55143f ("Public StructField.json / .fromJson +
  Column.fromStructField"). None of the three new public APIs are
  called by code in this PR -- they exist to support the downstream
  Unity Catalog connector. Per cloud-fan's request, split them out
  into a separate PR with its own JIRA ticket so they can be
  reviewed independently. Reverting the commit also removes the
  `Column.fromStructField` divergence from `structFieldToV2Column`
  (which would have duplicated the comment field in
  `metadataInJSON`); the split-out PR fixes that inline.

  The revert also fixes two CI failures introduced by the same
  commit:
  - `mimaReportBinaryIssues` losing `StructField.tupled` /
    `StructField.curried` / `AbstractFunction4` parent on the
    `StructField` companion object (sql-api 4.0.0 binary surface).
  - `Javaunidoc` failing on the `{@link StructField.json}` /
    `{@link StructField.fromJson}` references that javadoc rejects
    (member separator must be `#`, not `.`).

Cloud-fan finding apache#4 (multi-level namespace lift):

* `MetricViewHelper.analyzeMetricViewText` previously required a
  `TableIdentifier`, which can only carry up to 3 name parts. v2
  metric views with multi-level-namespace targets (e.g.
  `cat.db1.db2.mv`) failed at `ident.asTableIdentifier` with
  `requiresSinglePartNamespaceError`. Lift the helper to take
  `nameParts: Seq[String]` directly. The synthetic `CatalogTable`
  used as analysis context still needs a TableIdentifier (it's a
  v1 type); we collapse the intermediate namespace components into
  the synthetic `database` slot via dot-join. Since the synthetic
  identifier is never used to resolve the view body itself, this
  collapse is observationally invisible to the analyzed plan, and
  `verifyTemporaryObjectsNotExists` still receives the full
  `nameParts` so error rendering preserves the multi-part form.

  Update both call sites:
  - `CreateMetricViewCommand.createMetricViewInSessionCatalog` (v1)
    passes `name.nameParts`.
  - `DataSourceV2Strategy` (v2) passes
    `(catalog.name() +: ident.namespace() :+ ident.name())`.

  Add a new test in `MetricViewV2CatalogSuite` that creates a
  metric view at `testcat.ns_a.ns_b.mv_deep` and asserts the create
  succeeds + the captured `ViewInfo` carries
  `PROP_TABLE_TYPE = METRIC_VIEW`.

Test realignment to SPARK-56655's new contract:

* SPARK-56655 (master) added several v2 view DDL behaviors that
  changed the error categories surfaced for cross-type collisions
  and added view-aware listing on `TableViewCatalog`. Realign 5
  existing test assertions in `MetricViewV2CatalogSuite`:
  - "CREATE VIEW ... WITH METRICS over a v2 table at the ident":
    `EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE` -> `TABLE_OR_VIEW_
    ALREADY_EXISTS` (master's analyzer-time pre-check fires before
    the exec-time decoding in `CreateV2MetricViewExec.run`).
  - "DROP TABLE on a v2 metric view" + "DROP TABLE IF EXISTS on
    a v2 metric view": `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE` ->
    `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead",
    raised by master's new `DropTableExec.run`).
  - "SHOW TABLES does not list v2 metric views" -> "SHOW TABLES
    on a v2 TableViewCatalog lists both tables and metric views":
    SPARK-56655 routes SHOW TABLES on a `TableViewCatalog` through
    `listRelationSummaries` so views appear alongside tables in
    the output (matches v1 SHOW TABLES on a session catalog).
  - DESCRIBE TABLE EXTENDED `View Text` row: trim before comparing
    to the pre-substitution YAML body (the row carries the raw
    text between the SQL `$$ ... $$` markers, including the
    surrounding `\n`).

* Add an explicit `SHOW CREATE TABLE` rejection guard for v2
  metric views in `DataSourceV2Strategy`. Master's new
  `ShowCreateV2ViewExec` would emit a plain `CREATE VIEW <ident>
  AS <yaml>`, which is not round-trippable as metric-view DDL
  (the right form is `CREATE VIEW <ident> WITH METRICS LANGUAGE
  YAML AS $$ <yaml> $$`). Reject up front with
  `unsupportedTableOperationError("SHOW CREATE TABLE")` rather
  than emit lossy DDL; the existing test already expects this
  exact behavior.

Signed-off-by: chen.wang <chen.wang@databricks.com>
chenwang-databricks added a commit that referenced this pull request May 1, 2026
…pache#55487

Address the four new findings from cloud-fan's round 2 review of
PR apache#55487 plus the corresponding CI fixes. Combines the revert of
the StructField/Column public-API commit (per cloud-fan's split
request) with the three remaining metric-view fixes into one commit.

Cloud-fan finding #1 (`verifyAlterTableType` audit miss):

* `ddl.scala` `verifyAlterTableType` (lines 1086-1103) was missed in
  the `tableType == VIEW` -> `VIEW || METRIC_VIEW` audit. Without
  this fix, `ALTER TABLE <metric_view> RENAME TO ...` silently
  succeeds (renames as if a regular table), and
  `ALTER VIEW <metric_view> RENAME TO ...` throws
  `cannotAlterTableWithAlterViewError` (wrong category for a view-
  kind). Widen both arms to also accept `METRIC_VIEW`.

Cloud-fan finding apache#2/apache#3 (StructField commit split-out):

* Revert d55143f ("Public StructField.json / .fromJson +
  Column.fromStructField"). None of the three new public APIs are
  called by code in this PR -- they exist to support the downstream
  Unity Catalog connector. Per cloud-fan's request, split them out
  into a separate PR with its own JIRA ticket so they can be
  reviewed independently. Reverting the commit also removes the
  `Column.fromStructField` divergence from `structFieldToV2Column`
  (which would have duplicated the comment field in
  `metadataInJSON`); the split-out PR fixes that inline.

  The revert also fixes two CI failures introduced by the same
  commit:
  - `mimaReportBinaryIssues` losing `StructField.tupled` /
    `StructField.curried` / `AbstractFunction4` parent on the
    `StructField` companion object (sql-api 4.0.0 binary surface).
  - `Javaunidoc` failing on the `{@link StructField.json}` /
    `{@link StructField.fromJson}` references that javadoc rejects
    (member separator must be `#`, not `.`).

Cloud-fan finding apache#4 (multi-level namespace lift):

* `MetricViewHelper.analyzeMetricViewText` previously required a
  `TableIdentifier`, which can only carry up to 3 name parts. v2
  metric views with multi-level-namespace targets (e.g.
  `cat.db1.db2.mv`) failed at `ident.asTableIdentifier` with
  `requiresSinglePartNamespaceError`. Lift the helper to take
  `nameParts: Seq[String]` directly. The synthetic `CatalogTable`
  used as analysis context still needs a TableIdentifier (it's a
  v1 type); we collapse the intermediate namespace components into
  the synthetic `database` slot via dot-join. Since the synthetic
  identifier is never used to resolve the view body itself, this
  collapse is observationally invisible to the analyzed plan, and
  `verifyTemporaryObjectsNotExists` still receives the full
  `nameParts` so error rendering preserves the multi-part form.

  Update both call sites:
  - `CreateMetricViewCommand.createMetricViewInSessionCatalog` (v1)
    passes `name.nameParts`.
  - `DataSourceV2Strategy` (v2) passes
    `(catalog.name() +: ident.namespace() :+ ident.name())`.

  Add a new test in `MetricViewV2CatalogSuite` that creates a
  metric view at `testcat.ns_a.ns_b.mv_deep` and asserts the create
  succeeds + the captured `ViewInfo` carries
  `PROP_TABLE_TYPE = METRIC_VIEW`.

Test realignment to SPARK-56655's new contract:

* SPARK-56655 (master) added several v2 view DDL behaviors that
  changed the error categories surfaced for cross-type collisions
  and added view-aware listing on `TableViewCatalog`. Realign 5
  existing test assertions in `MetricViewV2CatalogSuite`:
  - "CREATE VIEW ... WITH METRICS over a v2 table at the ident":
    `EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE` -> `TABLE_OR_VIEW_
    ALREADY_EXISTS` (master's analyzer-time pre-check fires before
    the exec-time decoding in `CreateV2MetricViewExec.run`).
  - "DROP TABLE on a v2 metric view" + "DROP TABLE IF EXISTS on
    a v2 metric view": `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE` ->
    `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead",
    raised by master's new `DropTableExec.run`).
  - "SHOW TABLES does not list v2 metric views" -> "SHOW TABLES
    on a v2 TableViewCatalog lists both tables and metric views":
    SPARK-56655 routes SHOW TABLES on a `TableViewCatalog` through
    `listRelationSummaries` so views appear alongside tables in
    the output (matches v1 SHOW TABLES on a session catalog).
  - DESCRIBE TABLE EXTENDED `View Text` row: trim before comparing
    to the pre-substitution YAML body (the row carries the raw
    text between the SQL `$$ ... $$` markers, including the
    surrounding `\n`).

* Add an explicit `SHOW CREATE TABLE` rejection guard for v2
  metric views in `DataSourceV2Strategy`. Master's new
  `ShowCreateV2ViewExec` would emit a plain `CREATE VIEW <ident>
  AS <yaml>`, which is not round-trippable as metric-view DDL
  (the right form is `CREATE VIEW <ident> WITH METRICS LANGUAGE
  YAML AS $$ <yaml> $$`). Reject up front with
  `unsupportedTableOperationError("SHOW CREATE TABLE")` rather
  than emit lossy DDL; the existing test already expects this
  exact behavior.

Signed-off-by: chen.wang <chen.wang@databricks.com>
chenwang-databricks pushed a commit that referenced this pull request May 7, 2026
… metadata

### What changes were proposed in this pull request?

Follow-up to apache#55636 addressing post-merge review comments from zikangh:

1. **Deduplicate `isCarryoverPair`.** The carry-over predicate (`_del_cnt = 1 AND _ins_cnt = 1 AND _rv_cnt = 2 AND _min_rv = _max_rv`) was duplicated between the batch path's `addCarryOverPairFilter` and the streaming path's inline filter. Extracted a shared `buildCarryOverPairPredicate` helper and call it from both.

2. **Mark the streaming row-level rewrite via attribute metadata rather than helper column name.** `UnsupportedOperationChecker` previously detected the rewrite by string-matching the `__spark_cdc_events` aggregate alias name. Switched to a metadata marker (`ResolveChangelogTable.streamingPostProcessingMarker`) attached to the alias's output attribute -- mirroring the existing `EventTimeWatermark.delayKey` and `SessionWindow.marker` patterns. The marker travels with the attribute through optimization.

3. **Expand streaming E2E coverage.** New tests in `ChangelogEndToEndSuite`:
   - composite rowId carry-over removal,
   - composite rowId update detection (different tuples kept raw),
   - carry-over + update detection across multiple commits,
   - DELETE-all-rows and UPDATE-all-rows fixtures,
   - append-only workload pass-through,
   - no-op UPDATE labeled as update (rcv differs on pre/post),
   - large carry-over removal (9 carry-over pairs + 1 real delete).

### Why are the changes needed?

zikangh raised these on the merged PR. Bundled together so they can be reviewed and shipped as one follow-up.

### Does this PR introduce _any_ user-facing change?

No. Internal refactor (#1, apache#2) and additional test coverage (apache#3). The behavior of streaming CDC reads is unchanged.

### How was this patch tested?

All 157 tests pass across the four CDC suites:
- `ChangelogResolutionSuite`
- `ResolveChangelogTablePostProcessingSuite`
- `ResolveChangelogTableStreamingPostProcessingSuite`
- `ChangelogEndToEndSuite`

Also confirmed:
- `UnsupportedOperationsSuite` (216 tests) still passes after the marker-based detection switch.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (claude-opus-4-7)

Closes apache#55653 from gengliangwang/streamingCDC-followup-zikangh.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
chenwang-databricks pushed a commit that referenced this pull request May 7, 2026
### What changes were proposed in this pull request?

Address the open follow-ups from [SPARK-56681](https://issues.apache.org/jira/browse/SPARK-56681) (umbrella for PATH / SPARK-56605 cleanup) in a single cleanup PR. Items #1 and apache#2 were already wired by SPARK-56639; this PR covers the remainder.

| # | Item | Resolution |
|---|---|---|
| #1 | `FunctionResolution.resolveProcedure` was dead code | Already wired by SPARK-56639 (no action). |
| apache#2 | Frozen view / SQL-function PATH wiring unfinished | Already done by SPARK-56639 (no action). |
| apache#3 | `AnalysisContext.resolutionPathEntries` threadlocal | Audit only: confirmed `withNewAnalysisContext` / `reset()` correctly clear it. Full removal needs a coordinated refactor to plumb the path through `RelationResolution` / `FunctionResolution` method calls; flagged as a follow-up. |
| apache#4 | `Analyzer.executeAndCheck` clobbers outer `SQLConf.withExistingConf` | Extracted `runWithSessionConf` helper, added `SQLConf.getExistingConfIfSet`. `executeAndCheck` and `executeSameContext` now share one path that yields to any outer scope. |
| apache#5 | `VariableResolution.allowUnqualifiedSessionTempVariableLookup` force-loads default catalog | Replaced the hot-path catalog read with `CatalogManager.isSystemSessionOnPath`, which inspects stored session-path entries directly. No catalog load on column resolution. |
| apache#6 | `DROP VARIABLE` PATH gate asymmetric with `DECLARE` / `CREATE` | Removed the gate. DDL on session variables (`DECLARE` / `CREATE` / `DROP`) always targets `system.session` directly; only DML (`SET VAR`, `SELECT x`) goes through PATH. |
| apache#7 | `lookupFunctionType` exception swallow too broad | Narrowed from `NonFatal` to the explicit not-found list (`NoSuchFunctionException`, `NoSuchNamespaceException`, `CatalogNotFoundException`, `FORBIDDEN_OPERATION`). Other exceptions propagate. |
| apache#8 | `lookupFunctionType` fan-out had wasteful `system.*` candidates | Filtered them out — `system.session`, `system.builtin`, `system.ai` are already resolved earlier in the same method. |
| apache#9 | Three near-duplicate path-resolution helpers | Lifted into `CatalogManager.resolutionPathEntriesForAnalysis(pinnedEntries, viewCatalogAndNamespace)`. Relation, routine, and procedure resolution all route through it. |
| apache#10 | Tests for the new error paths and gates | Added a DECLARE / SET VAR / DROP cycle test under non-default PATH and a struct-variable field-vs-qualified ambiguity test in `sql-session-variables.sql`. |
| apache#11 | `ProtoToParsedPlanTestSuite.analyzerIsolationConf` was a bare `SQLConf` | Clone `spark.sessionState.conf` and only override `PATH_ENABLED=false`, so all `sparkConf` overrides (ANSI, alias config, ...) propagate automatically. |
| Bonus | `ResolveSetVariable` hardcoded `SYSTEM.SESSION` regardless of actual PATH | `unresolvedVariableError` now takes `Seq[Seq[String]]` path entries with **required** `Origin` (no overloads). DML lookup failures (`SET VAR`, `FETCH ... INTO`) report the full SQL path as a bracketed list, byte-for-byte consistent with `UNRESOLVED_ROUTINE` and `TABLE_OR_VIEW_NOT_FOUND`. DDL name validation in `ResolveCatalogs` continues to report `[system.session]` since PATH does not apply there. Origin is plumbed through `VariableManager.set` so all error sites carry a `queryContext` pointing at the offending variable identifier (parser opt-ins via `withOrigin(identifierReference)` so the highlight is the variable name, not the whole statement). |

### Why are the changes needed?

These are the cleanup items called out on SPARK-56681 from the post-merge source review of SPARK-56605. They eliminate dead code paths, plug user-visible bugs (force-loading a misconfigured default catalog on column resolution; clobbering pinned session configs; swallowing real catalog errors as `UNRESOLVED_ROUTINE`), remove the asymmetry between DDL and DML on session variables, and make `UNRESOLVED_VARIABLE` self-consistent with the other "not found" errors.

### Does this PR introduce _any_ user-facing change?

Yes.

- **`UNRESOLVED_VARIABLE.searchPath`** is now rendered as a bracketed list. For DML lookups (`SET VAR`, `FETCH ... INTO`), the list reflects the actual SQL PATH that was consulted instead of a hardcoded `SYSTEM.SESSION`. For DDL name validation (`DECLARE` / `DROP` with a non-session namespace), the list is `[`` `system`.`session` ``]` since PATH does not apply.
- **`UNRESOLVED_VARIABLE`** now always carries a `queryContext` that highlights just the offending variable identifier (e.g. `"builtin.var1"`, `"ses.var1"`), not the whole `DECLARE` / `SET VAR` statement.
- **`DROP TEMPORARY VARIABLE`** no longer raises `UNRESOLVED_VARIABLE` when the SQL PATH does not contain `system.session`. DDL on session variables ignores PATH, matching the existing behaviour of `DECLARE OR REPLACE VARIABLE`.
- **`lookupFunctionType`** no longer swallows non–`NotFound` errors. A catalog reporting `PERMISSION_DENIED` (or similar) for a function lookup now propagates instead of silently producing `UNRESOLVED_ROUTINE`.

### How was this patch tested?

- Added `sql-session-variables.sql` regression test for the struct-variable field-vs-qualified ambiguity (`DECLARE VARIABLE session STRUCT<a INT>` → `SELECT session.a` succeeds → `DROP` → `SELECT session.a` falls through to `UNRESOLVED_COLUMN`).
- Updated `SetPathSuite`: DECLARE / SET VAR / DROP cycle under a non-default PATH; bonus test asserts the actual rendered search path and the variable-identifier `queryContext`.
- Updated `SqlScriptingExecutionSuite` for the new bracketed `searchPath` and identifier-pinned `queryContext`.
- Regenerated `sql-session-variables.sql.out` for the new error shape.
- Added `resolutionPathEntriesForAnalysis` stubs to mocked `CatalogManager` instances in `PlanResolutionSuite`, `AlignAssignmentsSuiteBase`, and `TableLookupCacheSuite`.
- Ran focused suites locally; all pass:
  - `build/sbt 'sql/testOnly *SetPathSuite *SqlScriptingExecutionSuite *ExecuteImmediateEndToEndSuite'`
  - `build/sbt 'sql/testOnly *SimpleSQLViewSuite *SQLFunctionSuite'`
  - `build/sbt 'sql/testOnly *PlanResolutionSuite *UpdateTableAlignAssignmentsSuite *MergeIntoTableAlignAssignmentsSuite'`
  - `build/sbt 'catalyst/testOnly *TableLookupCacheSuite *AnalysisSuite *AnalysisErrorSuite *LookupFunctionsSuite'`
  - `build/sbt 'sql/testOnly *FunctionQualificationSuite *RelationQualificationSuite *DataSourceV2FunctionSuite'`
  - `build/sbt 'sql/testOnly *SQLQuerySuite'`
  - `build/sbt 'connect/testOnly *ProtoToParsedPlanTestSuite'`
  - `build/sbt 'sql/testOnly *SQLQueryTestSuite -- -z sql-session-variables.sql'`
  - Full `org.apache.spark.sql.catalyst.analysis.*`, `org.apache.spark.sql.catalyst.parser.*`, and `org.apache.spark.sql.analysis.resolver.*` suites.
- `scalastyle` and `scalafmt` clean across catalyst, sql, and connect modules.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor Claude Opus 4.7

Closes apache#55647 from srielau/SPARK-56681-patch-clean-up.

Authored-by: Serge Rielau <serge@rielau.com>
Signed-off-by: Daniel Tenedorio <daniel.tenedorio@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant