Skip to content

[SPARK-43752][SQL] Support column DEFAULT values in V2 write commands#55127

Closed
LuciferYang wants to merge 1 commit into
apache:masterfrom
LuciferYang:SPARK-43752
Closed

[SPARK-43752][SQL] Support column DEFAULT values in V2 write commands#55127
LuciferYang wants to merge 1 commit into
apache:masterfrom
LuciferYang:SPARK-43752

Conversation

@LuciferYang

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Resolve column DEFAULT references in V2 write commands (AppendData, OverwriteByExpression, OverwritePartitionsDynamic).

Previously, ResolveColumnDefaultInCommandInputQuery only handled InsertIntoStatement (V1 path) and SetVariable. V2 write commands with DEFAULT would fail with unresolved attribute errors.

Changes:

  • Analyzer.ResolveReferences: dispatch V2WriteCommand to resolveColumnDefaultInCommandInputQuery when the query contains unresolved attributes
  • ResolveColumnDefaultInCommandInputQuery: add V2WriteCommand case that resolves DEFAULT by matching query columns to the table schema by position
  • Remove the TODO (SPARK-43752) comment

Why are the changes needed?

INSERT INTO v2_table VALUES (1, DEFAULT) fails for V2 data sources even when the table has column default values defined. This blocks V2 adoption for catalogs that support SUPPORT_COLUMN_DEFAULT_VALUE.

Does this PR introduce any user-facing change?

Yes. DEFAULT keyword now works in INSERT INTO / INSERT OVERWRITE for V2 tables.

How was this patch tested?

  • Pass Github Actions
  • Added test in ResolveDefaultColumnsSuite

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code 4.6

@LuciferYang

LuciferYang commented Apr 1, 2026

Copy link
Copy Markdown
Contributor Author

cc @cloud-fan @gengliangwang

@dongjoon-hyun dongjoon-hyun left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @LuciferYang .

@LuciferYang

Copy link
Copy Markdown
Contributor Author

Thank you @dongjoon-hyun

@cloud-fan

Copy link
Copy Markdown
Contributor

The code changes in this PR appear to be dead code. The test is valuable, but it passes without the Analyzer.scala and ResolveColumnDefaultInCommandInputQuery.scala changes — I verified this locally.

Why it's dead code:

For INSERT INTO v2_table VALUES (1, DEFAULT), the resolution flow is:

  1. The parser produces InsertIntoStatement (for both V1 and V2 tables — not just V1).
  2. In the Resolution batch, ResolveReferences intercepts InsertIntoStatement and dispatches to resolveColumnDefaultInCommandInputQuery, which resolves DEFAULT using i.table.schema. For V2 tables, the DataSourceV2Relation schema includes column default metadata (via CatalogV2Util.encodeDefaultValue), so DEFAULT is resolved correctly at this stage.
  3. ResolveInsertInto only converts InsertIntoStatement to V2WriteCommand (AppendData/OverwriteByExpression/OverwritePartitionsDynamic) after i.query.resolved is true — by which point DEFAULT is already gone.

Since ResolveInsertInto (line 446 in the batch) runs before ResolveReferences (line 455), but requires a fully resolved query to convert, the ordering guarantees that DEFAULT is always resolved in the InsertIntoStatement stage before conversion to V2WriteCommand. No code path (SQL or DataFrame API) creates a V2WriteCommand with unresolved DEFAULT attributes.

The InsertIntoStatement in the PR description is labeled as the "V1 path," but it actually handles both V1 and V2 tables.

Verification: I reverted the code changes (kept the test) on a branch without this commit and ran:

build/sbt "sql/testOnly org.apache.spark.sql.ResolveDefaultColumnsSuite -- -t \"SPARK-43752\""

Result: Tests: succeeded 1, failed 0 — the test passes without the code changes.

@cloud-fan

Copy link
Copy Markdown
Contributor

If no objections I'll revert it tomorrow.

@LuciferYang

Copy link
Copy Markdown
Contributor Author

If no objections I'll revert it tomorrow.

Thanks for pointing this out. My bad. This issue came up while I was working on the recent SPARK-56170 patches. As @cloud-fan suggested, let's keep the test case and revert the changes. We can fix them together later when DSV2 File Table is actually ready to advance to this part.

@dongjoon-hyun

Copy link
Copy Markdown
Member

+1 for reverting. Thank you for fixing this.

cloud-fan added a commit that referenced this pull request Apr 16, 2026
### What changes were proposed in this pull request?

This PR reverts the code changes from #55127 (commit eca57ea) and moves the test to `InsertIntoTests` where it properly runs across V2 INSERT SQL test suites.

Specifically:
- **Removes** the `V2WriteCommand` case added to `Analyzer.ResolveReferences` and `ResolveColumnDefaultInCommandInputQuery`
- **Removes** the test from `ResolveDefaultColumnsSuite` (wrong location for a V2 end-to-end test)
- **Adds** the test to `InsertIntoSQLOnlyTests` in `InsertIntoTests.scala`, which runs in suites with `includeSQLOnlyTests = true`: `DataSourceV2SQLSuite`, `DataSourceV2SQLSessionCatalogSuite`, and `V1WriteFallbackSessionCatalogSuite`. This is correct since `DEFAULT` is SQL-only syntax.

### Why are the changes needed?

The code added by #55127 is dead code. For `INSERT INTO v2_table VALUES (1, DEFAULT)`, the resolution flow is:

1. The parser produces `InsertIntoStatement` for **both** V1 and V2 tables.
2. In the Resolution batch, `ResolveReferences` dispatches `InsertIntoStatement` to `resolveColumnDefaultInCommandInputQuery`, which resolves DEFAULT using the table schema. For V2 tables, the `DataSourceV2Relation` schema includes column default metadata (via `CatalogV2Util.encodeDefaultValue`), so DEFAULT is resolved correctly at this stage.
3. `ResolveInsertInto` only converts `InsertIntoStatement` to a `V2WriteCommand` (`AppendData`/`OverwriteByExpression`/`OverwritePartitionsDynamic`) **after** the query is fully resolved — by which point DEFAULT is already gone.

No code path (SQL or DataFrame API) creates a `V2WriteCommand` with unresolved DEFAULT attributes. The `PlanResolutionSuite` test "INSERT INTO table with default column value" already verifies this resolution at the plan level.

The test is still valuable and passes without the code changes — this PR moves it to the proper location.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Moved the existing test to `InsertIntoSQLOnlyTests` in `InsertIntoTests.scala`. The test verifies:
- `INSERT INTO` with partial DEFAULT values
- `INSERT INTO` with all DEFAULT values
- `INSERT OVERWRITE` with DEFAULT values

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code 4.6

Closes #55348 from cloud-fan/SPARK-43752-revert.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants