[SPARK-56719][SS] Add DataStreamWriter.name() API for sink evolution by ericm-db · Pull Request #55672 · apache/spark

ericm-db · 2026-05-04T19:36:20Z

What changes were proposed in this pull request?

This PR adds the ability to name streaming sinks via the name() method on DataStreamWriter, laying the groundwork for sink evolution capability. This is analogous to the existing source evolution support (DataStreamReader.name()).

Changes:

- Add `name(sinkName)` method to `DataStreamWriter` (API abstract method, classic implementation, Connect stub)
- Add `sinkName: Option[String]` field to `WriteToStream` and `userSpecifiedSinkName: Option[String]` to `WriteToStreamStatement` plan nodes
- Add `spark.sql.streaming.queryEvolution.enableSinkEvolution` internal config to `SQLConf`
- Add sink name validation: names may contain only alphanumeric characters and underscores
- Add enforcement in `MicroBatchExecution`: when sink evolution is enabled, sinks must be explicitly named
- Add `MicroBatchExecution.DEFAULT_SINK_NAME` (`"sink-0"`) for backward compatibility
- Thread `sinkName` through `StreamingQueryManager` and `ResolveWriteToStream`
- Add error conditions: `INVALID_SINK_NAME`, `UNNAMED_STREAMING_SINKS_WITH_ENFORCEMENT`
- Add `QueryCompilationErrors.invalidStreamingSinkNameError`
- Add `StreamingSinkEvolutionSuite` with tests for validation and enforcement
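
For readers skimming the list, the naming rule (alphanumeric characters and underscores only) can be illustrated with a standalone check. This is a minimal sketch and not the code from this PR: the object name `SinkNameCheck`, the regex, the `Either` return type, and the rejection of the empty string are all assumptions made for illustration. Only the `INVALID_SINK_NAME` condition name comes from the description above.

```scala
// Hypothetical sketch of the sink-name rule: ASCII letters, digits, and underscores.
// SinkNameCheck and validate are illustrative names, not Spark internals.
object SinkNameCheck {
  // One or more ASCII letters, digits, or underscores.
  // Rejecting the empty string is an assumption of this sketch.
  private val ValidName = "^[A-Za-z0-9_]+$".r

  /** Returns Right(name) when the name is acceptable, Left(message) otherwise. */
  def validate(name: String): Either[String, String] =
    if (ValidName.pattern.matcher(name).matches()) Right(name)
    else Left(s"INVALID_SINK_NAME: '$name' must contain only letters, digits, and underscores")
}
```

Under this sketch, `SinkNameCheck.validate("orders_sink_1")` is accepted, while `"orders-sink"` and `"orders sink"` are rejected. Note that the default `sink-0` contains a hyphen, so under this rule it could not be supplied through `name()`.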

All new APIs are `private[sql]` or `internal()`, so the `name()` method is not yet publicly callable. It will be opened up once commit log support for persisting sink metadata is added in a follow-up PR.

Why are the changes needed?

Currently, streaming queries have no mechanism for sink evolution. If a user wants to change the sink of a streaming query while preserving the checkpoint, there is no way to track which sink was used historically. This PR introduces the naming API as the first step toward full sink evolution support, where sinks can be added, removed, or replaced while maintaining checkpoint integrity.

This mirrors the existing source evolution support added via DataStreamReader.name() and spark.sql.streaming.queryEvolution.enableSourceEvolution.

Does this PR introduce any user-facing change?

No. All new APIs are private[sql] and the config is internal(). No user-facing changes until the feature is fully implemented with commit log support in a follow-up PR.

How was this patch tested?

Added StreamingSinkEvolutionSuite with 7 test cases covering:

- Invalid sink name validation (hyphen, space, special characters)
- Valid sink name patterns (alphanumeric, underscore, digits)
- Enforcement: unnamed sink with evolution enabled throws `UNNAMED_STREAMING_SINKS_WITH_ENFORCEMENT`
- Enforcement: unnamed sink without evolution enabled succeeds (backward compatibility)
- Named sink with evolution enabled succeeds
- Continuing with the same sink name across restarts works
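
The enforcement cases listed above can be summarized as a small decision function. This is a hedged sketch rather than the logic in `MicroBatchExecution`: `SinkNaming` and `resolve` are hypothetical names, the `Either` result stands in for the real exception, and the behaviour for a named sink with evolution disabled is an assumption, since the description does not cover that case. The default `"sink-0"` and the error condition name are taken from the description.

```scala
// Hypothetical sketch of the enforcement behaviour described by the test cases.
// SinkNaming and resolve are illustrative names, not Spark internals.
object SinkNaming {
  // Default taken from the PR description (MicroBatchExecution.DEFAULT_SINK_NAME).
  val DefaultSinkName = "sink-0"

  /**
   * Resolves the effective sink name:
   *  - a user-supplied name is used as-is (assumed regardless of the flag)
   *  - no name with evolution enabled is an error
   *  - no name with evolution disabled falls back to the default
   */
  def resolve(userName: Option[String], evolutionEnabled: Boolean): Either[String, String] =
    userName match {
      case Some(name)               => Right(name)
      case None if evolutionEnabled => Left("UNNAMED_STREAMING_SINKS_WITH_ENFORCEMENT")
      case None                     => Right(DefaultSinkName)
    }
}
```

The fallback branch is what keeps existing unnamed queries working when the config is off.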

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (claude-opus-4-6)

Co-authored-by: Isaac

…ception for sink evolution Adds the MiMa `ReversedMissingMethodProblem` exclusion for the newly added `DataStreamWriter.name()` API, and registers the new `spark.sql.streaming.queryEvolution.enableSinkEvolution` SQL config in the binding-policy exceptions file (consistent with its `enableSourceEvolution` sibling). Co-authored-by: Isaac

anishshri-db

lgtm pending nit/question

… source and sink name validation

…ator Replace Scaladoc `[[AnalysisException]]` and `[[IllegalArgumentException]]` references with backtick code spans. The Scaladoc-to-Javadoc conversion turned them into unresolved `{@link ...}` references because the generated Java file does not carry imports, breaking the unidoc build.
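
To illustrate the kind of change that commit describes, here is a minimal sketch of a Scaladoc comment that names an exception with a backtick code span instead of a `[[...]]` link. The object `NameDocExample` and the method `requireName` are hypothetical and do not appear in the PR; they only show the comment style.

```scala
object NameDocExample {
  // Before (hypothetical): the Scaladoc named the exception with a wiki-style
  // link, [[IllegalArgumentException]], which the Javadoc conversion could not resolve.

  /**
   * Returns the trimmed name.
   *
   * Throws `IllegalArgumentException` if the name is blank. The exception is
   * written as a backtick code span, so there is no link to resolve.
   */
  def requireName(name: String): String = {
    // require throws IllegalArgumentException when the condition is false.
    require(name.trim.nonEmpty, "name must not be blank")
    name.trim
  }
}
```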

Closes #55672 from ericm-db/sink-evolution-api.

Authored-by: ericm-db <eric.marnadi@databricks.com>
Signed-off-by: Anish Shrigondekar <anish.shrigondekar@databricks.com>
(cherry picked from commit 2039927)
Signed-off-by: Anish Shrigondekar <anish.shrigondekar@databricks.com>

ericm-db added 3 commits May 4, 2026 12:34

[SPARK-56719][SS] Add DataStreamWriter.name() API for sink evolution

0de57ea

Co-authored-by: Isaac

./build/mvn

d47b6d1

anishshri-db approved these changes May 20, 2026


Comment thread sql/api/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala

ericm-db added 3 commits May 21, 2026 15:49

[SPARK-56719][SS][FOLLOW-UP] Extract StreamingNameValidator shared by…

035a9f9

… source and sink name validation

scala lint

9c1bfd1

anishshri-db closed this in 2039927 May 23, 2026

dongjoon-hyun mentioned this pull request Jun 11, 2026

[SPARK-57377][INFRA] Add CI check to prevent new entries in the config binding policy exceptions file #56437

Closed

ericm-db wants to merge 6 commits into apache:master from ericm-db:sink-evolution-api
