[SPARK-56700][SS] Make DataStreamReader.name public by ericm-db · Pull Request #55651 · apache/spark

ericm-db · 2026-05-01T19:07:38Z

What changes were proposed in this pull request?

Remove the private[sql] access modifier from DataStreamReader.name and add the method as a public abstract API to the DataStreamReader base class.

- Added abstract def name(sourceName: String): this.type to the API base class (sql/api/.../DataStreamReader.scala)
- Changed both classic and connect implementations from private[sql] def name to override def name
- Moved Scaladoc to the base class; implementations use @inheritdoc
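The pattern described above can be illustrated with a small, self-contained model. It assumes only what the description states: a public abstract `name(sourceName: String): this.type` on a base class, overridden by a classic and a connect implementation. `StreamReaderBase`, `ClassicReader`, `ConnectReader`, `describe()`, and `ReaderDemo` are hypothetical stand-ins, not Spark classes, and they model none of Spark's real reader logic.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins for the classes named in the PR; these are
// NOT Spark's real DataStreamReader classes and carry none of their logic.

// Models the API base class: name() is declared public and abstract, returning
// this.type so chained calls keep the concrete reader type.
abstract class StreamReaderBase {
  def format(source: String): this.type
  def name(sourceName: String): this.type
  def describe(): String
}

// Models the "classic" implementation: name() is now a plain override
// rather than a private[sql] def.
final class ClassicReader extends StreamReaderBase {
  private var fmt: String = "?"
  private var srcName: Option[String] = None

  override def format(source: String): this.type = { fmt = source; this }

  /** @inheritdoc */
  override def name(sourceName: String): this.type = { srcName = Some(sourceName); this }

  override def describe(): String = s"classic:$fmt:${srcName.getOrElse("<unnamed>")}"
}

// Models the "connect" implementation. Recording options in a map is an
// illustrative assumption; the real Connect client's behavior is not modeled.
final class ConnectReader extends StreamReaderBase {
  private val opts = mutable.LinkedHashMap.empty[String, String]

  override def format(source: String): this.type = { opts("format") = source; this }

  /** @inheritdoc */
  override def name(sourceName: String): this.type = { opts("name") = sourceName; this }

  override def describe(): String =
    s"connect:${opts.getOrElse("format", "?")}:${opts.getOrElse("name", "<unnamed>")}"
}

object ReaderDemo {
  // Typed only against the base class: this compiles because name() is
  // declared there, so callers need no access to an implementation class.
  def namedRate(reader: StreamReaderBase): StreamReaderBase =
    reader.format("rate").name("rate_source")
}
```

Returning `this.type` rather than the base type means a chain such as `format(...).name(...)` on a concrete reader still has that reader's concrete type, so implementation-specific members stay reachable after the call.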

Why are the changes needed?

The name method was introduced in SPARK-56453 as private[sql] while the API was being finalized. Now that the feature is ready, making it public allows users to assign names to streaming sources for stable checkpoint metadata and source evolution.

Does this PR introduce any user-facing change?

Yes. DataStreamReader.name(sourceName) is now a public @Experimental API available to all users. Previously it was package-private to org.apache.spark.sql.

How was this patch tested?

Existing tests cover the name functionality. This change only modifies the access level.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.6)

anishshri-db

lgtm thanks

dongjoon-hyun

+1, LGTM.

dongjoon-hyun

Could you make the CI happy, @ericm-db ?

Scalastyle checks passed.
The scalafmt check failed on sql/connect or sql/connect at following occurrences:

org.apache.maven.plugin.MojoExecutionException: Scalafmt: Unformatted files found
Error:  Failed to execute goal org.antipathy:mvn-scalafmt_2.13:1.1.1713302731.c3d0074:format (default-cli) on project spark-sql-api_2.13: Error formatting Scala files: Scalafmt: Unformatted files found -> [Help 1]

Before submitting your change, please make sure to format your code using the following command:
./build/mvn scalafmt:format -Dscalafmt.skip=false -Dscalafmt.validateOnly=false -Dscalafmt.changedOnly=false -pl sql/api -pl sql/connect/common -pl sql/connect/server -pl sql/connect/shims -pl sql/connect/client/jvm


### What changes were proposed in this pull request?

Remove the `private[sql]` access modifier from `DataStreamReader.name` and add the method as a public abstract API to the `DataStreamReader` base class.

- Added abstract `def name(sourceName: String): this.type` to the API base class (`sql/api/.../DataStreamReader.scala`)
- Changed both classic and connect implementations from `private[sql] def name` to `override def name`
- Moved Scaladoc to the base class; implementations use `inheritdoc`

### Why are the changes needed?

The `name` method was introduced in SPARK-56453 as `private[sql]` while the API was being finalized. Now that the feature is ready, making it public allows users to assign names to streaming sources for stable checkpoint metadata and source evolution.

### Does this PR introduce _any_ user-facing change?

Yes. `DataStreamReader.name(sourceName)` is now a public `Experimental` API available to all users. Previously it was package-private to `org.apache.spark.sql`.

### How was this patch tested?

Existing tests cover the `name` functionality. This change only modifies the access level.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.6)

Closes #55651 from ericm-db/datastreamreader-name-public.

Authored-by: ericm-db <eric.marnadi@databricks.com>
Signed-off-by: Anish Shrigondekar <anish.shrigondekar@databricks.com>
(cherry picked from commit 0af3d42)
Signed-off-by: Anish Shrigondekar <anish.shrigondekar@databricks.com>

…py cherry-pick prompts

### What changes were proposed in this pull request?

When a committer manually types `branch-M.N` at the cherry-pick prompt while `branch-M.x` exists and has not yet received the commit, the script now surfaces the Upstream-First policy and offers to pick into both branches in one step (the policy-compliant default). The committer can still pick only `branch-M.N` if the commit is genuinely a `branch-M.N`-only maintenance bugfix, or abort.

Implementation notes:
- Split `cherry_pick` into `_do_cherry_pick` (fetch + cherry-pick + push) and `cherry_pick` (prompt + policy check). The policy wrapper returns a list of refs so the main loop can advance its remaining-branches list correctly when one prompt consumes two branches.
- Replace the `branch_iter` iterator with a mutable `remaining_branches` list in the main cherry-pick loop, so picks consumed by the two-branch path are accounted for in the next prompt's default.
- Add an `already_picked` parameter to `cherry_pick` so the policy check skips its prompt when `branch-M.x` is in the set of refs already touched this session (e.g. when the PR was merged into `branch-M.x` and the loop is now picking into `branch-M.N`).

### Why are the changes needed?

The Upstream-First backporting policy (documented in the header comment of `dev/merge_spark_pr.py`) requires non-bugfix commits to flow through `branch-M.x` before reaching `branch-M.N`. The merge script already orders `branch-M.x` ahead of `branch-M.N` as the cherry-pick default. However, when a committer types `branch-M.N` at the prompt, the script silently proceeds and `branch-M.x` is never revisited. This has led to commits landing on `branch-4.2` but missing `branch-4.x`.

Six such commits observed on the current branches (as of 2026-05-22):
- SPARK-56700 (#55651)
- SPARK-56676 (#55623)
- SPARK-56838 (#55836)
- SPARK-56650 (#55589)
- SPARK-56856 (#55969)
- SPARK-56977 (#56023)

All six landed on master and `branch-4.2` but were not cherry-picked to `branch-4.x`, requiring follow-up backports.

### Does this PR introduce _any_ user-facing change?

Yes for committers using `dev/merge_spark_pr.py`. When the typed cherry-pick target is `branch-M.N` and `branch-M.x` exists and is not yet picked, an additional prompt asks whether to pick into both. Accepting the default ("both") preserves prior behavior plus an extra cherry-pick to `branch-M.x`. No change when the committer accepts the default `branch-M.x` target, or when picking into `branch-M.x` first and `branch-M.N` second (the typical policy-compliant flow).

### How was this patch tested?

- `python3 -m doctest dev/merge_spark_pr.py` passes (34/34, all pre-existing tests; none cover the new policy logic).
- New `cherry_pick` policy logic was reviewed for behavior but **not exercised end-to-end**: actually running `merge_spark_pr.py` requires committer privileges and a live open PR to merge. Edge cases were traced by reading the code (PR target = master with manual branch-M.N entry; PR target = branch-M.x with default branch-M.N pick; multiple iterations after a two-branch pick).
- Reviewers familiar with the merge flow are encouraged to verify behavior on first real use, especially the abort path and the interaction with manual conflict resolution inside `_do_cherry_pick`.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7)

Closes #56058 from viirya/infra-merge-script-upstream-first-policy.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
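The decision rule this commit message describes can be sketched as a pure function. This is a hypothetical Scala model written only for illustration: the real logic lives in `dev/merge_spark_pr.py`, which is Python and is not reproduced here. `UpstreamFirst`, `Choice`, `upstreamFor`, `branchesToPick`, and `advance` are invented names, and picking `branch-M.x` before `branch-M.N` in the two-branch path is an assumption.

```scala
// Hypothetical Scala sketch of the Upstream-First decision rule described above.
// Not the script's code: the real implementation is Python; names are illustrative.
object UpstreamFirst {
  // Matches maintenance branches such as branch-4.2 (but not branch-4.x).
  private val Maintenance = """branch-(\d+)\.(\d+)""".r

  sealed trait Choice
  case object Both extends Choice      // the policy-compliant default
  case object OnlyTyped extends Choice // a genuine branch-M.N-only bugfix
  case object Abort extends Choice

  /** The branch-M.x that should also receive the commit, if the policy applies. */
  def upstreamFor(typed: String, existing: Set[String], alreadyPicked: Set[String]): Option[String] =
    typed match {
      case Maintenance(major, _) =>
        val upstream = s"branch-$major.x"
        if (existing.contains(upstream) && !alreadyPicked.contains(upstream)) Some(upstream)
        else None
      case _ => None
    }

  /** Refs consumed by one prompt; picking branch-M.x first is an assumption. */
  def branchesToPick(
      typed: String,
      existing: Set[String],
      alreadyPicked: Set[String],
      choice: Choice): List[String] =
    upstreamFor(typed, existing, alreadyPicked) match {
      case None => List(typed) // no extra prompt: pick the typed branch as before
      case Some(upstream) =>
        choice match {
          case Both      => List(upstream, typed)
          case OnlyTyped => List(typed)
          case Abort     => Nil
        }
    }

  /** Drops every ref a prompt consumed, analogous to advancing remaining_branches. */
  def advance(remaining: List[String], picked: List[String]): List[String] =
    remaining.filterNot(b => picked.contains(b))
}
```

For example, with `master`, `branch-4.x`, and `branch-4.2` present and nothing picked yet, typing `branch-4.2` and accepting the default yields `List("branch-4.x", "branch-4.2")`; once `branch-4.x` is in `alreadyPicked`, `upstreamFor` returns `None` and no extra prompt is needed.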

Remove the `private[sql]` access modifier from `DataStreamReader.name` and add the method as a public abstract API to the `DataStreamReader` base class.

- Added abstract `def name(sourceName: String): this.type` to the API base class (`sql/api/.../DataStreamReader.scala`)
- Changed both classic and connect implementations from `private[sql] def name` to `override def name`
- Moved Scaladoc to the base class; implementations use `inheritdoc`

The `name` method was introduced in SPARK-56453 as `private[sql]` while the API was being finalized. Now that the feature is ready, making it public allows users to assign names to streaming sources for stable checkpoint metadata and source evolution.

Yes. `DataStreamReader.name(sourceName)` is now a public `Experimental` API available to all users. Previously it was package-private to `org.apache.spark.sql`.

Existing tests cover the `name` functionality. This change only modifies the access level.

Generated-by: Claude Code (Claude Opus 4.6)

Closes #55651 from ericm-db/datastreamreader-name-public.

Authored-by: ericm-db <eric.marnadi@databricks.com>
Signed-off-by: Anish Shrigondekar <anish.shrigondekar@databricks.com>
(cherry picked from commit 0af3d42)

anishshri-db approved these changes May 1, 2026

dongjoon-hyun approved these changes May 2, 2026

dongjoon-hyun reviewed May 2, 2026

HyukjinKwon approved these changes May 4, 2026

ericm-db and others added 4 commits May 5, 2026 09:35

[SPARK-56700][SS] Make DataStreamReader.name public

91010a8

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

[SPARK-56700][SS] Add MIMA exclusion for DataStreamReader.name

e78a385

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

[SPARK-56700][SS] Fix scalafmt formatting

56eccb2

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

empty commit to trigger ci

40bef88

ericm-db force-pushed the datastreamreader-name-public branch from fa3df24 to 40bef88 May 5, 2026 16:35

anishshri-db closed this in 0af3d42 May 6, 2026

viirya mentioned this pull request May 22, 2026

[SPARK-57002][INFRA] Enforce Upstream-First policy in merge_spark_pr.py cherry-pick prompts #56058

Closed

ericm-db wants to merge 4 commits into apache:master from ericm-db:datastreamreader-name-public

4 participants