[SPARK-57496][SQL][BUILD][4.2] Keep the Types Framework ops and UDF worker packages out of the published API #56571
Closed
cloud-fan wants to merge 1 commit into
… packages out of the published API

Move the client-side Types Framework ops (TypeApiOps, TimeTypeApiOps, TimestampNanosTypeApiOps) from org.apache.spark.sql.types.ops to org.apache.spark.sql.catalyst.types.ops. They are internal plumbing (parallel to the server-side TypeOps) but sat inside the public org.apache.spark.sql.types package, leaking into the published API. The catalyst package is already excluded from both the generated docs (ignoreUndocumentedPackages) and MiMa (MimaExcludes), so co-locating the client ops there with the server-side TypeOps keeps them out of the public surface with no new build/MiMa entries.

Also exclude org.apache.spark.udf.worker from the generated docs in SparkBuild.scala: it is UDF-worker infrastructure (mostly protobuf-generated *OrBuilder Java plus worker internals) that surfaced as public API.

Co-authored-by: Isaac
dongjoon-hyun approved these changes Jun 17, 2026
huaxingao approved these changes Jun 17, 2026
uros-b approved these changes Jun 17, 2026
test timeout is unrelated (pass compilation is sufficient), thanks for review, merging to 4.2
cloud-fan added a commit that referenced this pull request Jun 17, 2026
…orker packages out of the published API

Closes #56571 from cloud-fan/SPARK-57496-4.2.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?

Backport of #56551 to `branch-4.2`. Two related changes that keep internal packages out of the published 4.2.0 API surface:

1. Move the client-side Types Framework ops -- `TypeApiOps` and `TimeTypeApiOps` -- from `org.apache.spark.sql.types.ops` to `org.apache.spark.sql.catalyst.types.ops`, co-located with the server-side `TypeOps` family. Consumer imports are updated; same-package consumers drop the now-redundant import. (The `TimestampNanos*ApiOps` types moved in the master PR do not exist on `branch-4.2`, so they are not part of this backport.)
2. Exclude `org.apache.spark.udf.worker` from the generated API docs in `project/SparkBuild.scala`'s `ignoreUndocumentedPackages`.

### Why are the changes needed?
The `*ApiOps` types are internal plumbing of the Types Framework (the client-side counterpart to catalyst's `TypeOps`), but they lived inside the public `org.apache.spark.sql.types` package, so they leaked into the published PySpark/Scala API of the unreleased 4.2.0 line. `org.apache.spark.sql.catalyst.*` is already excluded from both the generated docs (`ignoreUndocumentedPackages`) and MiMa (`MimaExcludes`), so relocating them there makes them internal with no new build/MiMa entries and mirrors how the server-side `TypeOps` is already handled.

`org.apache.spark.udf.worker` is UDF-worker infrastructure (mostly protobuf-generated `*OrBuilder` Java plus worker internals) that surfaced as public API. Its modules aren't MiMa-checked, and the generated Java can't carry a Scala visibility qualifier, so excluding the package from the docs is the appropriate fix.

### Does this PR introduce _any_ user-facing change?
No. Relative to released Spark there is no change; the affected types are new in the unreleased 4.2.0 line and were never intended to be public. This only removes them from the generated API docs (and, for the ops, the binary-compatibility surface) before release. There is no behavior change.
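To make the effect of both changes concrete, here is a minimal sketch of a path-based docs filter. It is a simplified model under stated assumptions, not the code in `project/SparkBuild.scala`: the object `DocsFilterSketch`, its helpers, and the example file paths (including the `ExampleOrBuilder.java` name) are hypothetical and chosen only for illustration.

```scala
// Simplified, hypothetical model of a path-based docs filter.
// Names and file paths here are illustrative assumptions, not Spark's build code.
object DocsFilterSketch {
  // Path fragments of packages that should stay out of the generated docs.
  val excludedFragments: Seq[String] = Seq(
    "org/apache/spark/sql/catalyst", // described above as already excluded
    "org/apache/spark/udf/worker"    // the exclusion this PR describes adding
  )

  // A source file is documented only if it matches no excluded fragment.
  def isDocumented(path: String): Boolean =
    !excludedFragments.exists(fragment => path.contains(fragment))

  // Keep only the files that would appear in the generated docs.
  def documented(paths: Seq[String]): Seq[String] =
    paths.filter(isDocumented)

  // Illustrative locations of the ops before and after the relocation.
  val oldOpsPath = "org/apache/spark/sql/types/ops/TypeApiOps.scala"
  val newOpsPath = "org/apache/spark/sql/catalyst/types/ops/TypeApiOps.scala"
  // Hypothetical generated file inside the UDF worker package.
  val workerPath = "org/apache/spark/udf/worker/ExampleOrBuilder.java"
  // A type that is meant to remain public.
  val publicPath = "org/apache/spark/sql/types/DataType.scala"
}
```

In this model, `DocsFilterSketch.isDocumented(DocsFilterSketch.oldOpsPath)` is true while the same check on `newOpsPath` is false, which is why relocating the ops under the catalyst package needs no new exclusion entry. The worker package has no already-excluded parent, so it needs its own entry.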
### How was this patch tested?

No new tests -- this is a package relocation plus a build-config change with no logic change. The relocated classes are exercised by existing suites and the cast / `Row` / `HiveResult` paths; CI compiles all affected modules and runs scalastyle, which enforces the import-ordering updates made here.

### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code