[SPARK-57496][SQL][BUILD][4.2] Keep the Types Framework ops and UDF worker packages out of the published API #56571

Closed
cloud-fan wants to merge 1 commit into apache:branch-4.2 from cloud-fan:SPARK-57496-4.2

@cloud-fan

Contributor

What changes were proposed in this pull request?

Backport of #56551 to branch-4.2. Two related changes that keep internal packages out of the published 4.2.0 API surface:

  1. Move the client-side Types Framework ops -- TypeApiOps and TimeTypeApiOps -- from org.apache.spark.sql.types.ops to org.apache.spark.sql.catalyst.types.ops, co-located with the server-side TypeOps family. Consumer imports are updated; same-package consumers drop the now-redundant import. (The TimestampNanos*ApiOps types moved in the master PR do not exist on branch-4.2, so they are not part of this backport.)
  2. Exclude org.apache.spark.udf.worker from the generated API docs in project/SparkBuild.scala's ignoreUndocumentedPackages.

Why are the changes needed?

The *ApiOps types are internal plumbing of the Types Framework (the client-side counterpart to catalyst's TypeOps), but they lived inside the public org.apache.spark.sql.types package, so they leaked into the published Scala/Java API of the unreleased 4.2.0 line. org.apache.spark.sql.catalyst.* is already excluded from both the generated docs (ignoreUndocumentedPackages) and MiMa (MimaExcludes), so relocating them there makes them internal with no new build/MiMa entries and mirrors how the server-side TypeOps is already handled.

org.apache.spark.udf.worker is UDF-worker infrastructure (mostly protobuf-generated *OrBuilder Java plus worker internals) that surfaced as public API. Its modules aren't MiMa-checked, and the generated Java can't carry a Scala visibility qualifier, so excluding the package from the docs is the appropriate fix.
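
As a rough illustration of the docs exclusion described above, here is a minimal, self-contained sketch of path-based source filtering. It is not the actual code in project/SparkBuild.scala: the object name `DocFilterSketch`, the `excludedPackagePaths` list, the `ignoreUndocumented` helper, and the file `ExampleOrBuilder.java` are all hypothetical and exist only for this example. It assumes the doc build drops any source file whose path falls under an excluded package directory.

```scala
import java.io.File

// Simplified sketch only; the real ignoreUndocumentedPackages in
// project/SparkBuild.scala may be structured differently.
object DocFilterSketch {
  // Hypothetical exclusion list: package directories kept out of the docs.
  val excludedPackagePaths: Seq[String] = Seq(
    "org/apache/spark/sql/catalyst", // already internal; the relocated ops land here
    "org/apache/spark/udf/worker"    // exclusion added by this change
  )

  // Keep only the source files that are not under an excluded package path.
  def ignoreUndocumented(sources: Seq[File]): Seq[File] =
    sources.filterNot { f =>
      // Normalize separators so the check behaves the same on every OS.
      val path = f.getPath.replace(File.separatorChar, '/')
      excludedPackagePaths.exists(p => path.contains(p))
    }

  def main(args: Array[String]): Unit = {
    val sources = Seq(
      new File("org/apache/spark/sql/types/DataType.scala"),
      new File("org/apache/spark/sql/catalyst/types/ops/TypeApiOps.scala"),
      new File("org/apache/spark/udf/worker/ExampleOrBuilder.java") // hypothetical file
    )
    // Print the files that would still be documented.
    ignoreUndocumented(sources).foreach(f => println(f.getPath))
  }
}
```

Under these assumptions, a class that moves into the catalyst package is dropped by the existing catalyst entry, while the UDF worker package needs its own entry because nothing else covers it.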

Does this PR introduce any user-facing change?

No. Relative to released Spark there is no change; the affected types are new in the unreleased 4.2.0 line and were never intended to be public. This only removes them from the generated API docs (and, for the ops, the binary-compatibility surface) before release. There is no behavior change.

How was this patch tested?

No new tests -- this is a package relocation plus a build-config change with no logic change. The relocated classes are exercised by existing suites and the cast / Row / HiveResult paths; CI compiles all affected modules and runs scalastyle, which enforces the import-ordering updates made here.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code

… packages out of the published API

Move the client-side Types Framework ops (TypeApiOps, TimeTypeApiOps,
TimestampNanosTypeApiOps) from org.apache.spark.sql.types.ops to
org.apache.spark.sql.catalyst.types.ops. They are internal plumbing
(parallel to the server-side TypeOps) but sat inside the public
org.apache.spark.sql.types package, leaking into the published API. The
catalyst package is already excluded from both the generated docs
(ignoreUndocumentedPackages) and MiMa (MimaExcludes), so co-locating the
client ops there with the server-side TypeOps keeps them out of the
public surface with no new build/MiMa entries.

Also exclude org.apache.spark.udf.worker from the generated docs in
SparkBuild.scala: it is UDF-worker infrastructure (mostly protobuf-
generated *OrBuilder Java plus worker internals) that surfaced as public
API.

Co-authored-by: Isaac
@cloud-fan cloud-fan changed the title [SPARK-57496][SQL][BUILD] Keep the Types Framework ops and UDF worker packages out of the published API [SPARK-57496][SQL][BUILD][4.2] Keep the Types Framework ops and UDF worker packages out of the published API Jun 17, 2026
@cloud-fan

Contributor Author

cc @huaxingao @dongjoon-hyun

@cloud-fan

Contributor Author

The test timeout is unrelated (passing compilation is sufficient). Thanks for the review, merging to 4.2.

cloud-fan added a commit that referenced this pull request Jun 17, 2026
…orker packages out of the published API

Closes #56571 from cloud-fan/SPARK-57496-4.2.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@cloud-fan cloud-fan closed this Jun 17, 2026