[SPARK-56763] Branch 3.5 restore Python 3.8 & R in CI (Continuation of Sarutak's PR) #55886

Closed
holdenk wants to merge 24 commits into apache:branch-3.5 from holdenk:SPARK-56763-sarutak-3.5-restore-additional-functionality-r2

Conversation

holdenk commented May 14, 2026
Contributor

What changes were proposed in this pull request?

This is a rebase of https://github.com/apache/spark/pull/55740/changes on top of the PPA and Docker fix.

This re-enables the R doc build and Python 3.8.

For type testing to continue to work in Python 3.8, it changes how we fall back on torch import failure, given that torch no longer provides ongoing support for 3.8.
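
As a rough illustration of the fallback idea (not necessarily how this PR implements it), an optional dependency such as torch can be imported behind a guard so that a missing package is reported when the feature is used rather than when the module is loaded. The helper names `optional_import` and `require_torch` below are hypothetical and do not come from the Spark code base.

```python
import importlib
from typing import Any, Optional


def optional_import(module_name: str) -> Optional[Any]:
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# torch may be absent on Python 3.8, so keep the module importable without it.
torch = optional_import("torch")
have_torch = torch is not None


def require_torch() -> Any:
    """Fail with a clear message at call time instead of at import time."""
    if torch is None:
        raise ImportError("torch is required for this feature but is not installed")
    return torch
```

With this shape, a test runner can import the module on an interpreter without torch and skip the torch-dependent tests by checking `have_torch`.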

Why are the changes needed?

Our R version floats, and various things changed in 4.4 that broke CI. Similarly, many of our dependencies float, which broke MyPy type checking in Python.

Note: I plan to follow up with a separate PR to pin our R version (in this branch) back to 4.3, but for now let's fix it (we can also pin to 4.4 if people prefer, but I do want to pin the R version eventually).

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  • Base image build workflow passes on GitHub Actions.
  • docker build dev/infra succeeds locally.

Was this patch authored or co-authored using generative AI tooling?

Kiro CLI / Opus 4.6

sarutak and others added 12 commits May 10, 2026 14:46
### What changes were proposed in this pull request?
Add `apt-get update` before `apt-get install` for R-related dev libraries to avoid stale package index causing 404 errors.

### Why are the changes needed?
The `apt-get install` for R dev dependencies (libtiff5-dev, libharfbuzz-dev, etc.) is in a separate RUN layer from the earlier `apt-get update`, so when the package index becomes stale (packages are superseded on the Ubuntu archive), the install fails with 404.

### Does this PR introduce *any* user-facing change?
No.

### How was this patch tested?
CI.

### Was this patch authored or co-authored using generative AI tooling?
No.

Primitive functions (e.g., min, max, sum) do not have environments and
attempting to set one via environment<- has no effect. Since R 4.4.0,
this operation emits a deprecation warning, which causes test failures
when running with options(warn = 2).

Add is.primitive() guards in both processClosure and cleanClosure so
that primitive functions are handled without attempting to access or
modify their environment.

Pin Werkzeug==2.1.2 in Dockerfile to maintain compatibility with
markupsafe==2.0.1 used in the workflow lint step.

Pin ragg==1.2.5 in the workflow before pkgdown installation because
ragg 1.5.x requires libwebp which is not available in the Docker
image, and its configure script fails to find freetype2 headers.

…763-sarutak-3.5-restore-additional-functionality-r2
sfc-gh-hkarau and others added 2 commits May 14, 2026 20:35
- Fix PEP 585 dict[K,V] syntax in plan.py (runtime TypeError on 3.8)
- Add grpcio/protobuf stack for python3.8 in Dockerfile
- Guard unconditional torch imports in ml/connect/classification.py and
  ml/torch/data.py so missing torch fails gracefully instead of crashing
  the test runner
- Restore python3.8 to default executables in python/run-tests.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
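
To illustrate the first bullet above: on Python 3.8, subscripting the built-in `dict` at runtime (for example in a type alias) raises `TypeError: 'type' object is not subscriptable`, because PEP 585 generics only arrived in 3.9. A minimal sketch of the portable spelling follows; the alias `ColumnCounts` and the function `count_columns` are made up for this example and are not taken from plan.py.

```python
from typing import Dict, List

# typing.Dict works on Python 3.8 and later, whereas the equivalent alias
# written as dict[str, int] fails at runtime on 3.8.
ColumnCounts = Dict[str, int]


def count_columns(names: List[str]) -> ColumnCounts:
    """Count how many times each column name appears."""
    counts: ColumnCounts = {}
    for name in names:
        counts[name] = counts.get(name, 0) + 1
    return counts
```

Note that `from __future__ import annotations` only defers evaluation of annotations; a `dict[K, V]` expression used outside an annotation (an alias, a `cast`, a base class) is still evaluated and still fails on 3.8.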

holdenk commented May 14, 2026

Contributor Author

CC @sarutak & @gaogaotiantian & @zhengruifeng

holdenk commented May 14, 2026

Contributor Author

Oh also CC @devin-petersohn

sfc-gh-hkarau and others added 10 commits May 15, 2026 06:09
- Use dpkg --print-architecture for java-8-openjdk path (arm64 compat)
- Add libxslt-dev so lxml builds from source on arm64
- Add python3.8-dev so lxml can find Python.h when compiling
- Let roxygen2 deps (rlang, cli, pkgload etc) float to current for R 4.6 compat
- Fix mypy-python.sql.udtf section name (examples module path)
- Add missing mypy ignore_missing_imports for grpc_status, google.*, IPython

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
holdenk marked this pull request as ready for review May 28, 2026 22:02
holdenk changed the title from "WIP Spark 56763 sarutak 3.5 restore additional functionality r2" to "[SPARK-56763] Branch 3.5 restore Python 3.8 in CI (Continuation of Sarutak's PR)" May 28, 2026
holdenk changed the title from "[SPARK-56763] Branch 3.5 restore Python 3.8 in CI (Continuation of Sarutak's PR)" to "[SPARK-56763] Branch 3.5 restore Python 3.8 & R in CI (Continuation of Sarutak's PR)" May 28, 2026
asf-gitbox-commits pushed a commit that referenced this pull request Jun 2, 2026
…ation of Sarutak's PR)

### What changes were proposed in this pull request?

This is a rebase of https://github.com/apache/spark/pull/55740/changes on top of the PPA and Docker fix.

This re-enables the R doc build and Python 3.8.

For type testing to continue to work in Python 3.8, it changes how we fall back on torch import failure, given that torch no longer provides ongoing support for 3.8.

### Why are the changes needed?

Our R version floats, and various things changed in 4.4 that broke CI. Similarly, many of our dependencies float, which broke MyPy type checking in Python.

Note: I plan to follow up with a separate PR to pin our R version (in this branch) back to 4.3, but for now let's fix it (we can also pin to 4.4 if people prefer, but I do want to pin the R version eventually).

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- `Base image build` workflow passes on GitHub Actions.
- `docker build dev/infra` succeeds locally.

### Was this patch authored or co-authored using generative AI tooling?
Kiro CLI / Opus 4.6

Closes #55886 from holdenk/SPARK-56763-sarutak-3.5-restore-additional-functionality-r2.

Lead-authored-by: Holden Karau <holden.karau@snowflake.com>
Co-authored-by: Kousuke Saruta <sarutak@amazon.co.jp>
Co-authored-by: Holden Karau <holden.karau@snowflake.com>
Signed-off-by: Holden Karau <holden.karau@snowflake.com>

holdenk commented Jun 2, 2026

Contributor Author

Merged to 3.5

@gaogaotiantian

Contributor

Hi @holdenk, do we need to test Python 3.8 for Spark 3.5? I know we claim to support Python 3.8, but Python 3.8 itself was EOL more than 2 years ago. Adding python3.8 to run-tests makes the build time out (https://github.com/apache/spark/actions/runs/27903512134/job/82568956068) because we can't run the test suite twice (3.8/3.9) in 2 hours. If we really need 3.8 coverage, we need a separate workflow file to run it.

holdenk commented Jun 22, 2026

Contributor Author

Hmmm if it's EOL I think we're ok dropping it from CI maybe. Unless there are objections.
