[SPARK-56889][PYTHON][INFRA] Drop Python 3.10 Support #55914
Closed
dongjoon-hyun wants to merge 1 commit into
203ac7e to 47f7690
Member
Author
Could you review this PR, @LuciferYang? This is for Apache Spark 4.3.
Member
Author
Thank you, @LuciferYang! Merged to master/4.x.
dongjoon-hyun added a commit that referenced this pull request on May 16, 2026
This PR drops Python 3.10 support in `Apache Spark 4.3.0`. For the record, we have mainly used `Python 3.11` since Apache Spark 4.0.0.
- https://github.com/apache/spark/actions/workflows/build_branch40_python.yml (Python 3.11)

Python 3.10 reaches end-of-life in October 2026, before the Apache Spark 4.3.0 release. Dropping it on `master/branch-4.x` lets PySpark 4.3+ rely on Python 3.11+ language features.

Yes, this is a user-facing change. Installing PySpark from `master` on Python 3.10 will fail with `Requires-Python: >=3.11`. Released versions are unaffected.

Tested with the existing CI (`build_python_3.11.yml` through `build_python_3.14*.yml`, `build_python_minimum.yml`, `build_python_ps_minimum.yml`).

Generated-by: Claude Code (Opus 4.7)

Closes #55914 from dongjoon-hyun/SPARK-56889.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 0a0d31b)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
```diff
 pandoc \
 pkg-config \
-python3.10 \
+python3.11 \
```
Contributor
This PR caused the image caching job to fail:
https://github.com/apache/spark/actions/runs/26010261434/job/76449371588
And IIRC, `dev/infra/Dockerfile` is only used for branch-3.5, so we don't need to touch it in master.
Member
Author
Thank you for reporting. I made an alternative follow-up.
dongjoon-hyun pushed a commit that referenced this pull request on May 18, 2026
…file

### What changes were proposed in this pull request?

This is a partial revert of #55914 (SPARK-56889) restricted to `dev/infra/Dockerfile`. It restores the file to its state at `0a0d31bea00~1`, switching the system Python used in the base CI image from `python3.11` back to `python3.10`. All other 20 files changed by #55914 are kept as-is.

### Why are the changes needed?

The `Build / Cache base image` workflow has been failing on every branch since #55914 was merged (2026-05-16):
https://github.com/apache/spark/actions/runs/26010261434/job/76449371588

The failure aborts at the first `Build and push` step (the base `./dev/infra/` image), with:

```
RUN add-apt-repository ppa:deadsnakes/ppa
...
ModuleNotFoundError: No module named 'pyparsing'
ERROR: process "/bin/sh -c add-apt-repository ppa:deadsnakes/ppa" did not complete successfully: exit code: 1
```

This breaks the cache build for all downstream image jobs, since the base layer is shared. Restoring the previous Dockerfile is the smallest change that unblocks the cache workflow while a forward fix is investigated.

### Does this PR introduce _any_ user-facing change?

No. This is a CI-only change. PySpark itself is unaffected; only the system Python inside the test image changes.

### How was this patch tested?

This PR re-runs `Build / Cache base image` against the restored Dockerfile.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (model: claude-opus-4-7)

Closes #55946 from zhengruifeng/restore-infra-dockerfile-python310.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun pushed a commit that referenced this pull request on May 18, 2026
…file (Cherry-pick of #55946; the commit message is identical to the commit above.)
(cherry picked from commit 3608538)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun added a commit that referenced this pull request on Jun 19, 2026
…ython 3.14` by default

### What changes were proposed in this pull request?

Like the `Apache Spark Docker` repository, this PR aims to use `eclipse-temurin` by default so that the image uses `Ubuntu 26.04` and `Python 3.14` for Apache Spark 4.3.0.

https://github.com/apache/spark-docker/blob/75f9e755807c67794776276d8f5e3e2ceedfa82b/4.1.2/scala2.13-java17-ubuntu/Dockerfile#L17

### Why are the changes needed?

Apache Spark 4.3.0 dropped Python 3.10.
- #55914

However, `azul/zulu-openjdk` only supports `Ubuntu 22.04`, which does not officially support Python 3.11. It only has `Python 3.11.0rc1`.

```
$ docker run -it --rm azul/zulu-openjdk:25 bash
root@3c66cb6be8be:/# cat /etc/os-release | grep VERSION_ID
VERSION_ID="22.04"
root@3c66cb6be8be:/# apt-get update
root@3c66cb6be8be:/# apt-get install -y python3.11
root@3c66cb6be8be:/# python3.11 --version
Python 3.11.0rc1
```

### Does this PR introduce _any_ user-facing change?

Note that this new combination was already supported via the following command.

```
$ bin/docker-image-tool.sh -b java_image_name=eclipse-temurin build
```

A user can switch back manually in the same way.

```
$ bin/docker-image-tool.sh -b java_image_name=azul/zulu-openjdk build
```

Yes. Apache Spark's behavior itself does not change, because it is orthogonal to the underlying JDK, OS, and Python versions. However, the underlying OS becomes `Ubuntu 26.04`, Python becomes `3.14`, and the Java vendor becomes `Eclipse Temurin`. This is unavoidable given the lack of Python 3.11 support in the `Zulu JDK` image.

**AFTER**

```
$ docker run -it --rm eclipse-temurin:25-jre bash
root@de506cc01b8a:/# cat /etc/os-release | grep VERSION_ID
VERSION_ID="26.04"
root@de506cc01b8a:/# apt-get update
root@de506cc01b8a:/# apt-get install -y python3
root@de506cc01b8a:/# python3 --version
Python 3.14.4
```

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code

Closes #56607 from dongjoon-hyun/SPARK-57546.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun added a commit that referenced this pull request on Jun 19, 2026
…ython 3.14` by default (Cherry-pick of #56607; the commit message is identical to the commit above.)
(cherry picked from commit 880083f)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?

This PR drops Python 3.10 support in `Apache Spark 4.3.0`. For the record, we have mainly used `Python 3.11` since Apache Spark 4.0.0.

### Why are the changes needed?

Python 3.10 reaches end-of-life in October 2026, before the Apache Spark 4.3.0 release. Dropping it on `master/branch-4.x` lets PySpark 4.3+ rely on Python 3.11+ language features.

### Does this PR introduce _any_ user-facing change?

Yes. Installing PySpark from `master` on Python 3.10 will fail with `Requires-Python: >=3.11`. Released versions are unaffected.

### How was this patch tested?

Existing CI (`build_python_3.11.yml` through `build_python_3.14*.yml`, `build_python_minimum.yml`, `build_python_ps_minimum.yml`).

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7)
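The `Requires-Python: >=3.11` metadata is what makes installation fail on Python 3.10. The sketch below only approximates that gate and is not pip's code: real installers evaluate the full PEP 440 specifier syntax (pip does this through the `packaging` library's `SpecifierSet`), whereas these hypothetical helpers handle just a single `>=X.Y` lower bound. The sample versions are arbitrary examples.

```python
import sys


def parse_lower_bound(spec: str) -> tuple[int, ...]:
    """Hypothetical parser for a single '>=X.Y' bound, e.g. '>=3.11' -> (3, 11)."""
    spec = spec.strip()
    if not spec.startswith(">="):
        raise ValueError(f"unsupported specifier in this sketch: {spec!r}")
    return tuple(int(part) for part in spec[2:].split("."))


def satisfies(python_version: tuple[int, ...], spec: str) -> bool:
    bound = parse_lower_bound(spec)
    # Compare only as many components as the bound has, e.g. (3, 10) vs (3, 11)
    return tuple(python_version[: len(bound)]) >= bound


REQUIRES_PYTHON = ">=3.11"  # value quoted in the PR description

for version in [(3, 10, 14), (3, 11, 0), (3, 14, 4)]:
    print(version, satisfies(version, REQUIRES_PYTHON))
# (3, 10, 14) False
# (3, 11, 0) True
# (3, 14, 4) True

print("current interpreter OK:", satisfies(tuple(sys.version_info[:3]), REQUIRES_PYTHON))
```

Since the check runs against the interpreter doing the install, already-published PySpark releases, whose metadata still allows 3.10, are unaffected, matching the "Released versions are unaffected" note.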