[SPARK-56889][PYTHON][INFRA] Drop Python 3.10 Support #55914

Closed
dongjoon-hyun wants to merge 1 commit into apache:master from dongjoon-hyun:SPARK-56889

@dongjoon-hyun dongjoon-hyun commented May 15, 2026

Member

What changes were proposed in this pull request?

This PR drops Python 3.10 support in `Apache Spark 4.3.0`.

For the record, we have mainly been running on `Python 3.11` since Apache Spark 4.0.0.

Why are the changes needed?

Python 3.10 reaches end-of-life in October 2026, before the Apache Spark 4.3.0 release. Dropping it on `master/branch-4.x` lets PySpark 4.3+ rely on Python 3.11+ language features.

Does this PR introduce any user-facing change?

Yes. Installing PySpark from `master` on Python 3.10 will fail with `Requires-Python: >=3.11`. Released versions are unaffected.
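
For illustration only, the effect of a `>=3.11` floor can be sketched as a tuple comparison on the interpreter version. This is not how pip evaluates `Requires-Python` internally and not code from this PR; `MINIMUM_PYTHON` and `satisfies_minimum` are hypothetical names.

```python
import sys

# Hypothetical constant mirroring the `Requires-Python: >=3.11` metadata.
MINIMUM_PYTHON = (3, 11)


def satisfies_minimum(version_info=sys.version_info, minimum=MINIMUM_PYTHON):
    """Return True if the (major, minor) pair meets the minimum version."""
    return tuple(version_info[:2]) >= tuple(minimum)


# Explicit tuples are used so the result does not depend on the
# interpreter running this sketch.
print(satisfies_minimum((3, 10, 12)))  # False
print(satisfies_minimum((3, 11, 0)))   # True
```

Calling `satisfies_minimum()` with no arguments checks the running interpreter instead.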

How was this patch tested?

Existing CI (build_python_3.11.yml through build_python_3.14*.yml, build_python_minimum.yml, build_python_ps_minimum.yml).

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7)

@dongjoon-hyun

Member Author

Could you review this PR, @LuciferYang ? This is for Apache Spark 4.3.

@LuciferYang LuciferYang left a comment

Contributor

+1, LGTM

@dongjoon-hyun

Member Author

Thank you, @LuciferYang ~

Merged to master/4.x.

dongjoon-hyun added a commit that referenced this pull request May 16, 2026
This PR drops Python 3.10 support in `Apache Spark 4.3.0`.

For the record, we have mainly been running on `Python 3.11` since Apache Spark 4.0.0.
- https://github.com/apache/spark/actions/workflows/build_branch40_python.yml (Python 3.11)

Python 3.10 reaches end-of-life in October 2026, before the Apache Spark 4.3.0 release. Dropping it on `master/branch-4.x` lets PySpark 4.3+ rely on Python 3.11+ language features.

Yes. Installing PySpark from `master` on Python 3.10 will fail with `Requires-Python: >=3.11`. Released versions are unaffected.

Existing CI (`build_python_3.11.yml` through `build_python_3.14*.yml`, `build_python_minimum.yml`, `build_python_ps_minimum.yml`).

Generated-by: Claude Code (Opus 4.7)

Closes #55914 from dongjoon-hyun/SPARK-56889.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 0a0d31b)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@dongjoon-hyun dongjoon-hyun deleted the SPARK-56889 branch May 16, 2026 17:54
Comment thread dev/infra/Dockerfile
pandoc \
pkg-config \
python3.10 \
python3.11 \

Contributor

This PR broke image caching:

https://github.com/apache/spark/actions/runs/26010261434/job/76449371588

Also, IIRC, `dev/infra/Dockerfile` is only used for branch-3.5, so we don't need to touch it in master.

Member Author

Thank you for reporting. I made an alternative follow-up.

dongjoon-hyun pushed a commit that referenced this pull request May 18, 2026
…file

### What changes were proposed in this pull request?

This is a partial revert of #55914 (SPARK-56889) restricted to `dev/infra/Dockerfile`. It restores the file to its state at `0a0d31bea00~1`, switching the system Python used in the base CI image from `python3.11` back to `python3.10`. The other 20 files changed by #55914 are kept as-is.

### Why are the changes needed?

The `Build / Cache base image` workflow has been failing on every branch since #55914 was merged (2026-05-16): https://github.com/apache/spark/actions/runs/26010261434/job/76449371588

The workflow aborts at the first `Build and push` step (the base `./dev/infra/` image), with:

```
RUN add-apt-repository ppa:deadsnakes/ppa
  ...
  ModuleNotFoundError: No module named 'pyparsing'
ERROR: process "/bin/sh -c add-apt-repository ppa:deadsnakes/ppa" did not complete successfully: exit code: 1
```

This breaks the cache build for all downstream image jobs, since the base layer is shared. Restoring the previous Dockerfile is the smallest change that unblocks the cache workflow while a forward fix is investigated.
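
The `ModuleNotFoundError` above only says that the interpreter running `add-apt-repository` could not import `pyparsing`; the root cause is not stated here. As a rough diagnostic sketch (not part of this PR; `missing_modules` is a hypothetical helper), one could check which modules a given interpreter can locate:

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the names that the current interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Show which interpreter is being inspected, then check for the
# module named in the error message.
print(sys.executable)
print(missing_modules(["pyparsing"]))
```

Running this with each installed interpreter (for example `python3.10` and `python3.11`) would show whether the module is visible to one but not the other.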

### Does this PR introduce _any_ user-facing change?

No. CI-only change. PySpark itself is unaffected; only the system Python inside the test image changes.

### How was this patch tested?

This PR re-runs `Build / Cache base image` against the restored Dockerfile.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (model: claude-opus-4-7)

Closes #55946 from zhengruifeng/restore-infra-dockerfile-python310.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun pushed a commit that referenced this pull request May 18, 2026
(cherry picked from commit 3608538)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun added a commit that referenced this pull request Jun 19, 2026
…ython 3.14` by default

### What changes were proposed in this pull request?

Like the `Apache Spark Docker` repository, this PR aims to use `eclipse-temurin` as the default Java image so that Apache Spark 4.3.0 uses `Ubuntu 26.04` and `Python 3.14` by default.

https://github.com/apache/spark-docker/blob/75f9e755807c67794776276d8f5e3e2ceedfa82b/4.1.2/scala2.13-java17-ubuntu/Dockerfile#L17

### Why are the changes needed?

Apache Spark 4.3.0 dropped Python 3.10.
- #55914

However, `azul/zulu-openjdk` only supports `Ubuntu 22.04`, which doesn't officially support Python 3.11. It only has `Python 3.11.0rc1`.

```
$ docker run -it --rm azul/zulu-openjdk:25 bash
root@3c66cb6be8be:/# cat /etc/os-release | grep VERSION_ID
VERSION_ID="22.04"
root@3c66cb6be8be:/# apt-get update
root@3c66cb6be8be:/# apt-get install -y python3.11
root@3c66cb6be8be:/# python3.11 --version
Python 3.11.0rc1
```
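
The `rc1` suffix marks a release candidate rather than a final release. As a minimal sketch (not part of this PR; the function names and the regular expression are assumptions for illustration), the output of `python3 --version` could be checked for a pre-release suffix like this:

```python
import re

# Matches strings such as "Python 3.11.0rc1" or "Python 3.14.4".
# Only alpha (a), beta (b), and release-candidate (rc) suffixes are handled.
_VERSION_RE = re.compile(r"^Python (\d+)\.(\d+)\.(\d+)((?:a|b|rc)\d+)?$")


def parse_python_version(output):
    """Split `python --version` output into a version tuple and a pre-release tag."""
    match = _VERSION_RE.match(output.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {output!r}")
    major, minor, micro, pre = match.groups()
    return (int(major), int(minor), int(micro)), pre


def is_final_release(output):
    """Return True when the version string has no pre-release suffix."""
    _, pre = parse_python_version(output)
    return pre is None


print(is_final_release("Python 3.11.0rc1"))  # False
print(is_final_release("Python 3.14.4"))     # True
```

Inside a running interpreter, `sys.version_info.releaselevel` gives the same information without parsing.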

### Does this PR introduce _any_ user-facing change?

Note that this new combination was already supported via the following command.
```
$ bin/docker-image-tool.sh -b java_image_name=eclipse-temurin build
```

A user can switch back manually in the same way.
```
$ bin/docker-image-tool.sh -b java_image_name=azul/zulu-openjdk build
```

Yes. Although Apache Spark's behavior is unchanged because it is orthogonal to the underlying JDK, OS, and Python versions, the underlying OS becomes `Ubuntu 26.04`, Python becomes `3.14`, and the Java vendor becomes `Eclipse Temurin`. This is inevitable because `Zulu JDK` lacks Python 3.11 support.

**AFTER**
```
$ docker run -it --rm eclipse-temurin:25-jre bash
root@de506cc01b8a:/# cat /etc/os-release | grep VERSION_ID
VERSION_ID="26.04"
root@de506cc01b8a:/# apt-get update
root@de506cc01b8a:/# apt-get install -y python3
root@de506cc01b8a:/# python3 --version
Python 3.14.4
```

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code

Closes #56607 from dongjoon-hyun/SPARK-57546.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun added a commit that referenced this pull request Jun 19, 2026
(cherry picked from commit 880083f)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>