Skip to content

Update Spark Docker Images publish workflow#458

Closed
Yikun wants to merge 1 commit into
apache:asf-sitefrom
Yikun:docker-publish
Closed

Update Spark Docker Images publish workflow#458
Yikun wants to merge 1 commit into
apache:asf-sitefrom
Yikun:docker-publish

Conversation

@Yikun

@Yikun Yikun commented Apr 18, 2023

Copy link
Copy Markdown
Member

This PR try to update Create and upload Spark Docker Images to use new workflow, the new workflow will first build docker image, and then test (k8s / standalone) the image, finally publish the docker image, see example in here.

The different between old workflow and new workflow:

workflow Tag
Previous apache/spark:v3.4.0, apache/spark-py:v3.4.0, apache/spark-r:v3.4.0
New image

After
image

We already configure the DOCKER_USER and DOCKER_TOKEN in https://issues.apache.org/jira/browse/INFRA-23882 .

After this patch merged, the 3.4.0 image can be published as first docker image to apache/spark dockerhub (new version). In future release, only new version docker images will be published.

@Yikun Yikun marked this pull request as ready for review April 18, 2023 09:50
@HyukjinKwon

HyukjinKwon commented Apr 18, 2023

Copy link
Copy Markdown
Member

Merged to asf-site branch.

@Yikun

Yikun commented Apr 19, 2023

Copy link
Copy Markdown
Member Author

@HyukjinKwon @zhengruifeng @yaooqinn Thanks all! If no more comments, I will publish the docker image with new tags for 3.4.0 version later today!

@Yikun

Yikun commented Apr 19, 2023

Copy link
Copy Markdown
Member Author

I hitted a issue when I try to push the apache/spark image using github action, seems dockerhub token is invalid, already raise a issue on INFRA-24476 (seems a similar issue with INFRA-24459)

@Yikun

Yikun commented Apr 23, 2023

Copy link
Copy Markdown
Member Author

The INFRA-24476 had been resolved, and docker image already published at https://hub.docker.com/r/apache/spark/tags

@dongjoon-hyun dongjoon-hyun left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@dongjoon-hyun

Copy link
Copy Markdown
Member

Hi, @Yikun , @HyukjinKwon , @yaooqinn , @zhengruifeng .

it seems that this PR introduced two user-facing breaking changes

  • Downgrade Java from 17 to 11
$ docker run -it --rm apache/spark:v3.4.0 java -version | tail -n1
OpenJDK 64-Bit Server VM Temurin-17.0.6+10 (build 17.0.6+10, mixed mode, sharing)
$ docker run -it --rm apache/spark:3.4.0 java -version | tail -n1
OpenJDK 64-Bit Server VM Temurin-11.0.18+10 (build 11.0.18+10, mixed mode, sharing)
  • Drop v prefix, v3.4.0 -> 3.4.0

If these breaking changes are intentional, I'm wondering if we discuss the rationals in the dev mailing list. Otherwise, I'd like to recommend to revert those breaking changes to avoid any confusion.

@dongjoon-hyun

Copy link
Copy Markdown
Member

cc @sunchao , too

@Yikun

Yikun commented Apr 25, 2023

Copy link
Copy Markdown
Member Author

Thanks for catch and reminder it! @dongjoon-hyun

For java17, I think we can switch to java17 in docker image.

For tag, it's intentional. In 3.4.0, v3.4.0 tag and 3.4.0 serious tags will co-exsiting.

It should be a new feature but not breaking changes. (But for latest tag where to point is a breaking changes).

So do you have any suggestion how we could migrate to new image fluently?

Maybe

  1. 3.4.0 keep previous and new tags, latest point to previous tag.
  2. 3.5.0 keep previous and new tags, and mark previous tag as deprecated, latest point to new tag.

@dongjoon-hyun

Copy link
Copy Markdown
Member

It's already Java 17 by default.

For java17, I think we can switch to java17 in docker image.

@Yikun

Yikun commented Apr 25, 2023

Copy link
Copy Markdown
Member Author

Yes, I meant we still need to update in apache/spark-docker.

@dongjoon-hyun

Copy link
Copy Markdown
Member
  1. If we are going to maintain the manual process, why do you remove old description in this PR?

For tag, it's intentional. In 3.4.0, v3.4.0 tag and 3.4.0 serious tags will co-exsiting.

  1. To be clear, I really appreciate your contribution because this is good enough to mitigate Apache Spark release managers' burden.

Given (1) and (2), I don't think this PR claims to maintain an existing manual publish process. In other words, this PR already burn the bridge.

I believe 3.4.0 and v3.4.0 only causes many confusions to the users. In Apache Spark community, that's the reason why I propose to keep v3.4.0 tag format only (whatever we used; manual or GitHub)

@Yikun

Yikun commented Apr 26, 2023

Copy link
Copy Markdown
Member Author

I believe 3.4.0 and v3.4.0 only causes many confusions to the users. In Apache Spark community, that's the reason why I propose to keep v3.4.0 tag format only (whatever we used; manual or GitHub)

The reason of we remove v is to follow DOI rules. If we still think v should be keep in future, we can both keep them (to point new images).

Given (1) and (2), I don't think this PR claims to maintain an existing manual publish process. In other words, this PR already burn the bridge.

At this point, below might the solutions to continue the work:

Solution 1:

  1. Switch Java 17 to default in spark-docker: [SPARK-40513] Switch 3.4.0 default Java to Java17 spark-docker#35
  2. Keep v3.4.0 and 3.4.0 to point GA publish image.
  3. Keep this PR to instead of manual process

Solution 2:

  1. Recover latest to manual publish image for 3.4.0 images.
  2. Revert this PR (Recover the manual process and also keep the GA publish steps in this PR's doc)
  3. v3.4.0 point to manual image, 3.4.0 point to GA publish image
  4. Keep manual image and GA publish image in 3.4.0 series, and unifed all images into GA publish image to in 3.5.0

@dongjoon-hyun @HyukjinKwon @zhengruifeng WDYT?

@dongjoon-hyun

dongjoon-hyun commented Apr 26, 2023

Copy link
Copy Markdown
Member

@Yikun . We cannot simply remove anything like this PR. The official way to adopt new standard is to send the email on dev mailing list and build a community consensus. In other words, you need to inform and persuade.

The reason of we remove v is to follow DOI rules. If we still think v should be keep in future, we can both keep them (to point new images).

As you know, Apache Spark dev mailing list is the only official way to build a community consensus. All the other channels (GitHub or even user mailing list). Please shoot the email. :)

@Yikun

Yikun commented Apr 26, 2023

Copy link
Copy Markdown
Member Author

As you know, Apache Spark dev mailing list is the only official way to build a community consensus. All the other channels (GitHub or even user mailing list). Please shoot the email. :)

Sure! I will send the mail in this week!

@yaooqinn

Copy link
Copy Markdown
Member

The reason of we remove v is to follow DOI rules.

Probably these are guidelines but not hard and fast rules. And I didn't see the prefix v is not recommended.

@Yikun

Yikun commented Apr 26, 2023

Copy link
Copy Markdown
Member Author

@yaooqinn I also raise a issue on DOI as our discussion input: docker-library/official-images#14506 .

@HyukjinKwon

Copy link
Copy Markdown
Member

Yeah, let's discuss in the dev mailing list, and set that down.

I personally think it's fine to make some breaking changes at this moment, but the more important thing is that we should set that down after the discussion so we don't make breaking changes in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants