Skip to content

Upgrade to modern spark (3.4)#166

Merged
colindean merged 16 commits into
target:masterfrom
holdenk:upgrade-to-modern-spark
Apr 10, 2024
Merged

Upgrade to modern spark (3.4)#166
colindean merged 16 commits into
target:masterfrom
holdenk:upgrade-to-modern-spark

Conversation

@holdenk

@holdenk holdenk commented Aug 29, 2023

Copy link
Copy Markdown
Contributor

Hey y'all poked around in the project, I don't know if this is still being actively developed but if folks are interested upgraded it to Spark 3.4, I've only done the tests with the built in test suite and it's my first time poking around in the project so you know could be some mistakes.

@holdenk holdenk requested review from a team and c-horn as code owners August 29, 2023 07:06
@holdenk holdenk requested review from abs428 and colindean August 29, 2023 07:06
@colindean

Copy link
Copy Markdown
Collaborator

I really appreciate the contribution — first outside contribution in some time! For our purposes, DV must continue working with Spark 2.3 and JDK8, so we can't bump versions as you have proposed.

Can you think of a way to test/qualify this against both Spark 2.3 and 3.4? As a potential starting point, I started on crossbuilding last year but haven't had the bandwidth to complete the work.

@holdenk

holdenk commented Aug 29, 2023

Copy link
Copy Markdown
Contributor Author

So one lazy option would be to make a spark 3 and a spark 2 branch and publish a "-3" version of the artifact rather than a "proper" cross build.

@c-horn

c-horn commented Aug 29, 2023

Copy link
Copy Markdown
Collaborator

I could potentially take a look at finishing the cross-build work @colindean
Are there still users of this internally at Target who can't switch to Spark 3.X + docker?

Also many thanks @holdenk for keeping this project / others in the spark ecosystem in mind.

@holdenk

holdenk commented Aug 30, 2023

Copy link
Copy Markdown
Contributor Author

So I was thinking the main "annoying" thing in a cross build for this is probably the tests and maybe the quoted identifiers. We could probably make it cross build easily, although my default is lazy (branches).

@colindean

Copy link
Copy Markdown
Collaborator

So one lazy option would be to make a spark 3 and a spark 2 branch and publish a "-3" version of the artifact rather than a "proper" cross build.

That's a good idea, but it introduces some operational complexity to be recalled at release time. I prefer the technical complexity of an sbt crossbuild, especially if…

I could potentially take a look at finishing the cross-build work @colindean

So I was thinking the main "annoying" thing in a cross build for this is probably the tests and maybe the quoted identifiers. We could probably make it cross build easily, although my default is lazy (branches).

❤️ Yes that would be awesome. I don't remember what was going awry when I left off with it. I welcome the contribution, including pairing if either of you'd like (or you could pair the two of you if you'd like).

Are there still users of this internally at Target who can't switch to Spark 3.X + docker?

I've not seen pressure from DV users to support Spark 3, but most DV users who have identified themselves are internal to Target and stuck on our Spark 2.3 cluster while we await the production launch of our internal platform enabling use of Spark 3 + Docker.

Also many thanks @holdenk for keeping this project / others in the spark ecosystem in mind.

Yes, indeed! I'm very happy to see this and your continued work in the space, and to see another former IBMer around!

@holdenk

holdenk commented Aug 31, 2023

Copy link
Copy Markdown
Contributor Author

So my other project where I do more extensive cross-building is https://github.com/holdenk/spark-testing-base/blob/main/build.sbt , it puts the actual Spark Versions outside of sbt (passed in as a parameter) and does "classic" sbt cross-build for the scala versions. What do folks think of something like that?

Also thanks for being so friendly y'all :)

@holdenk

holdenk commented Aug 31, 2023

Copy link
Copy Markdown
Contributor Author

Also, slightly off topic, (in the context of having a large pre-Spark 3 codebase) my self and some of my coworkers have been working on some tooling at https://github.com/holdenk/spark-upgrade that might be useful once you'all have Spark 3 ready for use :)

@holdenk

holdenk commented Sep 3, 2023

Copy link
Copy Markdown
Contributor Author

A bit of work to see if folks agree with the direction, if so I'll update the tests and such :)

@holdenk

holdenk commented Sep 11, 2023

Copy link
Copy Markdown
Contributor Author

Hey just following up @c-horn do you have some bandwidth to review this?

@c-horn

c-horn commented Sep 12, 2023

Copy link
Copy Markdown
Collaborator

Hi, sorry, my free time plans got ahead of me, I will be out of the country for a few weeks starting weds. I will try to take a look at this tonight or tomorrow.

Comment thread build.sbt
@colindean

Copy link
Copy Markdown
Collaborator

It's been a while but I'm hoping to revisit this mid-February. I shook out some backed-up update PRs and will take a close look at this when I return from FOSDEM — @holdenk, I noted that you're speaking there and I hope to attend your talk and meet you!

@holdenk

holdenk commented Jan 30, 2024

Copy link
Copy Markdown
Contributor Author

Sounds awesome, maybe we can grab some coffee:)

@colindean

Copy link
Copy Markdown
Collaborator

Good news! The requirement to support Spark 2 is now gone as my org is moving to Spark 3 by the end of May. I'm going to return to this in the coming days/weeks as I've gotten a few requests for it.

@colindean

Copy link
Copy Markdown
Collaborator

We're going into Spark 3 at v3.5.1 and still supporting Scala 2.12 and 2.13.

Comment thread .github/workflows/ci.yaml Outdated
Comment thread .github/workflows/release.yaml Outdated
Comment thread README.md Outdated
@colindean

Copy link
Copy Markdown
Collaborator

#186 sets some new baseline. I think I want to get this merged (probably with the suggestions to move to Spark 3.5.1 merged) and then continue from there.

@jaspal1404

jaspal1404 commented Apr 10, 2024

Copy link
Copy Markdown

@colindean @holdenk - Here is the change that is required in build.sbt to make the DV work with Spark version 3.x. I tested it and its working:

current circe-yaml version : 0.14.2
new circe-yaml version : 1.15.0

Change line number 35 in build.sbt from 0.14.2 to 1.15.0

Another thing I changed was line number 6 as below (could not be needed, wasn't sure how we are getting the spark version from the env)

sparkVersion := "3.5.1"

Comment thread build.sbt Outdated
Comment thread build.sbt Outdated
Comment thread build.sbt Outdated
Comment thread .github/workflows/ci.yaml Outdated
Comment thread build.sbt Outdated
There was no 0.14.6 for circe-yaml
@colindean colindean enabled auto-merge April 10, 2024 21:40
@colindean colindean merged commit b65ec2a into target:master Apr 10, 2024
@colindean

Copy link
Copy Markdown
Collaborator

Thank you so much, @holdenk, for submitting this. I'm sorry it took a while for priorities to bubble up. I'm glad that we were able to merge it eventually.

@jaspal1404 jaspal1404 left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants