[SPARK-42539][SQL][HIVE] Eliminate separate classloader when using 'builtin' Hive version for metadata client by xkrogen · Pull Request #40144 · apache/spark

xkrogen · 2023-02-23T18:03:14Z

What changes were proposed in this pull request?

When using the 'builtin' Hive version for the Hive metadata client, do not create a separate classloader, and rather continue to use the overall user/application classloader (regardless of Java version). This standardizes the behavior for all Java versions with that of Java 9+. See SPARK-42539 for more details on why this approach was chosen.

Why are the changes needed?

Please see a much more detailed description in SPARK-42539. The tl;dr is that user-provided JARs (such as hive-exec-2.3.8.jar) take precedence over Spark/system JARs when constructing the classloader used by IsolatedClientLoader on Java 8 in 'builtin' mode, which can cause unexpected behavior and/or breakages. This violates the expectation that, unless user-first classloader mode is used, Spark JARs should be prioritized over user JARs. It also seems that this separate classloader was unnecessary from the start, since the intent of 'builtin' mode is to use the JARs already existing on the regular classloader (as alluded to in a comment on #24057). The isolated client loader was originally added in #5876 in 2015. This bit in that PR's description is the only mention of the behavior for "builtin":

attempt to discover the jars that were used to load Spark SQL and use those. This option is only valid when using the execution version of Hive.

I can't follow the logic here; the user classloader clearly has all of the necessary Hive JARs, since that's where we're getting the JAR URLs from, so we could just use it directly instead of grabbing the URLs. When this was initially added, it only used the JARs from the user classloader, not any of its parents, which I suspect was the motivating factor (to try to avoid more Spark classes being duplicated inside of the isolated classloader, I guess). But that was changed a month later anyway in #6435 / #6459, so I think this may have basically been dead code from the start. It has also caused at least one issue over the years, e.g. SPARK-21428, which disables the new-classloader behavior when running inside of a CLI session.
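To make the precedence problem concrete, here is a minimal, hedged sketch in plain Java. It is not Spark code: the class and method names are hypothetical, temporary directories stand in for JARs, and the "user first" URL ordering is an assumption chosen to mirror the symptom described above. It contrasts a single flattened `URLClassLoader` (first matching URL wins) with ordinary parent-first delegation (the parent's copy wins).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; none of these names exist in Spark.
final class ClassLoaderPrecedenceSketch {

    // Creates a temp directory holding one resource; it stands in for a JAR.
    static URL fakeJar(String label) {
        try {
            Path dir = Files.createTempDirectory(label);
            Files.write(dir.resolve("marker.txt"), label.getBytes(StandardCharsets.UTF_8));
            return dir.toUri().toURL(); // an existing directory yields a URL ending in '/'
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the copy of marker.txt that the given loader resolves.
    static String whoWins(ClassLoader loader) {
        try (InputStream in = loader.getResourceAsStream("marker.txt")) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // One flattened URL list with the user entry first (assumed ordering):
    // lookup walks the URLs in order, so the user copy shadows the Spark copy.
    static String flattenedUserFirst() {
        URL user = fakeJar("user");
        URL spark = fakeJar("spark");
        try (URLClassLoader flat = new URLClassLoader(new URL[] {user, spark}, null)) {
            return whoWins(flat);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Regular parent-first delegation: the parent ("Spark") is consulted
    // before the child's own URLs, so the Spark copy wins.
    static String parentFirst() {
        URL user = fakeJar("user");
        URL spark = fakeJar("spark");
        try (URLClassLoader sparkLoader = new URLClassLoader(new URL[] {spark}, null);
             URLClassLoader userLoader = new URLClassLoader(new URL[] {user}, sparkLoader)) {
            return whoWins(userLoader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("flattened, user URL first: " + flattenedUserFirst());
        System.out.println("parent-first delegation:   " + parentFirst());
    }
}
```

The same ordering rule applies to class lookup, which is why reusing the existing application classloader, rather than rebuilding a URL list from it, keeps the usual Spark-before-user priority.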

Does this PR introduce any user-facing change?

No, except to protect Spark itself from potentially being broken by bad user JARs.

How was this patch tested?

This includes a new unit test in HiveUtilsSuite which demonstrates the issue and shows that this approach resolves it. It has also been tested on a live cluster running Java 8 and Hive communication functionality continues to work as expected.

xkrogen · 2023-02-23T18:19:49Z

cc @sunchao @AngersZhuuuu since you've worked on somewhat-related changes in #34690, #32887, etc.
cc @srowen @squito since you were involved in #24057 for the Java 9+ changes
cc @HyukjinKwon @dongjoon-hyun for any general interest

dongjoon-hyun · 2023-02-23T18:37:07Z

Also, cc @mridulm , @cloud-fan , @rednaxelafx, @zsxwing, @kiszk , @maropu .

cloud-fan · 2023-02-24T02:37:38Z

It makes sense to use the builtin classloader when using builtin Hive. To clarify: we still have the class loading issue if people specify a certain Hive version (not builtin), right?

xkrogen · 2023-02-24T16:50:58Z

Great question @cloud-fan , and actually no, we don't. For all of the other values of spark.sql.hive.metastore.jars besides 'builtin', the user JARs are not included at all (refer to this section of HiveUtils). In all of those cases, the JAR list is constructed purely from the dependencies for the specified Hive version. Whether that behavior is correct is another question -- @shardulm94 informed me that user JARs are required to support custom serdes inside of the Hive client -- but in any case, 'builtin' is the only mode that is susceptible to this issue.

cloud-fan · 2023-02-27T08:54:38Z

lgtm if all tests pass

sunchao

LGTM too

…uiltin' Hive version for metadata client

Closes #40144 from xkrogen/xkrogen/SPARK-42539/hive-isolatedclientloader-builtin-user-jar-conflict-fix/java9strategy.

Authored-by: Erik Krogen <xkrogen@apache.org>
Signed-off-by: Chao Sun <sunchao@apple.com>

sunchao · 2023-02-27T22:59:11Z

Merged to master/branch-3.4. Thanks @xkrogen !

xkrogen · 2023-02-27T23:19:04Z

Thanks @sunchao and @cloud-fan !

HyukjinKwon · 2023-02-28T04:04:32Z

Seems like the tests didn't pass .. I am reverting this as it causes a lot of test failures. e.g.)

dongjoon-hyun · 2023-02-28T04:40:05Z

Thank you for recovering the master branch by reverting, @HyukjinKwon ! The revert unblocks other PRs.

cloud-fan · 2023-02-28T04:41:38Z

Shall we revert it from 3.4 as well?

dongjoon-hyun · 2023-02-28T04:45:31Z

Yes, it was reverted here, 26009d4.

sunchao · 2023-02-28T04:58:02Z

Hmm interesting. Somehow the tests were shown all passing for me when I merged this. Sorry for the trouble.

xkrogen · 2023-02-28T15:51:18Z

I will take a look at the test failures, thanks @HyukjinKwon for addressing the revert!

dongjoon-hyun · 2023-02-28T17:52:38Z

If you folks don't mind, shall we consider this for Apache Spark 3.5 only? ClassLoader issues have always been tricky in the community. We need enough time for this to stabilize in the master branch so that different organizations have a chance to verify it in several cases.

sunchao · 2023-02-28T18:10:47Z

+1. I agree with @dongjoon-hyun .

xkrogen · 2023-02-28T21:46:48Z

I agree as well. Posted a new PR at #40224.

…uiltin' Hive version for metadata client

### What changes were proposed in this pull request?

When using the 'builtin' Hive version for the Hive metadata client, do not create a separate classloader, and rather continue to use the overall user/application classloader (regardless of Java version). This standardizes the behavior for all Java versions with that of Java 9+. See SPARK-42539 for more details on why this approach was chosen.

Please note that this is a re-submit of #40144. That one introduced test failures, and potentially a real issue, because the PR works by setting `isolationOn = false` for `builtin` mode. In addition to adjusting the classloader, `HiveClientImpl` relies on `isolationOn` to determine if it should use an isolated copy of `SessionState`, so the PR inadvertently switched to using a shared `SessionState` object. I think we do want to continue to have the isolated session state even in `builtin` mode, so this adds a new flag `sessionStateIsolationOn` which controls whether the session state should be isolated, _separately_ from the `isolationOn` flag which controls whether the classloader should be isolated. Default behavior is for `sessionStateIsolationOn` to be set equal to `isolationOn`, but for `builtin` mode, we override it to enable session state isolated even though classloader isolation is turned off.

### Why are the changes needed?

Please see a much more detailed description in SPARK-42539. The tl;dr is that user-provided JARs (such as `hive-exec-2.3.8.jar`) take precedence over Spark/system JARs when constructing the classloader used by `IsolatedClientLoader` on Java 8 in 'builtin' mode, which can cause unexpected behavior and/or breakages. This violates the expectation that, unless user-first classloader mode is used, Spark JARs should be prioritized over user JARs.

It also seems that this separate classloader was unnecessary from the start, since the intent of 'builtin' mode is to use the JARs already existing on the regular classloader (as alluded to [here](#24057 (comment))). The isolated clientloader was originally added in #5876 in 2015. This bit in the PR description is the only mention of the behavior for "builtin":

> attempt to discover the jars that were used to load Spark SQL and use those. This option is only valid when using the execution version of Hive.

I can't follow the logic here; the user classloader clearly has all of the necessary Hive JARs, since that's where we're getting the JAR URLs from, so we could just use that directly instead of grabbing the URLs. When this was initially added, it only used the JARs from the user classloader, not any of its parents, which I suspect was the motivating factor (to try to avoid more Spark classes being duplicated inside of the isolated classloader, I guess). But that was changed a month later anyway in #6435 / #6459, so I think this may have basically been deadcode from the start. It has also caused at least one issue over the years, e.g. SPARK-21428, which disables the new-classloader behavior in the case of running inside of a CLI session.

### Does this PR introduce _any_ user-facing change?

No, except to protect Spark itself from potentially being broken by bad user JARs.

### How was this patch tested?

This includes a new unit test in `HiveUtilsSuite` which demonstrates the issue and shows that this approach resolves it. It has also been tested on a live cluster running Java 8 and Hive communication functionality continues to work as expected.

Unit tests failing in #40144 have been locally tested (`HiveUtilsSuite`, `HiveSharedStateSuite`, `HiveCliSessionStateSuite`, `JsonHadoopFsRelationSuite`).

Closes #40224 from xkrogen/xkrogen/SPARK-42539/hive-isolatedclientloader-builtin-user-jar-conflict-fix/take2.

Authored-by: Erik Krogen <xkrogen@apache.org>
Signed-off-by: Chao Sun <sunchao@apple.com>
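The flag split described in the re-submit can be sketched as follows. This is a hedged illustration in plain Java, not Spark's actual code: only the flag names `isolationOn` and `sessionStateIsolationOn` come from the commit message, while the class and factory method names are hypothetical.

```java
// Hypothetical model of the two flags; not Spark's actual classes.
final class IsolationFlagsSketch {
    final boolean isolationOn;             // should the classloader be isolated?
    final boolean sessionStateIsolationOn; // should SessionState be isolated?

    IsolationFlagsSketch(boolean isolationOn, boolean sessionStateIsolationOn) {
        this.isolationOn = isolationOn;
        this.sessionStateIsolationOn = sessionStateIsolationOn;
    }

    // Default described in the message: session-state isolation follows
    // classloader isolation unless explicitly overridden.
    static IsolationFlagsSketch withDefaults(boolean isolationOn) {
        return new IsolationFlagsSketch(isolationOn, isolationOn);
    }

    // 'builtin' mode after the re-submit: shared classloader, isolated SessionState.
    static IsolationFlagsSketch forBuiltin() {
        return new IsolationFlagsSketch(false, true);
    }

    // 'builtin' mode as #40144 effectively behaved: one flag drove both
    // decisions, so SessionState ended up shared as a side effect.
    static IsolationFlagsSketch forBuiltinInFirstAttempt() {
        return withDefaults(false);
    }

    public static void main(String[] args) {
        IsolationFlagsSketch fixed = forBuiltin();
        System.out.println("classloader isolated:  " + fixed.isolationOn);
        System.out.println("SessionState isolated: " + fixed.sessionStateIsolationOn);
    }
}
```

Keeping the two decisions in separate flags is what lets `builtin` mode drop the extra classloader without changing how session state is shared.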

xkrogen added 2 commits February 22, 2023 16:40

Add a test in HiveUtilsSuite reproducing the issue

b225072

eliminate the conflict using the same approach as Java 9; just use the general spark user classloader when 'builtin' is used instead of reconstructing a new URL classloader

dc42da8

github-actions Bot added CORE SQL labels Feb 23, 2023

xkrogen added 2 commits February 23, 2023 10:04

fix up some comments

686d939

simplify / clean up unnecessary code

e596598

xkrogen changed the title ~~[SPARK-42539][SQL][HIVE] Elminiate separate classloader when using 'builtin' Hive version for metadata client~~ [SPARK-42539][SQL][HIVE] Eliminate separate classloader when using 'builtin' Hive version for metadata client Feb 24, 2023

cloud-fan approved these changes Feb 27, 2023

sunchao approved these changes Feb 27, 2023

Comment thread sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveUtilsSuite.scala Outdated

Comment thread sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveUtilsSuite.scala Outdated

minor style fixups

a5f5d9e

sunchao closed this in 27ad583 Feb 27, 2023

xkrogen deleted the xkrogen/SPARK-42539/hive-isolatedclientloader-builtin-user-jar-conflict-fix/java9strategy branch February 27, 2023 23:04

xkrogen mentioned this pull request Feb 28, 2023

[SPARK-42539][SQL][HIVE] Eliminate separate classloader when using 'builtin' Hive version for metadata client #40224

Closed
