[SPARK-25946] [BUILD] Upgrade ASM to 7.x to support JDK11 #22953
Closed
dbtsai wants to merge 1 commit into
Member
Author
srowen
approved these changes
Nov 6, 2018
srowen
left a comment
Member
I am guessing this is also needed to support Java 9? Either way, yeah, seems good for Spark 3.
Member
Author
ASM6 supports Java 9 while ASM7 supports Java 9, Java 10, and Java 11. Thanks.
Member
Looks good to me.
Test build #98496 has finished for PR 22953 at commit
Member
Author
Thanks. Merged into master.
asfgit
pushed a commit
that referenced
this pull request
Nov 6, 2018
## What changes were proposed in this pull request?

Upgrade ASM to 7.x to support JDK11

## How was this patch tested?

Existing tests.

Closes #22953 from dbtsai/asm7.

Authored-by: DB Tsai <d_tsai@apple.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>
jackylee-ch
pushed a commit
to jackylee-ch/spark
that referenced
this pull request
Feb 18, 2019
## What changes were proposed in this pull request?

Upgrade ASM to 7.x to support JDK11

## How was this patch tested?

Existing tests.

Closes apache#22953 from dbtsai/asm7.

Authored-by: DB Tsai <d_tsai@apple.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>
senthh
added a commit
to acceldata-io/spark
that referenced
this pull request
Jun 25, 2026
* ODP-7038|[SPARK-25946][BUILD] Upgrade ASM to 7.x to support JDK11

  ## What changes were proposed in this pull request?

  Upgrade ASM to 7.x to support JDK11

  ## How was this patch tested?

  Existing tests.

  Closes apache#22953 from dbtsai/asm7.

  Authored-by: DB Tsai <d_tsai@apple.com>
  Signed-off-by: DB Tsai <d_tsai@apple.com>
  (cherry picked from commit 3ed91c9)

* ODP-7038 - Improvement - Enable Spark2 with jdk11 runtime support

* ODP-7038 - Improvement - Enable Spark2 with jdk11 runtime support

* ODP-7038: replace String.lines with split for JDK11 compile

  JDK11 added java.lang.String#lines() returning java.util.stream.Stream<String>. Scala 2.11's StringLike implicit also exposes .lines (Iterator[String]), but the Java instance method takes resolution priority on JDK11+. The resulting Stream<String>.toArray returns Object[], and the downstream .size / .forall(_.size <= N) then fail to typecheck:

      value size is not a member of Object

  MatricesSuite (both mllib and mllib-local copies) only needs a plain newline split, so use .split("\\n") which returns Array[String] unambiguously on every JDK.

* ODP-7038|[SPARK-26839][SQL] Work around classloader changes in Java 9 for Hive isolation

  Note, this doesn't really resolve the JIRA, but makes the changes we can make so far that would be required to solve it.

  ## What changes were proposed in this pull request?

  Java 9+ changed how ClassLoaders work. The two most salient points:

  - The boot classloader no longer 'sees' the platform classes. A new 'platform classloader' does and should be the parent of new ClassLoaders
  - The system classloader is no longer a URLClassLoader, so we can't get the URLs of JARs in its classpath

  ## How was this patch tested?

  We'll see whether Java 8 tests still pass here. Java 11 tests do not fully pass at this point; more notes below. This does make progress on the failures though.
  (NB: to test with Java 11, you need to build with Java 8 first, setting JAVA_HOME and java's executable correctly, then switch both to Java 11 for testing.)

  Closes apache#24057 from srowen/SPARK-26839.

  Authored-by: Sean Owen <sean.owen@databricks.com>
  Signed-off-by: Sean Owen <sean.owen@databricks.com>
  (cherry picked from commit c65f9b2)

* ODP-7038|[SPARK-28723][SQL] Upgrade to Hive 2.3.6 for HiveMetastore Client and Hadoop-3.2 profile

  ### What changes were proposed in this pull request?

  This PR upgrades the built-in Hive to 2.3.6 for `hadoop-3.2`.

  Hive 2.3.6 release notes:

  - [HIVE-22096](https://issues.apache.org/jira/browse/HIVE-22096): Backport [HIVE-21584](https://issues.apache.org/jira/browse/HIVE-21584) (Java 11 preparation: system class loader is not URLClassLoader)
  - [HIVE-21859](https://issues.apache.org/jira/browse/HIVE-21859): Backport [HIVE-17466](https://issues.apache.org/jira/browse/HIVE-17466) (Metastore API to list unique partition-key-value combinations)
  - [HIVE-21786](https://issues.apache.org/jira/browse/HIVE-21786): Update repo URLs in poms branch 2.3 version

  ### Why are the changes needed?

  Make Spark support JDK 11.

  ### Does this PR introduce any user-facing change?

  Yes. Please see [SPARK-28684](https://issues.apache.org/jira/browse/SPARK-28684) and [SPARK-24417](https://issues.apache.org/jira/browse/SPARK-24417) for more details.

  ### How was this patch tested?

  Existing unit test and manual test.

  Closes apache#25443 from wangyum/test-on-jenkins.
  Lead-authored-by: Yuming Wang <yumwang@ebay.com>
  Co-authored-by: HyukjinKwon <gurwls223@apache.org>
  Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
  Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
  (cherry picked from commit 02a0cde)

* ODP-7038 - Dev - Adding missing orc versions

* ODP-7038: harden Platform.<clinit> Cleaner reflection for JDK11 runtime

  On JDK11, jdk.internal.ref is not exported to the unnamed module by default, so Method.setAccessible() throws InaccessibleObjectException inside Platform's static block, and spark-shell fails to start with:

      java.lang.ExceptionInInitializerError
        at ByteArrayMethods.<clinit>
      Caused by: InaccessibleObjectException: Unable to make ... jdk.internal.ref.Cleaner.create(Object, Runnable) accessible

  Backport the SPARK-26839 graceful-degradation pattern from upstream 2.4.x+/3.x:

  - Catch InaccessibleObjectException by name (avoids importing the JDK9+ class) when setAccessible() on DirectByteBuffer ctor/field fails; null both refs.
  - Probe createMethod by calling it with null args; if it throws IllegalAccessException, null the method ref.
  - allocateDirectBuffer() now checks for null CLEANER_CREATE_METHOD and falls back to ByteBuffer.allocateDirect(size), with a helpful OOM message pointing at -XX:MaxDirectMemorySize.

  With this, spark-shell on JDK11 starts even without `--add-opens java.base/jdk.internal.ref=ALL-UNNAMED`. Adding that add-opens still gives you the bigger off-heap budget.

* ODP-7038: restore hive.version to ODP fork 1.2.1.spark24.0.14.1

  The earlier SPARK-28723 cherry-pick (9bbdab0) blindly took upstream's hive.version=1.2.1.spark2, which is the upstream spark-project.hive 1.2.1 line - NOT the ODP fork that lives in odp-hive-spark and ships as 1.2.1.spark24.0.14.1.

  ODP's deployed jar standalone-metastore-1.2.1.spark24.0.14.1-hive3.jar is built from odp-hive-spark/standalone-metastore at 1.2.1.spark24.0.14.1. Any JDK11 patches for the embedded HiveMetaStoreClient (e.g.
  HIVE-21508's toArray fix) belong in odp-hive-spark, not here.

  Keep the rest of SPARK-28723 (hive23.version, hadoop-3.2 profile overrides, ThriftserverShimUtils) intact - those only kick in when hadoop-3.2 profile selects the Apache Hive 2.3 path.

* ODP-7038: PySpark + bundled py4j source patches for Python 3.11

  Stock Spark 2.4 PySpark targets Python 2.7-3.8. Python 3.10 and 3.11 broke several APIs PySpark and its bundled py4j-0.10.7 / cloudpickle 0.x still relied on. This commit applies source-level patches so a fresh `pyspark` session runs cleanly under Python 3.11.

  The big one: replace the 2017-era single-file pyspark/cloudpickle.py with the vendored cloudpickle 2.2.1 package (exact backport from upstream Apache Spark 3.x's python/pyspark/cloudpickle/). cloudpickle 2.2.1 (Aug 2022) is the first release with full Python 3.11 support - bytecode opcode walker handles the new LOAD_GLOBAL flag encoding, CodeType construction uses .replace() forward-compat, closure cell serialization adapted to 3.11 frame layout, and many other 3.10/3.11 fixes that would have required dozens of manual patches to the old copy.

  Verified end-to-end on Python 3.11.15: pyspark imports cleanly, lambda closure round-trips through cloudpickle.dumps()/loads() succeed for the patterns that previously raised

      TypeError: code() argument 13 must be str, not int
      IndexError: tuple index out of range (in extract_code_globals)
      RecursionError in save_function/_fill_function

  Source changes
  --------------

  python/pyspark/cloudpickle.py -> python/pyspark/cloudpickle/
    Replace single-file 0.x copy with cloudpickle 2.2.1 vendored as a package (matching upstream Apache Spark 3.x layout).
    Only deltas vs upstream PyPI cloudpickle 2.2.1:
      * __init__.py: `from cloudpickle.X` -> `from pyspark.cloudpickle.X` (relocates the package under pyspark)
      * cloudpickle_fast.py:634: add `len(e.args) > 0` guard to the RecursionError fallback (same as Apache Spark 3.x's vendor diff)

  python/pyspark/resultiterable.py
    Python 3.10 removed the lazy collections.* abc aliases. Class ResultIterable(collections.Iterable) raised AttributeError on import. Import from collections.abc with a Python 2 fallback.

  python/pyspark/sql/types.py
  python/pyspark/sql/session.py
    pandas 2.0 removed DataFrame.iteritems(). PySpark uses it in timestamp localization (types.py) and Arrow batch creation (session.py x2). Replace with .items() (present in pandas 1.x and 2.x) guarded by a getattr() probe so older pandas keeps working.

  python/pyspark/mllib/linalg/__init__.py
  python/pyspark/ml/linalg/__init__.py
    Python 3.9 removed array.array.tostring(). Replace with .tobytes() in the DenseVector / SparseVector / DenseMatrix / SparseMatrix pickling paths (6+6 sites). Both methods are bytewise-identical so serialized payloads stay wire-compatible.

  python/lib/py4j-0.10.7-src.zip
    Bundled py4j 0.10.7 (from 2018) imports MutableMapping, Sequence, MutableSequence, MutableSet, Set straight from `collections`. Python 3.10 removed those aliases, causing

        ImportError: cannot import name 'MutableMapping' from 'collections'

    Patch the bundled zip: java_collections.py uses `from collections.abc` with a `from collections` fallback. Bytes-only change to the zip, no version bump (py4j Java jar stays at 0.10.7 so wire-protocol compat is preserved).
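The `collections.abc` fallback and the `tostring()` to `tobytes()` switch described above can be sketched in a few lines. This is a minimal illustration of the two patterns, not the patched Spark or py4j source: `ResultIterableSketch` and `array_to_bytes` are hypothetical names introduced here, and the `tostring()` fallback is an assumption added for older interpreters (the commit message only says the call sites were replaced with `.tobytes()`).

```python
import array

try:
    # Python 3.3+: the abstract base classes live in collections.abc.
    from collections.abc import Iterable
except ImportError:
    # Python 2 fallback: the same names were exposed by collections itself.
    from collections import Iterable


class ResultIterableSketch(Iterable):
    """Hypothetical stand-in for a class that subclasses Iterable."""

    def __init__(self, data):
        self.data = data

    def __iter__(self):
        return iter(self.data)

    def __len__(self):
        return len(self.data)


def array_to_bytes(arr):
    """Serialize an array.array, preferring tobytes() over the removed tostring()."""
    to_bytes = getattr(arr, "tobytes", None)
    if to_bytes is None:
        # Assumption: only reached on interpreters that predate tobytes().
        to_bytes = arr.tostring
    return to_bytes()


values = array.array("d", [1.0, 2.0, 3.0])
payload = array_to_bytes(values)

# Round-trip the raw bytes back into an array (Python 3 API).
restored = array.array("d")
restored.frombytes(payload)
```

Importing from `collections.abc` first keeps the fast path on modern interpreters and confines the legacy import to the `except` branch, which is the shape the commit message describes for both `resultiterable.py` and the bundled `java_collections.py`.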
  Verification
  ------------

      $ PYTHONPATH=python:python/lib/py4j-0.10.7-src.zip python3.11 \
          -W ignore -c "import pyspark; print(pyspark.__version__)"
      2.4.8

      $ python3.11 -W ignore -c "
      from pyspark import cloudpickle
      def make():
          x = 42
          return lambda r: (r, r * x)
      f = make()
      assert cloudpickle.loads(cloudpickle.dumps(f))(10) == (10, 420)
      print('closure round-trip OK')"
      closure round-trip OK

* ODP-7038: restore HiveUtils imports + isHive23, fix hadoop-3.2 profile

---------

Co-authored-by: DB Tsai <d_tsai@apple.com>
Co-authored-by: senthh <senthil.kumar@acceldata.io>
Co-authored-by: Sean Owen <sean.owen@databricks.com>
Co-authored-by: Yuming Wang <yumwang@ebay.com>
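The pandas change in the commit message above (`.items()` guarded by a `getattr()` probe) could look roughly like the sketch below. The exact probe used in the patch is not shown in this conversation, so the helper `iter_columns` and the two stand-in frame classes are hypothetical; they only mimic the method names that differ between older and newer pandas and do not import pandas.

```python
def iter_columns(frame):
    """Return (name, column) pairs, preferring items() over iteritems()."""
    items = getattr(frame, "items", None)
    if items is None:
        # Assumption: only very old DataFrame-like objects lack items().
        items = frame.iteritems
    return items()


class OldStyleFrame(object):
    """Hypothetical stand-in exposing only the legacy iteritems() method."""

    def __init__(self, columns):
        self._columns = columns

    def iteritems(self):
        return iter(self._columns.items())


class NewStyleFrame(object):
    """Hypothetical stand-in exposing only items(), like pandas 2.x."""

    def __init__(self, columns):
        self._columns = columns

    def items(self):
        return iter(self._columns.items())


old_pairs = list(iter_columns(OldStyleFrame({"ts": [1, 2]})))
new_pairs = list(iter_columns(NewStyleFrame({"ts": [1, 2]})))
```

Probing with `getattr()` rather than checking a version string keeps the call site independent of how the installed library reports its version.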
shubhluck
added a commit
to acceldata-io/spark
that referenced
this pull request
Jun 25, 2026
What changes were proposed in this pull request?
Upgrade ASM to 7.x to support JDK11
How was this patch tested?
Existing tests.