Add Dataproc 3.0 and Hive 4 compatibility to Hive lineage script#1398
pathakriya-cyber wants to merge 1 commit into
On Dataproc 3.0 (Hive 4), OpenLineage requires the GCP lineage transport library (transports-gcplineage.jar) to be explicitly copied into Hive's library folder (/usr/lib/hive/lib/) so Hive can send lineage events to GCP Dataplex without throwing ClassNotFoundException.

This change:
1. Automatically copies Spark's transports-gcplineage.jar into Hive's library folder.
2. Modernizes legacy "gsutil cp" calls to "gcloud storage cp".

Verified end-to-end against Dataproc 3.0 Debian 13: http://sponge2/11f79ec0-b6d7-403e-8bc9-f7e923d70635
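For readers following along, the copy step described above could look roughly like the sketch below. The helper name `copy_transport_jar` and its two-argument form are assumptions made for illustration, not the code in this PR. It skips quietly when the source jar is absent, so the script would still work on images that do not ship the Spark connector jar.

```shell
# Sketch only: copy_transport_jar is a hypothetical helper, not the PR's code.
# Usage: copy_transport_jar <source-jar> <destination-dir>
copy_transport_jar() {
  local src_jar="$1"
  local dest_dir="$2"

  # Nothing to do if the image does not ship the transport jar.
  if [ ! -f "$src_jar" ]; then
    echo "Transport jar not found at $src_jar; skipping copy" >&2
    return 0
  fi

  # A missing destination directory is treated as an error.
  if [ ! -d "$dest_dir" ]; then
    echo "Destination directory $dest_dir does not exist" >&2
    return 1
  fi

  cp "$src_jar" "$dest_dir/"
}
```

With the paths mentioned in this PR, a call would be `copy_transport_jar /usr/lib/spark/connector/transports-gcplineage.jar /usr/lib/hive/lib`.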
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Code Review
This pull request updates hive-lineage.sh to use gcloud storage cp instead of gsutil cp and adds logic to copy the GCP lineage transport JAR into the Hive library directory for Dataproc 3.0 compatibility. The feedback recommends restoring the -P flag in the gcloud storage cp command to preserve POSIX attributes and ensure the downloaded JAR remains readable by the hive user.
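If the concern is only that the downloaded JAR stays readable by the hive user, one alternative to relying on preserved POSIX attributes is to set the mode explicitly after the copy. The sketch below assumes a hypothetical helper `ensure_jar_readable` and assumes that mode 644 is acceptable for files in the Hive library directory; neither comes from the PR.

```shell
# Sketch only: ensure_jar_readable is a hypothetical helper.
# Makes a jar owner-writable and world-readable (mode 644), whatever
# permissions the download step left behind.
ensure_jar_readable() {
  local jar_path="$1"

  if [ ! -f "$jar_path" ]; then
    echo "No such file: $jar_path" >&2
    return 1
  fi

  chmod 644 "$jar_path"
}
```

It would be called right after the download, for example `ensure_jar_readable "$HIVE_LIB_DIR/hive-openlineage-hook.jar"`.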
echo "Installing openlineage-hive hook"
gsutil cp -P "$INSTALLATION_SOURCE/hive-openlineage-hook-$HIVE_OL_HOOK_VERSION.jar" "$HIVE_LIB_DIR/hive-openlineage-hook.jar"
}
gcloud storage cp "$INSTALLATION_SOURCE/hive-openlineage-hook-$HIVE_OL_HOOK_VERSION.jar" "$HIVE_LIB_DIR/hive-openlineage-hook.jar"
gcloud storage cp "$INSTALLATION_SOURCE/hive-openlineage-hook-$HIVE_OL_HOOK_VERSION.jar" "$HIVE_LIB_DIR/hive-openlineage-hook.jar"

echo "Copying GCP lineage transport jar into Hive lib folder for Dataproc 3.0 compatibility..."
if [[ -f "/usr/lib/spark/connector/transports-gcplineage.jar" ]]; then
transports-gcplineage:1.27 is bundled in hive-openlineage-hook.jar, and the jar you're copying from the Spark classpath is based on transports-gcplineage:1.34.
This might cause class conflicts. I'm assuming v1.27 is not working here because some of its classes do not support Hive 4? It is probably not a good idea to have multiple versions of the same class on the Hive classpath. Even if our local tests pass, customer workloads might break, since customers may bring their own classes to the classpath.
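One way to check for the overlap described here is to list the class entries in each jar and report any entry that appears in more than one. The sketch below is an illustration under assumptions: the function names are hypothetical, `unzip` must be installed, and shaded or relocated classes (renamed packages) would not be detected because only identical entry paths are compared.

```shell
# Sketch only: both function names are hypothetical.

# Prints "<class-entry><TAB><jar>" for every .class entry in the given jars.
# Assumes the unzip tool is available on the node.
list_jar_classes() {
  local jar
  for jar in "$@"; do
    unzip -Z1 "$jar" 2>/dev/null | grep '\.class$' | while IFS= read -r entry; do
      printf '%s\t%s\n' "$entry" "$jar"
    done
  done
}

# Reads that listing on stdin and prints each class entry that occurs in
# more than one jar, followed by the jars that contain it.
find_duplicate_classes() {
  awk -F '\t' '
    !seen[$1 FS $2]++ {
      jars[$1] = (jars[$1] == "" ? $2 : jars[$1] ", " $2)
      count[$1]++
    }
    END {
      for (c in count) if (count[c] > 1) print c ": " jars[c]
    }
  ' | sort
}
```

For the two jars discussed in this thread, the call would be `list_jar_classes /usr/lib/hive/lib/hive-openlineage-hook.jar /usr/lib/hive/lib/transports-gcplineage.jar | find_duplicate_classes`. Empty output means no identically named class entries were found.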
codelixir left a comment
If you could share the fully qualified names of the class(es) that result in ClassNotFoundException, it would help determine whether this is the right fix.
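A quick way to collect those names is to pull them out of the log that contains the stack traces. The sketch below assumes the usual JVM message form `ClassNotFoundException: <fully.qualified.Name>`; the helper name `extract_missing_classes` is hypothetical, and where the relevant Hive log lives on the cluster is left to the caller.

```shell
# Sketch only: extract_missing_classes is a hypothetical helper.
# Reads log text on stdin and prints each distinct class name that follows
# "ClassNotFoundException: ", one per line.
extract_missing_classes() {
  grep -o 'ClassNotFoundException: [A-Za-z0-9_.$]*' \
    | sed 's/^ClassNotFoundException: //' \
    | sort -u
}
```

Usage would be `extract_missing_classes < "$HIVE_LOG"`, where `HIVE_LOG` is a placeholder for whichever log file holds the failure.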