
[SPARK-34820][K8S][R] add apt-update before gnupg install #31923

Closed
Yikun wants to merge 1 commit into apache:master from Yikun:SPARK-34820

Yikun commented Mar 22, 2021

Member

What changes were proposed in this pull request?

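We added the gnupg installation in #30130. We should run apt update before installing gnupg; otherwise the build fails with a fetch error once the package has been updated in the repository, because the stale package index still points at the old version.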

See more in:
[1] http://apache-spark-developers-list.1001551.n3.nabble.com/K8s-Integration-test-is-unable-to-run-because-of-the-unavailable-libs-td30986.html

Why are the changes needed?

Add an apt-get update command before the gnupg installation to avoid using an invalid (stale) package list.
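The ordering this PR enforces can be sketched as a small shell function. This is a minimal sketch, not the actual Dockerfile diff: the `run` wrapper and the `DRY_RUN` variable are hypothetical additions so the ordering can be checked without root access, and it uses `apt-get install` where the build log's RUN step uses `apt install` (`apt-get` is the front end with a stable CLI for scripts). The `&&` mirrors the chaining in the RUN step, so the install is skipped if the index refresh fails.

```shell
#!/usr/bin/env bash
# Minimal sketch of the ordering fixed in SPARK-34820 (not the actual Dockerfile).
# Assumption: Debian-based image, executed as root (as in a Dockerfile RUN step).
# Hypothetical helper: with DRY_RUN=1 the commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

install_gnupg() {
  # Refresh the package index first so apt does not try to fetch a package
  # version that the stale index lists but the mirror no longer serves.
  # The && chain means the install is skipped if the refresh fails.
  run apt-get update && run apt-get install -y gnupg
}
```

`DRY_RUN=1 install_gnupg` prints `apt-get update` followed by `apt-get install -y gnupg`; without `DRY_RUN` it executes both commands in that order.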

Does this PR introduce any user-facing change?

No

How was this patch tested?

K8s Integration test passed

Yikun commented Mar 22, 2021

Member Author

cc @Ngone51

Ngone51 commented Mar 22, 2021

Member

@Yikun thanks for the quick fix. cc @dongjoon-hyun @holdenk

Ngone51 commented Mar 22, 2021

Member

ok to test

SparkQA commented Mar 22, 2021

Test build #136335 has finished for PR 31923 at commit 7418fdb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Mar 22, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40920/

SparkQA commented Mar 22, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40920/

Yikun commented Mar 22, 2021

Member Author
Step 6/12 : RUN   apt-get update &&   apt install -y gnupg &&   echo "deb http://cloud.r-project.org/bin/linux/debian buster-cran35/" >> /etc/apt/sources.list &&   (apt-key adv --keyserver keys.gnupg.net --recv-key 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' || apt-key adv --keyserver keys.openpgp.org --recv-key 'E19F5F87128899B192B1A2C2AD5F960A256A04AF') &&   apt-get update &&   apt install -y -t buster-cran35 r-base r-base-dev &&   rm -rf /var/cache/apt/*
 ---> Running in 890d910fdc12
... ...

 ---> 6fc29805545e
Step 7/12 : COPY R ${SPARK_HOME}/R
 ---> 98781dd121aa
Step 8/12 : ENV R_HOME /usr/lib/R

From [1] (Step 6/12 succeeded), we can see that the fetch failure from SPARK-34820 has been fixed.

[1] https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40920/consoleFull

Yikun commented Mar 22, 2021

Member Author
- Run SparkR on simple dataframe.R example *** FAILED ***
  io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://192.168.39.219:8443/api/v1/namespaces/5b12a03f6e2c40308f8c12bf4d9ea3e3/pods/spark-test-app-f99d40879bc94e569c8ae32b79a88970/log?pretty=false. Message: container "spark-kubernetes-driver" in pod "spark-test-app-f99d40879bc94e569c8ae32b79a88970" is waiting to start: trying and failing to pull image. Received status: Status(apiVersion=v1, code=400, details=null, kind=Status, message=container "spark-kubernetes-driver" in pod "spark-test-app-f99d40879bc94e569c8ae32b79a88970" is waiting to start: trying and failing to pull image, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=BadRequest, status=Failure, additionalProperties={}).
  at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:570)
  at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:509)
  at io.fabric8.kubernetes.client.dsl.internal.core.v1.PodOperationsImpl.doGetLog(PodOperationsImpl.java:189)
  at io.fabric8.kubernetes.client.dsl.internal.core.v1.PodOperationsImpl.getLog(PodOperationsImpl.java:198)
  at io.fabric8.kubernetes.client.dsl.internal.core.v1.PodOperationsImpl.getLog(PodOperationsImpl.java:85)
  at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.$anonfun$logForFailedTest$3(KubernetesSuite.scala:89)
  at org.apache.spark.internal.Logging.logInfo(Logging.scala:57)
  at org.apache.spark.internal.Logging.logInfo$(Logging.scala:56)
  at org.apache.spark.SparkFunSuite.logInfo(SparkFunSuite.scala:61)
  at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.$anonfun$logForFailedTest$2(KubernetesSuite.scala:86)
  ...

There are still some other errors triggered, but they look unrelated.

Ngone51 commented Mar 22, 2021

Member

retest this please

Ngone51 commented Mar 22, 2021

Member

Let's retry the test to see if it's flaky.

SparkQA commented Mar 22, 2021

Test build #136345 has finished for PR 31923 at commit 7418fdb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Mar 22, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40929/

SparkQA commented Mar 22, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40929/

Yikun commented Mar 22, 2021

Member Author


The SparkPullRequestBuilder-K8s job (Kubernetes integration test) is back to green [1], so I think the PR is ready to merge.

[1] https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/

shaneknapp self-requested a review March 22, 2021 15:38

shaneknapp

Contributor

thanks for doing this!

dongjoon-hyun left a comment

Member

+1, LGTM.
Merged to master/3.1.
cc @attilapiros

dongjoon-hyun pushed a commit that referenced this pull request Mar 22, 2021
### What changes were proposed in this pull request?
We added the gnupg installation in #30130. We should run apt update before the gnupg installation; otherwise we get a fetch error when the package is updated.

See more in:
[1] http://apache-spark-developers-list.1001551.n3.nabble.com/K8s-Integration-test-is-unable-to-run-because-of-the-unavailable-libs-td30986.html

### Why are the changes needed?
Add an apt-update command before the gnupg installation to avoid an invalid package cache list.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
K8s Integration test passed

Closes #31923 from Yikun/SPARK-34820.

Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 31da907)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>

Ngone51 commented Mar 25, 2021

Member

Hi all, I'd like to draw your continued attention to the K8s integration test issue. After this PR fixed the gnupg installation issue, another issue has shown up that fails almost constantly:

- Run SparkR on simple dataframe.R example *** FAILED ***
  io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://192.168.39.219:8443/api/v1/namespaces/b80cb1250aba4d92a99fc87b609b2328/pods/spark-test-app-411d038edc8b4e9b8e3761052cd44bd8/log?pretty=false. Message: container "spark-kubernetes-driver" in pod "spark-test-app-411d038edc8b4e9b8e3761052cd44bd8" is waiting to start: trying and failing to pull image. Received status: Status(apiVersion=v1, code=400, details=null, kind=Status, message=container "spark-kubernetes-driver" in pod "spark-test-app-411d038edc8b4e9b8e3761052cd44bd8" is waiting to start: trying and failing to pull image, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=BadRequest, status=Failure, additionalProperties={}).
  at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:570)
  at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:509)
  at io.fabric8.kubernetes.client.dsl.internal.core.v1.PodOperationsImpl.doGetLog(PodOperationsImpl.java:189)
  at io.fabric8.kubernetes.client.dsl.internal.core.v1.PodOperationsImpl.getLog(PodOperationsImpl.java:198)
  at io.fabric8.kubernetes.client.dsl.internal.core.v1.PodOperationsImpl.getLog(PodOperationsImpl.java:85)
  at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.$anonfun$logForFailedTest$3(KubernetesSuite.scala:89)
  at org.apache.spark.internal.Logging.logInfo(Logging.scala:57)
  at org.apache.spark.internal.Logging.logInfo$(Logging.scala:56)
  at org.apache.spark.SparkFunSuite.logInfo(SparkFunSuite.scala:61)
  at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.$anonfun$logForFailedTest$2(KubernetesSuite.scala:86)

And here is a PR that has run the K8s integration test multiple times, and every run failed.
Could someone please help take a look? Thanks!

attilapiros

Contributor

@Ngone51 I checked the PR you mentioned.

My findings, based on the last failure:

Here the first error is:

- Launcher client dependencies
- SPARK-33615: Launcher client archives *** FAILED ***
  io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://192.168.39.147:8443/api/v1/namespaces/09e9c94160d543c1a338f364722d49a6/pods/spark-test-app-c80c78f512574b36b2608f7d92c24503/log?pretty=false. Message: container "spark-kubernetes-driver" in pod "spark-test-app-c80c78f512574b36b2608f7d92c24503" is waiting to start: trying and failing to pull image. Received status: Status(apiVersion=v1, code=400, details=null, kind=Status, message=container "spark-kubernetes-driver" in pod "spark-test-app-c80c78f512574b36b2608f7d92c24503" is waiting to start: trying and failing to pull image, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=BadRequest, status=Failure, additionalProperties={}).
  at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:570)
  at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:509)
  at io.fabric8.kubernetes.client.dsl.internal.core.v1.PodOperationsImpl.doGetLog(PodOperationsImpl.java:189)
  at io.fabric8.kubernetes.client.dsl.internal.core.v1.PodOperationsImpl.getLog(PodOperationsImpl.java:198)
  at io.fabric8.kubernetes.client.dsl.internal.core.v1.PodOperationsImpl.getLog(PodOperationsImpl.java:85)
  at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.$anonfun$logForFailedTest$3(KubernetesSuite.scala:89)
  at org.apache.spark.internal.Logging.logInfo(Logging.scala:57)
  at org.apache.spark.internal.Logging.logInfo$(Logging.scala:56)
  at org.apache.spark.SparkFunSuite.logInfo(SparkFunSuite.scala:61)
  at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.$anonfun$logForFailedTest$2(KubernetesSuite.scala:86)
  ...

Comparing the last successful run with the first failed one, there are very few differences in the code: the failed one uses --archives and the successful one uses --files.

I do not think this difference could lead to an error such as "is waiting to start: trying and failing to pull image."

attilapiros

Contributor

It would be wonderful to see the `kubectl describe pod <pod>` output.
Actually, I think I can do this in one of my PRs...
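To collect the `kubectl describe pod <pod>` output suggested above for a driver stuck in "trying and failing to pull image", a helper along these lines could be run against the test namespace while the pod still exists. This is a hypothetical sketch, not what #31962 implements: `dump_pod_diagnostics` and the `KUBECTL` override are made-up names, and the JSONPath query assumes the pod's container statuses are already populated.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Spark's test suite): dump what Kubernetes
# knows about a pod whose container is stuck waiting to start.
# KUBECTL can be overridden (e.g. with a stub function) for testing.
KUBECTL="${KUBECTL:-kubectl}"

dump_pod_diagnostics() {
  local ns="$1" pod="$2"
  # One line per container: name and waiting reason (e.g. ErrImagePull, ImagePullBackOff).
  "$KUBECTL" get pod "$pod" -n "$ns" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.state.waiting.reason}{"\n"}{end}'
  # Full pod description, including the Events section with any pull errors.
  "$KUBECTL" describe pod "$pod" -n "$ns"
  # All events in the namespace, oldest first.
  "$KUBECTL" get events -n "$ns" --sort-by=.lastTimestamp
}
```

For the failure above this would be `dump_pod_diagnostics 09e9c94160d543c1a338f364722d49a6 spark-test-app-c80c78f512574b36b2608f7d92c24503`, provided that namespace has not been cleaned up yet. A waiting reason of `ErrImagePull` or `ImagePullBackOff`, together with the Events section of `describe`, would show which image reference the kubelet failed to pull.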

attilapiros

Contributor

Of course, locally all the tests passed:

All tests passed.
...
[INFO] Spark Project Kubernetes Integration Tests ......... SUCCESS [25:12 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------

attilapiros

Contributor

I hope this will help to troubleshoot this and similar errors:
#31962

Ngone51 commented Mar 29, 2021

Member

Thanks for the effort @attilapiros

shaneknapp commented Mar 29, 2021 via email

Contributor

flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021