Skip to content

[SPARK-38242][CORE] Sort the SparkSubmit debug output#35556

Closed
martin-g wants to merge 3 commits into
apache:masterfrom
martin-g:sort-debug-info-SparkSubmit
Closed

[SPARK-38242][CORE] Sort the SparkSubmit debug output#35556
martin-g wants to merge 3 commits into
apache:masterfrom
martin-g:sort-debug-info-SparkSubmit

Conversation

@martin-g

Copy link
Copy Markdown
Member

What changes were proposed in this pull request?

Sort the debug info printed by SparkSubmit when -verbose is enabled.

Why are the changes needed?

This way it is easier to find settings/properties and their values.

Does this PR introduce any user-facing change?

Yes. The log output changes - some data is sorted.

How was this patch tested?

Old unit tests still pass & manually check the sorted output.

 Main class:
org.apache.spark.deploy.k8s.submit.KubernetesClientApplication
Arguments:
--main-class
--primary-java-resource
local:///opt/spark/examples/jars/spark-examples_2.13-3.3.0-SNAPSHOT.jar
org.apache.spark.examples.SparkPi
Spark config:
(spark.app.name,spark-on-k8s-app)
(spark.app.submitTime,1645106476125)
(spark.driver.cores,1)
(spark.driver.extraJavaOptions,-Dio.netty.tryReflectionSetAccessible=true)
(spark.driver.memory,2048m)
(spark.dynamicAllocation.enabled,true)
(spark.dynamicAllocation.shuffleTracking.enabled,true)
(spark.executor.cores,2)
(spark.executor.extraJavaOptions,-Dio.netty.tryReflectionSetAccessible=true)
(spark.executor.instances,3)
(spark.executor.memory,2048m)
(spark.jars,local:///opt/spark/examples/jars/spark-examples_2.13-3.3.0-SNAPSHOT.jar)
(spark.kubernetes.allocation.batch.delay,1)
(spark.kubernetes.allocation.batch.size,3)
(spark.kubernetes.authenticate.driver.serviceAccountName,spark-account-name)
(spark.kubernetes.driver.container.image,spark/spark:3.3.0-SNAPSHOT-scala_2.13-11-jre-slim)
(spark.kubernetes.executor.container.image,spark/spark:3.3.0-SNAPSHOT-scala_2.13-11-jre-slim)
(spark.kubernetes.namespace,spark-on-k8s)
(spark.master,k8s://https://192.168.49.2:8443)
(spark.network.timeout,300)
(spark.submit.deployMode,cluster)
(spark.submit.pyFiles,)
Classpath elements:

@github-actions github-actions Bot added the CORE label Feb 17, 2022
@martin-g martin-g changed the title [SPARK-38242][SS] Sort the SparkSubmit debug output [SPARK-38242][CORE] Sort the SparkSubmit debug output Feb 17, 2022

@srowen srowen left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable to me. I always fear it breaks somebody relying on the output, but I can't see a reasonable use case for that

@AmplabJenkins

Copy link
Copy Markdown

Can one of the admins verify this patch?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah wait a sec, we can't sort these. Order of args matters; you might have --flag value. You can see the issue in your example, where the arg order doesn't reflect what was passed and loses this information

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah .. I wouldn't sort it.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Reverted it!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise classpath order matters, so not sure we should do this.
Spark configs ... yeah order should not matter, so that should be sorted.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted it!

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
@srowen

srowen commented Feb 23, 2022

Copy link
Copy Markdown
Member

Hm, I think the test failures may be legitimate here. They are tests related to conf. Can you check the failed tests and see if they need an adjustment after the output order changes?

@martin-g martin-g closed this Feb 24, 2022
@martin-g martin-g reopened this Feb 24, 2022
@srowen

srowen commented Feb 24, 2022

Copy link
Copy Markdown
Member

@HyukjinKwon do you know why it seems like the main tests weren't run here? does he need to enable tests in his branch?

@HyukjinKwon

Copy link
Copy Markdown
Member

Ah, after enabling, he needs to push an empty commit or rebase to trigger properly. ... I think probably this is a bug we need to fix

@HyukjinKwon

HyukjinKwon commented Feb 25, 2022

Copy link
Copy Markdown
Member

I pushed (and removed back) some arbitrary changes to trigger the CI

@martin-g

Copy link
Copy Markdown
Member Author

Thank you, @HyukjinKwon !
Nice trick with the Github suggestions feature!
Allow edits and access to secrets by maintainers is also enabled, so you could even push an empty commit

I used close/reopen PR to re-trigger the checks but this needs https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#pull_request and this event is not enabled at

. Do you want me to open a new PR to add this event or it is not there intentionally ?
The change is to add these two lines - https://github.com/apache/avro/blob/f10fa6b9e30e38ec5cde6e00288f0b31b24bf1de/.github/workflows/test-lang-c.yml#L21-L22

@srowen srowen closed this in 9eab255 Feb 26, 2022
@srowen

srowen commented Feb 26, 2022

Copy link
Copy Markdown
Member

Merged to master

@martin-g martin-g deleted the sort-debug-info-SparkSubmit branch February 26, 2022 20:22
@martin-g

Copy link
Copy Markdown
Member Author

Thank you, @srowen and @HyukjinKwon !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants