Skip to content

[SPARK-47598][CORE] MLLib: Migrate logError with variables to structured logging framework#45837

Closed
panbingkun wants to merge 9 commits into
apache:masterfrom
panbingkun:SPARK-47598
Closed

[SPARK-47598][CORE] MLLib: Migrate logError with variables to structured logging framework#45837
panbingkun wants to merge 9 commits into
apache:masterfrom
panbingkun:SPARK-47598

Conversation

@panbingkun

@panbingkun panbingkun commented Apr 3, 2024

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

The pr aims to migrate logError in module MLLib with variables to structured logging framework.

Why are the changes needed?

To enhance Apache Spark's logging system by implementing structured logging.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  • Pass GA.

Was this patch authored or co-authored using generative AI tooling?

No.

}

private def withLogContext(context: java.util.HashMap[String, String])(body: => Unit): Unit = {
protected def withLogContext(context: java.util.HashMap[String, String])(body: => Unit): Unit = {

@panbingkun panbingkun Apr 3, 2024

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change private to protected
because some class extends Logging and override the method logInfo, logWarn, 'logError', eg:
mllib/src/main/scala/org/apache/spark/ml/util/Instrumentation.scala

/**
* Logs a LogEntry which message with a prefix that uniquely identifies the training session.
*/
override def logError(entry: LogEntry): Unit = {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can write it as follows:

override def logError(entry: LogEntry): Unit = {
   super.logError(MessageWithContext(prefix + entry.message, entry.context))
}

But it seems that the efficiency is not as high as mentioned above.

Comment thread common/utils/src/main/scala/org/apache/spark/internal/Logging.scala Outdated
@panbingkun

Copy link
Copy Markdown
Contributor Author

cc @gengliangwang

val MIN_SIZE = Value
val REMOTE_ADDRESS = Value
val POD_ID = Value
val NUM_ITERATIONS = Value

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: let's sort the keys.

val msg = s"Classification labels should be in [0 to ${numClasses - 1}]. " +
s"Found $numInvalid invalid labels."
val msg = log"Classification labels should be in " +
log"${MDC(RANGE_CLASSIFICATION_LABELS, s"[0 to ${numClasses - 1}]")}. " +

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking about making the log keys generic. How about making it RANGE here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah

s"Found $numInvalid invalid labels."
val msg = log"Classification labels should be in " +
log"${MDC(RANGE_CLASSIFICATION_LABELS, s"[0 to ${numClasses - 1}]")}. " +
log"Found ${MDC(NUM_CLASSIFICATION_LABELS, numInvalid)} invalid labels."

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking about making the log keys generic. How about making it COUNT here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah


if (rawCoefficients == null) {
val msg = s"${optimizer.getClass.getName} failed."
val msg = log"${MDC(OPTIMIZER_CLASS_NAME, optimizer.getClass.getName)} failed."

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are multiple duplicated codes in the changes of this PR. Let's create a method to reduce duplications.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, very good suggestion

case e: java.lang.AssertionError =>
logError(s"FAILED for numIterations=$numIterations, learningRate=$learningRate," +
s" subsamplingRate=$subsamplingRate")
logError(log"FAILED for numIterations=${MDC(NUM_ITERATIONS, numIterations)}, " +

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's create a method to reduce duplicated code in this file.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@panbingkun

Copy link
Copy Markdown
Contributor Author

@gengliangwang all done.

@panbingkun panbingkun marked this pull request as ready for review April 4, 2024 00:57
@gengliangwang

Copy link
Copy Markdown
Member

Thanks, merging to master

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants