Skip to content

[SPARK-24103][ML][MLLIB] ML Evaluators should use weight column - added weight column for binary classification evaluator#17084

Closed
imatiach-msft wants to merge 12 commits into
apache:masterfrom
imatiach-msft:ilmat/binary-evalute
Closed

[SPARK-24103][ML][MLLIB] ML Evaluators should use weight column - added weight column for binary classification evaluator#17084
imatiach-msft wants to merge 12 commits into
apache:masterfrom
imatiach-msft:ilmat/binary-evalute

Conversation

@imatiach-msft

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

The evaluators BinaryClassificationEvaluator, RegressionEvaluator, and MulticlassClassificationEvaluator and the corresponding metrics classes BinaryClassificationMetrics, RegressionMetrics and MulticlassMetrics should use sample weight data.

I've closed the PR: #16557
as recommended in favor of creating three pull requests, one for each of the evaluators (binary/regression/multiclass) to make it easier to review/update.

How was this patch tested?

I added tests to the metrics and evaluators classes.

@SparkQA

SparkQA commented Feb 27, 2017

Copy link
Copy Markdown

Test build #73526 has finished for PR 17084 at commit 98652cf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class BinaryClassificationMetrics @Since(\"2.2.0\") (

@imatiach-msft

Copy link
Copy Markdown
Contributor Author

@sethah @Lewuathe @thunterdb @WeichenXu123 @jkbradley @actuaryzhang @srowen would you be able to take a look? I've split the larger pull request into three parts as suggested.

@imatiach-msft

Copy link
Copy Markdown
Contributor Author

ping @sethah @Lewuathe @thunterdb @WeichenXu123 @jkbradley @actuaryzhang @srowen could you please take a look? thank you!

@actuaryzhang

Copy link
Copy Markdown
Contributor

Thanks for the PR. I think this is helpful. Will take a look next week. Quite swamped recently.

@HyukjinKwon

Copy link
Copy Markdown
Member

gentle ping @actuaryzhang

@actuaryzhang

Copy link
Copy Markdown
Contributor

@imatiach-msft Thanks for the PR. Added a couple of comments. Sorry for the delayed review.

@HyukjinKwon

Copy link
Copy Markdown
Member

gentle ping @imatiach-msft .

@imatiach-msft

Copy link
Copy Markdown
Contributor Author

yes, will update the PR, thanks for the ping

@imatiach-msft imatiach-msft force-pushed the ilmat/binary-evalute branch from 98652cf to cf59c62 Compare June 26, 2017 03:55
@imatiach-msft

Copy link
Copy Markdown
Contributor Author

Jenkins, test this please

@SparkQA

SparkQA commented Jun 26, 2017

Copy link
Copy Markdown

Test build #78597 has finished for PR 17084 at commit cf59c62.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class BinaryClassificationMetrics @Since(\"2.2.0\") (

@SparkQA

SparkQA commented Jun 26, 2017

Copy link
Copy Markdown

Test build #78598 has finished for PR 17084 at commit 60fc2a7.

  • This patch fails PySpark pip packaging tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imatiach-msft

Copy link
Copy Markdown
Contributor Author

the pip packaging failing seems to be unrelated to the code... let me try this again

@imatiach-msft

Copy link
Copy Markdown
Contributor Author

Jenkins, test this please

@SparkQA

SparkQA commented Jun 26, 2017

Copy link
Copy Markdown

Test build #78606 has finished for PR 17084 at commit 60fc2a7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imatiach-msft

Copy link
Copy Markdown
Contributor Author

gentle ping @actuaryzhang, thanks!

@imatiach-msft

Copy link
Copy Markdown
Contributor Author

Jenkins, test this please

@SparkQA

SparkQA commented Apr 16, 2018

Copy link
Copy Markdown

Test build #89405 has finished for PR 17084 at commit e89a030.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imatiach-msft

Copy link
Copy Markdown
Contributor Author

gently ping @sethah @Lewuathe @thunterdb @WeichenXu123 @jkbradley @actuaryzhang @srowen could you please take a look? I've merged with latest code. Thank you!

@imatiach-msft imatiach-msft changed the title [SPARK-18693][ML][MLLIB] ML Evaluators should use weight column - added weight column for binary classification evaluator [SPARK-24103][ML][MLLIB] ML Evaluators should use weight column - added weight column for binary classification evaluator May 14, 2018
@SparkQA

SparkQA commented Feb 5, 2019

Copy link
Copy Markdown

Test build #102046 has finished for PR 17084 at commit 793a284.

  • This patch fails PySpark pip packaging tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class BinaryClassificationMetrics @Since(\"3.0.0\") (

@SparkQA

SparkQA commented Feb 11, 2019

Copy link
Copy Markdown

Test build #102177 has finished for PR 17084 at commit e78076b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class BinaryClassificationMetrics @Since(\"3.0.0\") (

@holdenk holdenk left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a quick first pass review, thanks for working on this :)

@SparkQA

SparkQA commented Feb 19, 2019

Copy link
Copy Markdown

Test build #102510 has finished for PR 17084 at commit 955b022.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imatiach-msft

Copy link
Copy Markdown
Contributor Author

@holdenk @srowen @actuaryzhang would you be able to take another look at this PR when you have a chance? I've updated to latest, fixed all failing tests and I think I have fixed all of the comments. Thank you!

@srowen srowen left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's looking OK; does Pyspark need an update, and does the multiclass evaluator need an update to match?

@imatiach-msft

Copy link
Copy Markdown
Contributor Author

@srowen multiclass weight columns PR is already merged:
#17086
The pyspark side will need to be updated. Should that be part of this PR or a separate PR?

@srowen

srowen commented Feb 22, 2019

Copy link
Copy Markdown
Member

Ah right, long since lost track. Yeah it would make sense to update it here too. I don't think R has an evaluator?

@SparkQA

SparkQA commented Feb 22, 2019

Copy link
Copy Markdown

Test build #102621 has finished for PR 17084 at commit 079e114.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 24, 2019

Copy link
Copy Markdown

Test build #102720 has finished for PR 17084 at commit a571adc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class BinaryClassificationEvaluator(JavaEvaluator, HasLabelCol, HasRawPredictionCol, HasWeightCol,

Comment thread python/pyspark/mllib/evaluation.py Outdated
@SparkQA

SparkQA commented Feb 25, 2019

Copy link
Copy Markdown

Test build #102733 has finished for PR 17084 at commit d8a5865.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class MulticlassMetrics @Since(\"1.1.0\") (predictionAndLabels: RDD[_ <: Product])

@SparkQA

SparkQA commented Feb 25, 2019

Copy link
Copy Markdown

Test build #102734 has finished for PR 17084 at commit 00bfec1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class MulticlassMetrics @Since(\"1.1.0\") (predictionAndLabels: RDD[_ <: Product])

@imatiach-msft

Copy link
Copy Markdown
Contributor Author

@srowen I think I have fixed all pending comments and the tests are currently passing for this PR. I couldn't find any evaluators in the R code, but then again I am not as familiar with sparkR (however, I've used sparklyr before https://github.com/rstudio/sparklyr). Please let me know if there are any other comments that need to be addressed. Thank you!

@srowen

srowen commented Feb 25, 2019

Copy link
Copy Markdown
Member

Merged to master

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants