[SPARK-3181][MLLIB]: Add Robust Regression Algorithm with Huber Estimator #8013

Closed
fjiang6 wants to merge 1 commit into apache:master from fjiang6:Huawei-Robust

fjiang6 commented Aug 7, 2015

Huber Robust Regression under spark/ml/regression
Unit Tests

SparkQA commented Aug 7, 2015

Test build #40111 has finished for PR 8013 at commit 2f67e63.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class RobustRegression(override val uid: String)

fjiang6 commented Aug 7, 2015
Author

@mengxr @dbtsai @srowen I have put RobustRegression in the same codebase as LinearRegression, as requested, and included the unit tests.

dbtsai commented Aug 7, 2015
Member

There is still a lot of duplication. We're adding new features to LiR (LinearRegression) now, so duplicated code will be hard to maintain. Could you add only the objective function and use Params to switch between the different objective functions? Thanks.

SparkQA commented Aug 8, 2015

Test build #40222 has finished for PR 8013 at commit 96e38a7.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 8, 2015

Test build #40223 has finished for PR 8013 at commit 23e4c62.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

fjiang6 commented Aug 8, 2015
Author

@dbtsai I just added the objective function and now use Params to switch between the different objective functions. Thanks!

Member

sharedParams.scala cannot be edited directly because it is generated code. Please make the change in SharedParamsCodeGen.scala.

Member

Also, rename HasRobust to HasRobustRegression in SharedParamsCodeGen.scala.

SparkQA commented Aug 11, 2015

Test build #40530 has finished for PR 8013 at commit 51e47dc.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class EqualNullSafe(attribute: String, value: Any) extends Filter

fjiang6 commented Aug 11, 2015
Author

This class was not added by me. I didn't touch PySpark.

SparkQA commented Aug 23, 2015

Test build #41422 has finished for PR 8013 at commit 1567635.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 23, 2015

Test build #41421 has finished for PR 8013 at commit 3bb5930.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 23, 2015

Test build #41423 has finished for PR 8013 at commit a04179b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

I think it would be better to introduce a "costFunction" param, which defaults to "LeastSquares", and to pattern match on it in LinearRegression#L195, since that will force mutual exclusivity when more than two cost functions are possible.
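
The suggestion above can be sketched in plain Scala, without any Spark dependency. This is only an illustration of the idea: the names `CostFunction`, `CostFunctions`, `fromName`, and `loss` are hypothetical and are not taken from the PR or from Spark's API, and the default threshold of 1.345 is assumed from the later discussion in this thread.

```scala
// Hypothetical sketch: these names are illustrative and are not Spark's API.
sealed trait CostFunction
case object LeastSquares extends CostFunction
final case class Huber(k: Double) extends CostFunction

object CostFunctions {
  // Assumed default value of a string-valued "costFunction" param.
  val DefaultName: String = "LeastSquares"

  // Resolve the param's string value to exactly one cost function.
  // An unknown name fails fast instead of silently falling back.
  def fromName(name: String, k: Double = 1.345): CostFunction = name match {
    case "LeastSquares" => LeastSquares
    case "Huber"        => Huber(k)
    case other =>
      throw new IllegalArgumentException(s"Unsupported costFunction: $other")
  }

  // Per-example loss for a residual r = prediction - label.
  def loss(cost: CostFunction, r: Double): Double = cost match {
    case LeastSquares => 0.5 * r * r
    case Huber(k) =>
      if (math.abs(r) <= k) 0.5 * r * r   // quadratic near zero
      else k * (math.abs(r) - 0.5 * k)    // linear in the tails
  }
}
```

Because a single param value selects a single case, two cost functions can never be enabled at once, which a pair of boolean flags would not guarantee. Matching over a sealed trait also lets the compiler warn when a newly added cost function is left unhandled.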

SparkQA commented Aug 27, 2015

Test build #41662 has finished for PR 8013 at commit e447623.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

nit: This is not really an "Option". Can we just make this say "Set whether to use robust Huber Cost Function"?

@feynmanliang
Contributor

There is a lot of code repetition between this and #2096. Perhaps you can make the mllib one wrap this?

dbtsai commented Sep 1, 2015
Member

Hello, the robust tuning parameter k should not be a constant, as you have implemented it.
In the paper, http://users.stat.umn.edu/~sandy/courses/8053/handouts/robust.pdf,
k = 1.345σ, where σ is the square error of the current square loss. However, it would be very expensive to compute the square error of the current square loss and then compute the Huber loss, so I think it's reasonable to approximate the square error from the previous weights.
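
A minimal sketch of that approximation, under stated assumptions: σ is estimated here as the root-mean-square residual under the previous iteration's weights, and the data is held in local collections rather than distributed. The object and method names (`HuberThreshold`, `rmsResidual`, `threshold`, `huberGradient`) are hypothetical and do not come from the PR.

```scala
// Hypothetical sketch: local collections stand in for distributed data,
// and all names here are illustrative rather than taken from the PR.
object HuberThreshold {
  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (u, v) => u * v }.sum

  // Root-mean-square residual under the given weights
  // (assumed stand-in for sigma in k = 1.345 * sigma).
  def rmsResidual(xs: Seq[Array[Double]], ys: Seq[Double], w: Array[Double]): Double = {
    val squared = xs.zip(ys).map { case (x, y) =>
      val r = dot(x, w) - y
      r * r
    }
    math.sqrt(squared.sum / squared.size)
  }

  // Threshold for the next pass, computed from the PREVIOUS weights so that
  // no extra pass over the data is needed for the current weights.
  def threshold(xs: Seq[Array[Double]], ys: Seq[Double],
                prevWeights: Array[Double], c: Double = 1.345): Double =
    c * rmsResidual(xs, ys, prevWeights)

  // Derivative of the Huber loss with respect to the residual r:
  // r inside the threshold, clipped to +/- k outside it.
  def huberGradient(r: Double, k: Double): Double =
    if (math.abs(r) <= k) r else k * math.signum(r)
}
```

In an iterative solver, the squared residuals could be accumulated in the same pass that evaluates the loss and gradient, so the threshold for the next iteration would come at little extra cost; that accumulation is not shown here.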

add the objective function, and use Params to switch

edit to pass scala style tests

make HasRobustRegression in SharedParamsCodeGen.scala, Make the document more explicitly and make k tunable and default to 1.345 by having another param

UnitTests with Outliers

UnitTests with Outliers

Edit HuberAggregator

scala codestyle

Update LinearRegression.scala

SparkQA commented Jan 17, 2016

Test build #49555 has finished for PR 8013 at commit 01601ee.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

rxin commented Jun 15, 2016
Contributor

Thanks for the pull request. I'm going through a list of pull requests to cut them down since the sheer number is breaking some of the tooling we have. Due to lack of activity on this pull request, I'm going to push a commit to close it. Feel free to reopen it or create a new one. We can also continue the discussion on the JIRA ticket.

@dbtsai there are a few pull requests that were waiting on your review. Can you revisit them even if they are closed?

asfgit closed this in 1a33f2e Jun 15, 2016