Skip to content

[SPARK-26754][PYTHON] Add hasTrainingSummary to replace duplicate code in PySpark#23676

Closed
huaxingao wants to merge 2 commits into
apache:masterfrom
huaxingao:spark26754
Closed

[SPARK-26754][PYTHON] Add hasTrainingSummary to replace duplicate code in PySpark#23676
huaxingao wants to merge 2 commits into
apache:masterfrom
huaxingao:spark26754

Conversation

@huaxingao

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Python version of #17654

How was this patch tested?

Existing Python unit test

@SparkQA

SparkQA commented Jan 28, 2019

Copy link
Copy Markdown

Test build #101774 has finished for PR 23676 at commit 1d70ad1.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class LogisticRegressionModel(JavaModel, JavaClassificationModel, JavaMLWritable, JavaMLReadable,
  • class GaussianMixtureModel(JavaModel, JavaMLWritable, JavaMLReadable, HasTrainingSummary):
  • class KMeansModel(JavaModel, GeneralJavaMLWritable, JavaMLReadable, HasTrainingSummary):
  • class BisectingKMeansModel(JavaModel, JavaMLWritable, JavaMLReadable, HasTrainingSummary):
  • class LinearRegressionModel(JavaModel, JavaPredictionModel, GeneralJavaMLWritable, JavaMLReadable,
  • class HasTrainingSummary(object):

Comment thread python/pyspark/ml/util.py Outdated
"""

@property
@since("3.0.0")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only issue I see here is that these properties were effectively added to subclasses in 2.0.0 and 2.1.0, so this seems somewhat misleading. Maybe we can declare these "since 2.1.0" as close enough?

Comment thread python/pyspark/ml/util.py Outdated
Gets summary of the model trained on the training set. An exception is thrown if
no summary exists.
"""
if self.hasSummary:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also call "summary" directly though I suppose that would result in a different, possibly less useful error. I was also wondering whether RuntimeError is the right one, but I see other call sites above throw this when there is no summary.

Can you remove other checks for "if self.hasSummary" then, above? because when summary() is called it will now throw this error already.

@SparkQA

SparkQA commented Jan 30, 2019

Copy link
Copy Markdown

Test build #101911 has finished for PR 23676 at commit 6308e62.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -532,24 +533,16 @@ def summary(self):
trained on the training set. An exception is thrown if `trainingSummary is None`.
"""
if self.hasSummary:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pardon if you were already going to follow up on this, but I think you don't need to check self.hasSummary here and other classes anymore. The implementation in the new trait can do that in one place.

Comment thread python/pyspark/ml/clustering.py
@srowen

srowen commented Feb 1, 2019

Copy link
Copy Markdown
Member

OK I get it, yeah, the polymorphism doesn't come across, and has to be mirrored in Python.

@srowen

srowen commented Feb 1, 2019

Copy link
Copy Markdown
Member

Merged to master

@srowen srowen closed this in 5bb9647 Feb 1, 2019
@huaxingao

Copy link
Copy Markdown
Contributor Author

Thanks a lot for your help! @srowen

jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…e in PySpark

## What changes were proposed in this pull request?

Python version of apache#17654

## How was this patch tested?

Existing Python unit test

Closes apache#23676 from huaxingao/spark26754.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants