[SPARK-35392][ML][PYTHON] Fix flaky tests in ml/clustering.py and ml/feature.py #32533

Closed
zhengruifeng wants to merge 3 commits into apache:master from zhengruifeng:SPARK_35392_disable_flaky_gmm_test

Conversation

@zhengruifeng zhengruifeng commented May 13, 2021

Contributor

What changes were proposed in this pull request?

This PR removes the check on summary.logLikelihood from the GMM test in ml/clustering.py because the check is flaky. It fails easily when, for example:

  • the number of partitions changes;
  • the way the sum of weights is computed changes;
  • the underlying BLAS implementation changes.

It also uses a more permissive precision in the Word2Vec test case.
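
The sensitivity described here is consistent with floating-point addition not being associative: regrouping the same terms, as a different partition count or BLAS implementation may do, can change the low-order digits of a sum. The following is a minimal Python sketch, not code from this PR; `left_fold_sum`, `grouped_sum`, and the `split` parameter are hypothetical names that only imitate summing two partitions separately.

```python
import math


def left_fold_sum(values):
    """Add values strictly left to right, with no compensated summation."""
    total = 0.0
    for v in values:
        total += v
    return total


def grouped_sum(values, split):
    """Hypothetical helper: sum two 'partitions' separately, then combine."""
    return left_fold_sum(values[:split]) + left_fold_sum(values[split:])


weights = [0.1, 0.2, 0.3]
one_group = grouped_sum(weights, 3)   # (0.1 + 0.2) + 0.3
two_groups = grouped_sum(weights, 1)  # 0.1 + (0.2 + 0.3)

# The two groupings disagree in the last bits, so an exact comparison is brittle.
print(one_group == two_groups)
# A comparison with a tolerance treats both groupings as equal.
print(math.isclose(one_group, two_groups, rel_tol=1e-9))
```

A test that prints or compares the full-precision value depends on which grouping happened to be used, which is why dropping the exact check or loosening its precision makes it stable.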

Why are the changes needed?

To recover the build and tests.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing test cases.

@zhengruifeng

Contributor Author

ping @HyukjinKwon @srowen @viirya

@HyukjinKwon HyukjinKwon left a comment

Member

LGTM if tests pass

HyukjinKwon changed the title from "[SPARK-35392][ML][PYTHON] remove Flaky GMM Test in ml/clustering.py" to "[SPARK-35392][ML][PYTHON] Remove Flaky GMM Test in ml/clustering.py" on May 13, 2021

SparkQA commented May 13, 2021

Test build #138498 has finished for PR 32533 at commit 0081246.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented May 13, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43018/

SparkQA commented May 13, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43018/

@HyukjinKwon

Member

@zhengruifeng would you mind fixing:

**********************************************************************
File "/__w/spark/spark/python/pyspark/ml/feature.py", line 4681, in __main__.Word2Vec
Failed example:
    model.getVectors().show()
Expected:
    +----+--------------------+
    |word|              vector|
    +----+--------------------+
    |   a|[0.09511678665876...|
    |   b|[-1.2028766870498...|
    |   c|[0.30153277516365...|
    +----+--------------------+
    ...
Got:
    +----+--------------------+
    |word|              vector|
    +----+--------------------+
    |   a|[0.09511695802211...|
    |   b|[-1.2028766870498...|
    |   c|[0.30153274536132...|
    +----+--------------------+
    <BLANKLINE>
**********************************************************************

too? Feel free to change the JIRA.

I think we can just fix it by truncating the expected vectors, like:

    +----+--------------------+
    |word|              vector|
    +----+--------------------+
    |   a|[0.0951 ...
    |   b|[-1.202 ...
    |   c|[0.3015 ...
    +----+--------------------+
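
The truncated expectation suggested here relies on doctest's ELLIPSIS option, under which `...` in the expected output matches any substring. The sketch below is self-contained and does not use Spark: `show_vectors` and `run_doctest` are hypothetical helpers, and the printed rows are copied from the "Got" output in the failure log. It assumes the real test module enables ELLIPSIS when it runs its doctests.

```python
import doctest


def show_vectors():
    """Hypothetical stand-in for model.getVectors().show().

    >>> show_vectors()
    +----+--------------------+
    |word|              vector|
    +----+--------------------+
    |   a|[0.0951...
    |   b|[-1.202...
    |   c|[0.3015...
    +----+--------------------+
    """
    print("+----+--------------------+")
    print("|word|              vector|")
    print("+----+--------------------+")
    print("|   a|[0.09511695802211...|")
    print("|   b|[-1.2028766870498...|")
    print("|   c|[0.30153274536132...|")
    print("+----+--------------------+")


def run_doctest(func, optionflags=0):
    """Run func's docstring examples and return doctest's result counts."""
    test = doctest.DocTestParser().get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False, optionflags=optionflags)
    # Discard the failure report; only the counts matter here.
    return runner.run(test, out=lambda text: None)


strict = run_doctest(show_vectors)
relaxed = run_doctest(show_vectors, optionflags=doctest.ELLIPSIS)
print("failures with exact matching:", strict.failed)
print("failures with ELLIPSIS:", relaxed.failed)
```

Keeping only the leading digits of each vector means the doctest still checks the rough values while ignoring the trailing digits that differed between runs.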

HyukjinKwon changed the title from "[SPARK-35392][ML][PYTHON] Remove Flaky GMM Test in ml/clustering.py" to "[SPARK-35392][ML][PYTHON] Fix flaky tests in ml/clustering.py and ml/feature.py" on May 13, 2021

@HyukjinKwon HyukjinKwon left a comment

Member

LGTM

SparkQA commented May 13, 2021

Test build #138510 has finished for PR 32533 at commit a1fd16f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented May 13, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43030/

SparkQA commented May 13, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43030/

@HyukjinKwon

Member

Merged to master.

@HyukjinKwon

Member

Thanks @zhengruifeng for fixing this!

@dongjoon-hyun

Member

Yes, one stone for two birds! Nice!

@zhengruifeng zhengruifeng deleted the SPARK_35392_disable_flaky_gmm_test branch May 14, 2021 00:35

5 participants