Skip to content

[SPARK-18111] [SQL] Wrong ApproximatePercentile answer when multiple records have the minimum value#15641

Closed
wzhfy wants to merge 4 commits into
apache:masterfrom
wzhfy:percentile
Closed

[SPARK-18111] [SQL] Wrong ApproximatePercentile answer when multiple records have the minimum value#15641
wzhfy wants to merge 4 commits into
apache:masterfrom
wzhfy:percentile

Conversation

@wzhfy

@wzhfy wzhfy commented Oct 26, 2016

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

When multiple records have the minimum value, the answer of ApproximatePercentile is wrong.

How was this patch tested?

add a test case

@wzhfy wzhfy changed the title [SPARK-18111] [SQL] add the min element when compressing samples in QuantileSummaries [SPARK-18111] [SQL] Wrong ApproximatePercentile answer when multiple records have the minimum value Oct 26, 2016
@wzhfy

wzhfy commented Oct 26, 2016

Copy link
Copy Markdown
Contributor Author

/cc @thunterdb @cloud-fan

@SparkQA

SparkQA commented Oct 26, 2016

Copy link
Copy Markdown

Test build #67566 has finished for PR 15641 at commit 3ad5920.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 26, 2016

Copy link
Copy Markdown

Test build #67574 has finished for PR 15641 at commit de1fdd5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen srowen left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to get @thunterdb 's OK but that looks good to me.

@SparkQA

SparkQA commented Oct 27, 2016

Copy link
Copy Markdown

Test build #67634 has finished for PR 15641 at commit 0a5489c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wzhfy

wzhfy commented Oct 27, 2016

Copy link
Copy Markdown
Contributor Author

retest this please

@SparkQA

SparkQA commented Oct 27, 2016

Copy link
Copy Markdown

Test build #67639 has finished for PR 15641 at commit 0a5489c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wzhfy

wzhfy commented Oct 27, 2016

Copy link
Copy Markdown
Contributor Author

Seems many test builds failed in org.apache.spark.sql.streaming.StreamingQuerySuite.StreamExecution metadata garbage collection. Do you know why? @srowen

@wzhfy

wzhfy commented Oct 27, 2016

Copy link
Copy Markdown
Contributor Author

retest this please

@SparkQA

SparkQA commented Oct 27, 2016

Copy link
Copy Markdown

Test build #67652 has finished for PR 15641 at commit 0a5489c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen

srowen commented Oct 31, 2016

Copy link
Copy Markdown
Member

@thunterdb It looks reasonable to my understanding, having looked at this code with you before, so if you don't object I'll merge this tomorrow.

@srowen

srowen commented Nov 1, 2016

Copy link
Copy Markdown
Member

Merged to master

@asfgit asfgit closed this in cb80edc Nov 1, 2016
@srowen

srowen commented Nov 1, 2016

Copy link
Copy Markdown
Member

It has a merge conflict in 2.0, but if you'd like to get it into 2.0 and could open a similar PR for testing, I can merge that.

@wzhfy

wzhfy commented Nov 1, 2016

Copy link
Copy Markdown
Contributor Author

@srowen Seems branch-2.0 doesn't have related files, ApproximatePercentile.scala is not merged in 2.0?

@srowen

srowen commented Nov 1, 2016

Copy link
Copy Markdown
Member

Yeah it was in sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala before in 2.0. I'm not sure if it has the same problem or not.

@wzhfy

wzhfy commented Nov 1, 2016

Copy link
Copy Markdown
Contributor Author

Got it, thanks! I think it also has the problem, let me open a PR for it.

uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…ecords have the minimum value

## What changes were proposed in this pull request?

When multiple records have the minimum value, the answer of ApproximatePercentile is wrong.
## How was this patch tested?

add a test case

Author: wangzhenhua <wangzhenhua@huawei.com>

Closes apache#15641 from wzhfy/percentile.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants