Skip to content

[SPARK-9740] [SPARK-9592] [SPARK-9210] [SQL] Change the default behavior of First/Last to RESPECT NULLS.#8113

Closed
yhuai wants to merge 8 commits into
apache:masterfrom
yhuai:firstLast
Closed

[SPARK-9740] [SPARK-9592] [SPARK-9210] [SQL] Change the default behavior of First/Last to RESPECT NULLS.#8113
yhuai wants to merge 8 commits into
apache:masterfrom
yhuai:firstLast

Conversation

@yhuai

@yhuai yhuai commented Aug 11, 2015

Copy link
Copy Markdown
Contributor

I am changing the default behavior of First/Last to respect null values (the SQL standard default behavior).

https://issues.apache.org/jira/browse/SPARK-9740

@yhuai

yhuai commented Aug 11, 2015

Copy link
Copy Markdown
Contributor Author

@hvanhovell I am making the default in this PR.

@SparkQA

SparkQA commented Aug 12, 2015

Copy link
Copy Markdown

Test build #40531 has finished for PR 8113 at commit 6ee4a04.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell

Copy link
Copy Markdown
Contributor

One final question, shouldn't we introduce a skipNulls parameter? Or do you want to address this in a follow-up PR?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could for now simplify this to Literal.create(true, BooleanType).

@yhuai

yhuai commented Aug 12, 2015

Copy link
Copy Markdown
Contributor Author

yeah. I think that only requires a small change. Let me add it.

@hvanhovell

Copy link
Copy Markdown
Contributor

Besides the valueSet update/merge expressions LGTM.

@yhuai

yhuai commented Aug 12, 2015

Copy link
Copy Markdown
Contributor Author

@hvanhovell Alright. I added the ignoreNulls flag.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ggupta81 This is the fix of SPARK-9592 for our master and 1.5 branch.

@yhuai yhuai changed the title [SPARK-9740] [SQL] Change the default behavior of First/Last to RESPECT NULLS. [SPARK-9740] [SPARK-9592] [SQL] Change the default behavior of First/Last to RESPECT NULLS. Aug 12, 2015

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yhuai Both if and else branches are executing the same code.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops. Will fix it.

@ggupta81

Copy link
Copy Markdown

Should i close my original pull request now?
#7928

On Wed, Aug 12, 2015 at 10:38 AM, UCB AMPLab notifications@github.com
wrote:

Merged build started.


Reply to this email directly or view it on GitHub
#8113 (comment).

_Gaurav Gupta_Engineering Manager @ Adobe
9971666780

@yhuai

yhuai commented Aug 12, 2015

Copy link
Copy Markdown
Contributor Author

@ggupta81 Yeah. We can close #7928. Can you open a pr to fix branch 1.4, or reopen #7233 and update that? Thanks!

@SparkQA

SparkQA commented Aug 12, 2015

Copy link
Copy Markdown

Test build #40589 has finished for PR 8113 at commit 15c9e31.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class First(child: Expression, ignoreNulls: Boolean) extends AlgebraicAggregate
    • case class Last(child: Expression, ignoreNulls: Boolean) extends AlgebraicAggregate
    • case class First(
    • case class FirstFunction(
    • case class Last(
    • case class LastFunction(

@SparkQA

SparkQA commented Aug 12, 2015

Copy link
Copy Markdown

Test build #1488 has finished for PR 8113 at commit 2d93d89.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Or(valueSet, Not(IsNull(child))) is a bit shorter. It is a matter of preference though...

@hvanhovell

Copy link
Copy Markdown
Contributor

One more small thing. We should probably also add the ignoreNulls option to the first and last dataframe functions.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Or(valueSet.left. valueSet.right) is shorter.

@yhuai

yhuai commented Aug 13, 2015

Copy link
Copy Markdown
Contributor Author

@hvanhovell Since we already pass the feature freeze deadline, I will not add new interfaces to DataFrame API (this PR is mainly about fixing the default behavior). If users request adding DF functions that expose ignoreNulls flag, we can do it in 1.6.

@SparkQA

SparkQA commented Aug 13, 2015

Copy link
Copy Markdown

Test build #40705 has finished for PR 8113 at commit ad0ac67.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class First(child: Expression, ignoreNullsExpr: Expression) extends AlgebraicAggregate
    • case class Last(child: Expression, ignoreNullsExpr: Expression) extends AlgebraicAggregate
    • case class First(
    • case class FirstFunction(
    • case class Last(
    • case class LastFunction(

@yhuai yhuai changed the title [SPARK-9740] [SPARK-9592] [SQL] Change the default behavior of First/Last to RESPECT NULLS. [SPARK-9740] [SPARK-9592] [SPARK-9210] [SQL] Change the default behavior of First/Last to RESPECT NULLS. Aug 13, 2015
@SparkQA

SparkQA commented Aug 13, 2015

Copy link
Copy Markdown

Test build #40724 has finished for PR 8113 at commit f828bdf.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class First(child: Expression, ignoreNullsExpr: Expression) extends AlgebraicAggregate
    • case class Last(child: Expression, ignoreNullsExpr: Expression) extends AlgebraicAggregate
    • case class First(
    • case class FirstFunction(
    • case class Last(
    • case class LastFunction(

@SparkQA

SparkQA commented Aug 13, 2015

Copy link
Copy Markdown

Test build #1542 has finished for PR 8113 at commit f828bdf.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • * Set thresholds in multiclass (or binary) classification to adjust the probability of
    • case class First(child: Expression, ignoreNullsExpr: Expression) extends AlgebraicAggregate
    • case class Last(child: Expression, ignoreNullsExpr: Expression) extends AlgebraicAggregate
    • case class First(
    • case class FirstFunction(
    • case class Last(
    • case class LastFunction(

@SparkQA

SparkQA commented Aug 13, 2015

Copy link
Copy Markdown

Test build #1551 timed out for PR 8113 at commit f828bdf after a configured wait of 175m.

asfgit pushed a commit that referenced this pull request Aug 17, 2015
…pression1.

https://issues.apache.org/jira/browse/SPARK-9592

#8113 has the fundamental fix. But, if we want to minimize the number of changed lines, we can go with this one. Then, in 1.6, we merge #8113.

Author: Yin Huai <yhuai@databricks.com>

Closes #8172 from yhuai/lastFix and squashes the following commits:

b28c42a [Yin Huai] Regression test.
af87086 [Yin Huai] Fix last.

(cherry picked from commit 772e7c1)
Signed-off-by: Michael Armbrust <michael@databricks.com>
asfgit pushed a commit that referenced this pull request Aug 17, 2015
…pression1.

https://issues.apache.org/jira/browse/SPARK-9592

#8113 has the fundamental fix. But, if we want to minimize the number of changed lines, we can go with this one. Then, in 1.6, we merge #8113.

Author: Yin Huai <yhuai@databricks.com>

Closes #8172 from yhuai/lastFix and squashes the following commits:

b28c42a [Yin Huai] Regression test.
af87086 [Yin Huai] Fix last.
Conflicts:
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala
@yhuai

yhuai commented Oct 19, 2015

Copy link
Copy Markdown
Contributor Author

test this please

@SparkQA

SparkQA commented Oct 20, 2015

Copy link
Copy Markdown

Test build #43948 has finished for PR 8113 at commit 91c174f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus

Copy link
Copy Markdown
Contributor

Thanks! Merging to master.

@asfgit asfgit closed this in 3afe448 Oct 21, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants