SPARK-9210 corrects aggregate function name in exception message #7557

Closed
ssimeonov wants to merge 1 commit into apache:master from swoop-inc:SPARK-9210

Conversation

@ssimeonov

Contributor

@AmplabJenkins

Can one of the admins verify this patch?

@srowen

srowen commented Jul 21, 2015

Member

CC @marmbrus just in case

@marmbrus

Contributor

I don't think this is correct: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala#L147

scala> sql("SELECT a FROM test GROUP BY b")
org.apache.spark.sql.AnalysisException: expression 'a' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() if you don't care which value you get.;

scala> sql("SELECT first(a) FROM test GROUP BY b")
res3: org.apache.spark.sql.DataFrame = [_c0: int]

@ssimeonov

Contributor Author

@marmbrus can you please provide a complete example that can execute in spark-shell?

You can find a standalone runnable example with complete shell output in this gist. Here is the summary of what happens:

// ERROR RetryingHMSHandler: MetaException(message:NoSuchObjectException(message:Function default.first does not exist))
// INFO FunctionRegistry: Unable to lookup UDF in metastore: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:NoSuchObjectException(message:Function default.first does not exist))
// java.lang.RuntimeException: Couldn't find function first
ctx.sql("select first(num) from test_first group by category").show

// OK
ctx.sql("select first_value(num) from test_first group by category").show

Perhaps the difference is that I'm using HiveContext?

@marmbrus

Contributor

Which version of hive/spark are you running?

@ssimeonov

Contributor Author

@marmbrus you can see the version and full INFO-level shell output in the gist. I'm running 1.4.1.

@rxin

rxin commented Aug 12, 2015

Contributor

cc @yhuai since you are working on a related issue.

@yhuai

yhuai commented Aug 12, 2015

Contributor

In Spark 1.4, first and last were not in the function registry. Right now, first_value and last_value point to Hive's first_value and last_value, respectively. I am adding these to our function registry as well in #8113. @ssimeonov I will update the exception message in #8113.

@ssimeonov

Contributor Author

@yhuai great

@JoshRosen

Contributor

Hey @ssimeonov, would you mind closing this PR now that its change has been incorporated into #8113? Thanks!

@asfgit asfgit closed this in 8d4449c Oct 18, 2015