[SPARK-16287][SQL] Implement str_to_map SQL function #13990
|
Test build #61525 has finished for PR 13990 at commit
|
|
cc: @cloud-fan @rxin |
|
Test build #61685 has finished for PR 13990 at commit
|
| * Creates a map after splitting the input text into key/value pairs using delimiters | ||
| */ | ||
| @ExpressionDescription( | ||
| usage = """_FUNC_(text[, delimiter1, delimiter2]) - Creates a map after splitting the text into |
delimiter1 and delimiter2 are not good names. delimiter1 is used to separate key-value pairs from the input text, and delimiter2 is used to separate key and value from each kv pair. Do you have some ideas about the naming?
How about pairDelim and pairSeparatorDelim? I'm not very good with naming; what do you suggest?
I used delimiter1 and delimiter2 because that is how they are named in Hive.
How about pairDelim and keyValueDelim?
Yup, that sounds much better. Let me make the change.
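To make the two roles concrete, here is a minimal plain-Scala sketch of the splitting described above. It is not the Catalyst expression from this PR: `StrToMapSketch` and `strToMap` are hypothetical names, the delimiters are treated as literal strings (an assumption), the defaults follow the usage string quoted in this thread, and giving a pair without keyValueDelim an empty value is a choice made only for this sketch.

```scala
import java.util.regex.Pattern

object StrToMapSketch {
  // Hypothetical helper that mirrors the described semantics on plain Strings.
  // pairDelim separates key/value pairs in the input text;
  // keyValueDelim separates the key from the value inside each pair.
  def strToMap(
      text: String,
      pairDelim: String = ",",
      keyValueDelim: String = "="): Map[String, String] = {
    text
      // String.split takes a regex, so the delimiter is quoted to keep it literal.
      .split(Pattern.quote(pairDelim), -1)
      .map { pair =>
        // Limit 2: only the first keyValueDelim separates key from value.
        val kv = pair.split(Pattern.quote(keyValueDelim), 2)
        // Sketch-only choice: a pair without keyValueDelim gets an empty value.
        kv(0) -> (if (kv.length > 1) kv(1) else "")
      }
      .toMap
  }
}
```

For example, `StrToMapSketch.strToMap("a=1,b=2")` and `StrToMapSketch.strToMap("a:1;b:2", ";", ":")` both produce a map from `a` to `1` and from `b` to `2`.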
| usage = """_FUNC_(text[, pairDelim, keyValueDelim]) - Creates a map after splitting the text into | ||
| key/value pairs using delimiters. | ||
| Default delimiters are ',' for pairDelim and '=' for keyValueDelim.""") | ||
| case class StringToMap(child: Expression, pairDelim: Expression, keyValueDelim: Expression) |
How about renaming child to text, to make it consistent with the usage string: _FUNC_(text[, pairDelim, keyValueDelim])?
|
Test build #61690 has finished for PR 13990 at commit
|
|
Test build #61767 has finished for PR 13990 at commit
|
| .split(delim1.asInstanceOf[UTF8String], -1) | ||
| .map{_.split(delim2.asInstanceOf[UTF8String], 2)} | ||
|
|
||
| ArrayBasedMapData(array.map(_(0)), array.map(_(1))).asInstanceOf[MapData] |
The asInstanceOf[MapData] seems unnecessary?
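The hunk above splits with limits -1 and 2. As a rough illustration of what those limits do, the sketch below uses `java.lang.String.split` as a stand-in, on the assumption that `UTF8String.split` follows the same limit semantics. `SplitLimitSketch` and its sample inputs are hypothetical.

```scala
object SplitLimitSketch {
  // limit = -1 keeps trailing empty strings.
  val pairs: Array[String] = "a=1,b=2,".split(",", -1)   // Array("a=1", "b=2", "")

  // Without a limit, trailing empty strings are dropped.
  val noLimit: Array[String] = "a=1,b=2,".split(",")     // Array("a=1", "b=2")

  // limit = 2 splits on the first delimiter only, so the value may contain '='.
  val kv: Array[String] = "url=http://x?a=b".split("=", 2) // Array("url", "http://x?a=b")

  // A pair without the delimiter yields a single-element array,
  // so indexing element 1 of such a result would throw.
  val keyOnly: Array[String] = "flag".split("=", 2)       // Array("flag")
}
```

The last case is worth keeping in mind when reading `array.map(_(1))` in the hunk: it presumes every pair contains the key/value delimiter.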
| * Creates a map after splitting the input text into key/value pairs using delimiters | ||
| */ | ||
| @ExpressionDescription( | ||
| usage = """_FUNC_(text[, pairDelim, keyValueDelim]) - Creates a map after splitting the text into |
This will mess up the display, I think?
Also, we really need an example here.
Not sure about the display: [Usage: str_to_map(text[, pairDelim, keyValueDelim]) - Creates a map after splitting the text into key/value pairs using delimiters. Default delimiters are ',' for pairDelim and '=' for keyValueDelim.]
Added an example.
|
cc @dongjoon-hyun can you help review this |
|
Test build #61806 has finished for PR 13990 at commit
|
|
Sure, @rxin . |
|
|
||
| def this(child: Expression) = { | ||
| this(child, Literal(","), Literal("=")) | ||
| } |
Hi, @techaddict .
Could you add one more constructor, this(child: Expression, pairDelim: Expression)?
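A minimal sketch of what the requested constructor could look like, written against hypothetical stand-in types (`Expr`, `Lit`, `Col`) rather than Catalyst's `Expression` and `Literal`. The default delimiter values are the ones shown in the hunk above; everything else is an assumption for illustration.

```scala
object ConstructorSketch {
  // Hypothetical stand-ins for Catalyst's Expression and Literal.
  sealed trait Expr
  final case class Lit(value: String) extends Expr
  final case class Col(name: String) extends Expr

  final case class StringToMapSketch(text: Expr, pairDelim: Expr, keyValueDelim: Expr) {
    // Two-argument form: the caller supplies pairDelim, keyValueDelim is defaulted.
    def this(text: Expr, pairDelim: Expr) = this(text, pairDelim, Lit("="))

    // One-argument form: both delimiters are defaulted.
    def this(text: Expr) = this(text, Lit(","), Lit("="))
  }
}
```

Auxiliary constructors of a case class are invoked with `new`, for example `new ConstructorSketch.StringToMapSketch(ConstructorSketch.Col("t"), ConstructorSketch.Lit(";"))`.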
| TypeCheckResult.TypeCheckSuccess | ||
| } else { | ||
| TypeCheckResult.TypeCheckFailure( | ||
| s"$prettyName's arguments must be foldable, but got $children.") |
Mistake? Only the two delimiters need to be foldable, not all arguments.
| ) | ||
|
|
||
| // All arguments should be string literals. | ||
| val m1 = intercept[AnalysisException]{ |
Let's remove these error tests from here. Usually we only test the type-checking logic in unit tests, not in end-to-end tests.
|
Test build #62250 has finished for PR 13990 at commit
|
|
Test build #62256 has finished for PR 13990 at commit
|
|
Test build #62257 has finished for PR 13990 at commit
|
|
@cloud-fan anything else, or is it good to merge? |
| TypeCheckResult.TypeCheckSuccess | ||
| } else { | ||
| TypeCheckResult.TypeCheckFailure( | ||
| s"$prettyName's delimiters must be foldable, but got $children.") |
$children will print something like Seq(xxx, xxx). I think we can just say: $prettyName's delimiters must be foldable
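The point about `$children` can be seen with plain string interpolation. This is a small sketch with hypothetical stand-in values (strings instead of Catalyst expressions); it only shows how a `Seq` renders inside an interpolated message.

```scala
object ErrorMessageSketch {
  val prettyName = "str_to_map"

  // Hypothetical stand-ins for the expression's children.
  val children: Seq[String] = Seq("text", "pairDelim", "keyValueDelim")

  // Interpolating a Seq uses its toString, which includes the collection name.
  val verbose = s"$prettyName's delimiters must be foldable, but got $children."
  // "str_to_map's delimiters must be foldable, but got List(text, pairDelim, keyValueDelim)."

  // The shorter message suggested in the review.
  val concise = s"$prettyName's delimiters must be foldable."
}
```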
|
Sorry, I was OOO for the last few days. LGTM except for some minor comments. Thanks for working on it! |
|
Test build #62471 has finished for PR 13990 at commit
|
|
@cloud-fan Comment addressed, test passed 👍 |
|
|
||
| override def dataType: DataType = MapType(StringType, StringType, valueContainsNull = false) | ||
|
|
||
| override def checkInputDataTypes(): TypeCheckResult = { |
It looks simpler to follow XPathExtract for the type check, i.e. implement ExpectsInputTypes to check the types, and override checkInputDataTypes for the foldable check.
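The suggested split of responsibilities can be modelled in plain Scala. The sketch below does not use Spark's classes: `CheckResult`, `Arg`, `ExpectsTypes`, and `StrToMapCheck` are hypothetical, simplified stand-ins. It only illustrates the shape of the idea, where a shared trait compares arguments against declared input types and the expression overrides the check to test foldability before delegating.

```scala
object TypeCheckSketch {
  sealed trait CheckResult
  case object Success extends CheckResult
  final case class Failure(message: String) extends CheckResult

  // Hypothetical, simplified argument: a type name plus a foldable flag.
  final case class Arg(dataType: String, foldable: Boolean)

  // Stand-in for a trait that checks arguments against declared input types.
  trait ExpectsTypes {
    def args: Seq[Arg]
    def inputTypes: Seq[String]

    def checkInputDataTypes(): CheckResult = {
      // Report the first argument whose type differs from the declared one.
      val mismatch = args.zip(inputTypes).zipWithIndex.collectFirst {
        case ((arg, expected), i) if arg.dataType != expected =>
          Failure(s"argument ${i + 1} requires $expected type, not ${arg.dataType}")
      }
      mismatch.getOrElse(Success)
    }
  }

  final case class StrToMapCheck(text: Arg, pairDelim: Arg, keyValueDelim: Arg)
      extends ExpectsTypes {
    val args: Seq[Arg] = Seq(text, pairDelim, keyValueDelim)
    val inputTypes: Seq[String] = Seq("string", "string", "string")

    // Foldable check first, then the generic type check from the trait.
    override def checkInputDataTypes(): CheckResult =
      if (Seq(pairDelim, keyValueDelim).exists(!_.foldable)) {
        Failure("str_to_map's delimiters must be foldable.")
      } else {
        super.checkInputDataTypes()
      }
  }
}
```

With this shape, the expression itself only carries the rule that is specific to it (foldable delimiters), while the per-argument type comparison lives in one shared place.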
|
Test build #62681 has finished for PR 13990 at commit
|
## What changes were proposed in this pull request?
This PR adds `str_to_map` SQL function in order to remove Hive fallback.
## How was this patch tested?
Pass the Jenkins tests with newly added.
Author: Sandeep Singh <sandeep@techaddict.me>
Closes #13990 from techaddict/SPARK-16287.
(cherry picked from commit df2c6d5)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
|
thanks, merging to master and 2.0! |
|
#14315 fixed the odd compile error for this. Is this really something we should be merging into branch 2.0 now? This looks like part of a new feature, and not even obviously something for 2.0.1. |
|
@srowen please see https://issues.apache.org/jira/browse/SPARK-16275, there is an explanation why we wanna merge them into 2.0 |
## What changes were proposed in this pull request?
This PR adds the `str_to_map` SQL function in order to remove the Hive fallback.
## How was this patch tested?
Passes the Jenkins tests, including the newly added ones.