[SPARK-32889][SQL] orc table column name supports special characters. #29761
jzc928 wants to merge 3 commits into
|
test this please |
|
also cc @dongjoon-hyun @cloud-fan |
|
Test build #128719 has finished for PR 29761 at commit
|
dongjoon-hyun left a comment
Since we already support special column names in data sources, I believe this PR is okay. I left a few comments, @jzc928.
scala> Seq(1, 2).toDF("$").write.orc("/tmp/orc")
scala> spark.read.orc("/tmp/orc").printSchema
root
|-- $: integer (nullable = true)
scala> sc.version
res3: String = 3.0.1
b378671 to c3c7f4c
|
Retest this please. |
|
@jzc928. I left a few comments. Please update the PR accordingly. Although this differs from Parquet, it is the same as the JSON data source. So, I think we can accept this approach after revising the PR and passing Jenkins CI tests. |
|
@dongjoon-hyun comments fixed. |
|
Test build #128797 has finished for PR 29761 at commit
|
|
Retest this please. |
|
Test build #128828 has finished for PR 29761 at commit
|
|
Thank you for your first contribution, @jzc928 . |
What changes were proposed in this pull request?
Make ORC table column names support special characters like `$`.
Why are the changes needed?
Special characters like `$` are allowed in ORC table column names by Hive, but Spark raises an error when executing the command "CREATE TABLE tbl(`$` INT, b INT) using orc". This is not compatible with Hive.
Column name "$" contains invalid character(s). Please use alias to rename it.;
Column name "$" contains invalid character(s). Please use alias to rename it.;
org.apache.spark.sql.AnalysisException: Column name "$" contains invalid character(s). Please use alias to rename it.;
  at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.checkFieldName(OrcFileFormat.scala:51)
  at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.$anonfun$checkFieldNames$1(OrcFileFormat.scala:59)
  at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.$anonfun$checkFieldNames$1$adapted(OrcFileFormat.scala:59)
  at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
  at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
Does this PR introduce any user-facing change?
No
How was this patch tested?
Add unit test
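For readers who want to reproduce the case locally, the following is a minimal sketch of the kind of check such a unit test might perform. It is not the test added by this PR. The object name `OrcSpecialColumnNameCheck`, the temporary warehouse directory, and the inserted values are assumptions made for illustration; only the `CREATE TABLE` statement comes from the description above. It assumes Spark SQL is on the classpath and that the build includes this change. On a build without it, the `CREATE TABLE` call is expected to fail with the `AnalysisException` quoted above.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession

// Hypothetical standalone check, not the unit test from this PR.
object OrcSpecialColumnNameCheck {
  def main(args: Array[String]): Unit = {
    // Use a throwaway warehouse directory so the managed table
    // does not end up in ./spark-warehouse.
    val warehouse = Files.createTempDirectory("orc-special-col").toString

    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("orc-special-column-name")
      .config("spark.sql.warehouse.dir", warehouse)
      .getOrCreate()

    try {
      // The statement from the PR description; backticks quote the column name.
      spark.sql("CREATE TABLE tbl(`$` INT, b INT) USING orc")
      spark.sql("INSERT INTO tbl VALUES (1, 2)")

      // The column name should round-trip through the ORC table unchanged.
      val table = spark.table("tbl")
      assert(table.schema.fieldNames.sameElements(Array("$", "b")))

      val row = table.select("`$`", "b").head()
      assert(row.getInt(0) == 1 && row.getInt(1) == 2)
    } finally {
      spark.sql("DROP TABLE IF EXISTS tbl")
      spark.stop()
    }
  }
}
```

The column is referenced with backticks in both the DDL and the `select` call because an unquoted `$` is not a valid identifier in Spark SQL.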