Skip to content

[SPARK-32889][SQL] orc table column name supports special characters.#29761

Closed
jzc928 wants to merge 3 commits into
apache:masterfrom
jzc928:orcColSpecialChar
Closed

[SPARK-32889][SQL] orc table column name supports special characters.#29761
jzc928 wants to merge 3 commits into
apache:masterfrom
jzc928:orcColSpecialChar

Conversation

@jzc928

@jzc928 jzc928 commented Sep 15, 2020

Copy link
Copy Markdown

What changes were proposed in this pull request?

make orc table column name support special characters like $

Why are the changes needed?

Special characters like $ are allowed in orc table column name by Hive.
But it's error when execute command "CREATE TABLE tbl($ INT, b INT) using orc" in spark. it's not compatible with Hive.

Column name "$" contains invalid character(s). Please use alias to rename it.;Column name "$" contains invalid character(s). Please use alias to rename it.;org.apache.spark.sql.AnalysisException: Column name "$" contains invalid character(s). Please use alias to rename it.; at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.checkFieldName(OrcFileFormat.scala:51) at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.$anonfun$checkFieldNames$1(OrcFileFormat.scala:59) at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.$anonfun$checkFieldNames$1$adapted(OrcFileFormat.scala:59) at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)

Does this PR introduce any user-facing change?

No

How was this patch tested?

Add unit test

@wzhfy

wzhfy commented Sep 15, 2020

Copy link
Copy Markdown
Contributor

test this please

Comment thread sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala Outdated
Comment thread sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala Outdated
Comment thread sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala Outdated
Comment thread sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala Outdated
@wzhfy

wzhfy commented Sep 15, 2020

Copy link
Copy Markdown
Contributor

also cc @dongjoon-hyun @cloud-fan

Comment thread sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala Outdated
@SparkQA

SparkQA commented Sep 15, 2020

Copy link
Copy Markdown

Test build #128719 has finished for PR 29761 at commit 0a2cca7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Comment thread sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala Outdated

@dongjoon-hyun dongjoon-hyun left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we support special column names in data source already, I believe this PR is okay. I left a few comments, @jzc928 .

scala> Seq(1, 2).toDF("$").write.orc("/tmp/orc")

scala> spark.read.orc("/tmp/orc").printSchema
root
 |-- $: integer (nullable = true)


scala> sc.version
res3: String = 3.0.1

Comment thread sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala Outdated
Comment thread sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala Outdated
Comment thread sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala Outdated
@dongjoon-hyun

Copy link
Copy Markdown
Member

Retest this please.

@dongjoon-hyun

dongjoon-hyun commented Sep 17, 2020

Copy link
Copy Markdown
Member

@jzc928 . I left a few comments. Please update the PR accordingly. Although this is different from Parquet, but this is the same with JSON data source. So, I think we can accept this approach after revising the PR and passing Jenkins CI tests.

@jzc928

jzc928 commented Sep 17, 2020

Copy link
Copy Markdown
Author

@dongjoon-hyun comments fixed.

@SparkQA

SparkQA commented Sep 17, 2020

Copy link
Copy Markdown

Test build #128797 has finished for PR 29761 at commit c3c7f4c.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun

Copy link
Copy Markdown
Member

Retest this please.

@SparkQA

SparkQA commented Sep 17, 2020

Copy link
Copy Markdown

Test build #128828 has finished for PR 29761 at commit 7d63fd9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun dongjoon-hyun left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @jzc928 and @wzhfy .
Merged to master for Apache Spark 3.1.0 on December 2020.

@dongjoon-hyun

Copy link
Copy Markdown
Member

Thank you for your first contribution, @jzc928 .
You are added to the Apache Spark contributor group and SPARK-32889 is assigned to you.
Welcome to the Apache Spark community!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants