[SPARK-37965][SQL] Remove check field name when reading/writing existing data in Orc #35253
AngersZhuuuu wants to merge 6 commits into
I checked the history. It seems we added this check mainly because Parquet restricts column names, and that restriction is being removed in #35229. So this change seems fine to me, but it would be great to double-check with @dongjoon-hyun.
|
@AngersZhuuuu BTW, I think it would be great to explain in the PR description why we can remove this check, pointing to the relevant commits in the history.
Yea, will do this later. |
dongjoon-hyun
left a comment
Yep, @HyukjinKwon's comment is correct.
Let's review this after #35229 has landed on master.
Thank you for keeping Apache Spark data sources consistent.
|
This check was added in #19124, but #29761 changed it to wrap field names in backquotes. PR #29761 also added a test.
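To illustrate what wrapping a field name in backquotes means for a struct type string, here is a minimal Python sketch. It is not code from #19124 or #29761; the helper names `quote_field_name` and `struct_type_string` are hypothetical, and doubling an embedded backquote is assumed as the escaping rule.

```python
def quote_field_name(name):
    """Wrap a field name in backquotes.

    Assumption: an embedded backquote is escaped by doubling it.
    """
    return "`" + name.replace("`", "``") + "`"


def struct_type_string(fields):
    """Build a struct<...> type string from (name, type) pairs.

    Every name is quoted, so names with spaces or punctuation
    no longer need to be rejected up front.
    """
    body = ",".join(f"{quote_field_name(n)}:{t}" for n, t in fields)
    return f"struct<{body}>"


# Hypothetical field names that an unquoted type string could not represent.
schema = struct_type_string([("max(t)", "timestamp"), ("a b", "int")])
print(schema)
```

With quoting in place, the type string stays parseable whatever characters a name contains, which is why a separate name check becomes less necessary.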
|
…g existing data in Orc" This reverts commit c4fbc9c.
|
thanks, merging to master! |
|
+1, LGTM. |
|
I think we should still check for the empty-string case, or add the
Here are some tests.
native
native write
set spark.sql.orc.impl=native;
create table t_1 stored as orc as select '' ;
success.
native read
set spark.sql.orc.impl=native;
select t_1;
hive read
set spark.sql.orc.impl=hive;
select t_1;
hive
hive write
set spark.sql.orc.impl=hive;
create table t_1 stored as orc as select '' ;
use HiveFileFormat
set spark.sql.hive.convertMetastoreOrc=false;
create table t_1 stored as orc as select '' ;
org.apache.spark.sql.hive.execution.HiveFileFormat#supportFieldName
|
@AngersZhuuuu is there a way to check the field name only on the write side?
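A write-side-only check could look roughly like the Python sketch below. It is a toy model, not Spark code: `check_field_names_for_write`, `write_rows`, and `read_rows` are hypothetical names, a plain dict stands in for a file, and only the empty-name case from the tests above is rejected.

```python
def check_field_names_for_write(field_names):
    """Hypothetical write-side validation: reject empty field names."""
    for name in field_names:
        if name == "":
            raise ValueError("Field name cannot be empty when writing")


def write_rows(field_names, rows, storage):
    """Validate names, then store schema and rows in the stand-in storage."""
    check_field_names_for_write(field_names)
    storage["schema"] = list(field_names)
    storage["rows"] = [tuple(r) for r in rows]


def read_rows(storage):
    """Return schema and rows without validating names,
    so data that already exists stays readable."""
    return storage["schema"], storage["rows"]


# Existing data with an empty field name can still be read.
existing = {"schema": [""], "rows": [("",)]}
print(read_rows(existing))

# Writing the same schema again is rejected.
try:
    write_rows([""], [("",)], {})
except ValueError as err:
    print("write rejected:", err)
```

Keeping the validation in the writer path only means that files produced earlier, or by other engines, remain readable while new writes are still guarded.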
What changes were proposed in this pull request?
Remove the supportFieldName check in DataSource ORCFormat.
Why are the changes needed?
Remove an unnecessary check.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added UT