[SPARK-29295][SQL][FOLLOWUP] Dynamic partition map parsed from partition path should be case insensitive #28765

Closed
turboFei wants to merge 2 commits into apache:master from turboFei:SPARK-29295-follow-up

turboFei commented Jun 9, 2020

Member

What changes were proposed in this pull request?

This is a follow-up of #25979.
When we run INSERT OVERWRITE on an external Hive partitioned table with an upper-case dynamic partition key, an exception is thrown, for example:

org.apache.spark.SparkException: Dynamic partition key P1 is not among written partition paths.

The root cause is that the Hive metastore is not case preserving and keeps partition columns with lower-cased names, while the dynamic partition lookup is case sensitive. See the details in:

// Hive metastore is not case preserving and keeps partition columns with lower cased names,
// and Hive will validate the column names in partition spec to make sure they are partition
// columns. Here we Lowercase the column names before passing the partition spec to Hive
// client, to satisfy Hive.
// scalastyle:off caselocale
orderedPartitionSpec.put(colName.toLowerCase, partition(colName))
// scalastyle:on caselocale

val updatedPartitionSpec = partition.map {
  case (key, Some(value)) => key -> value
  case (key, None) if dpMap.contains(key) => key -> dpMap(key)
  case (key, _) =>
    throw new SparkException(s"Dynamic partition key $key is not among " +
      "written partition paths.")
}
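
To illustrate the mismatch, here is a minimal, Spark-free sketch. It assumes the written partition path uses the lower-cased column name (`p1=a`) while the partition spec carries the key as the user typed it (`P1`). `CaseMismatchSketch`, `parsePartitionPath`, and the sample values are hypothetical and not taken from the Spark code base, and the sketch ignores any escaping of path names.

```scala
// Hypothetical illustration only; names and values are assumptions.
object CaseMismatchSketch {
  // Parse a path such as "p1=a/p2=b" into Map("p1" -> "a", "p2" -> "b").
  def parsePartitionPath(path: String): Map[String, String] =
    path.split("/").map { part =>
      val kv = part.split("=")
      require(kv.length == 2, s"Invalid partition path segment: $part")
      kv(0) -> kv(1)
    }.toMap

  def main(args: Array[String]): Unit = {
    // Keys as they appear in the written path (assumed lower case).
    val dpMap = parsePartitionPath("p1=a")
    // Key as typed in the statement (assumed upper case).
    val dynamicKey = "P1"
    // A plain Map compares keys case sensitively, so the lookup misses.
    println(dpMap.contains(dynamicKey)) // prints "false"
  }
}
```

With a plain `Map`, the `dpMap.contains(key)` guard in the snippet quoted above is false for `P1`, so the match falls through to the branch that throws.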

In this PR, we convert the dynamic partition map to a case-insensitive map.
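
The idea can be sketched without Spark as follows. This is not the patch itself: `LowerCasedKeys` is a hypothetical stand-in for whatever case-insensitive map the change uses, `resolve` is a hypothetical wrapper around the lookup quoted above, and `RuntimeException` stands in for `SparkException` so the example is self-contained.

```scala
import java.util.Locale

// Hypothetical sketch; names below are assumptions, not Spark APIs.
object CaseInsensitiveLookupSketch {
  // Wraps a map so that lookups ignore the case of the key.
  final class LowerCasedKeys[V](underlying: Map[String, V]) {
    private val normalized: Map[String, V] =
      underlying.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }
    def contains(key: String): Boolean =
      normalized.contains(key.toLowerCase(Locale.ROOT))
    def apply(key: String): V =
      normalized(key.toLowerCase(Locale.ROOT))
  }

  // Same shape as the quoted lookup, but going through the wrapper.
  def resolve(
      partition: Map[String, Option[String]],
      dpMap: Map[String, String]): Map[String, String] = {
    val ciDpMap = new LowerCasedKeys(dpMap)
    partition.map {
      case (key, Some(value)) => key -> value
      case (key, None) if ciDpMap.contains(key) => key -> ciDpMap(key)
      case (key, _) =>
        throw new RuntimeException(
          s"Dynamic partition key $key is not among written partition paths.")
    }
  }
}
```

For example, `CaseInsensitiveLookupSketch.resolve(Map("P1" -> None), Map("p1" -> "a"))` resolves `P1` to `a` instead of throwing, and the key keeps the case used in the partition spec.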

Why are the changes needed?

To fix the failure when running INSERT OVERWRITE on an external Hive partitioned table with an upper-case dynamic partition key.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests (UT).

turboFei changed the title from "[SPARK-29295][FOLLOWUP] Dynamic partition map should be case insensitive." to "[SPARK-29295][SQL][FOLLOWUP] Dynamic partition map should be case insensitive." on Jun 9, 2020

turboFei commented Jun 9, 2020

Member Author

cc @viirya @cloud-fan

turboFei changed the title from "[SPARK-29295][SQL][FOLLOWUP] Dynamic partition map should be case insensitive." to "[SPARK-29295][SQL][FOLLOWUP] Dynamic partition map parsed from partition path should be case insensitive" on Jun 9, 2020

turboFei commented Jun 9, 2020

Member Author

thanks, have added a blank line

@cloud-fan

Contributor

ok to test

SparkQA commented Jun 9, 2020

Test build #123685 has finished for PR 28765 at commit d949065.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan

Contributor

thanks, merging to master/3.0!

cloud-fan closed this in 717ec5e on Jun 9, 2020
cloud-fan pushed a commit that referenced this pull request Jun 9, 2020
…ion path should be case insensitive

### What changes were proposed in this pull request?

This is a follow-up of #25979.
When we run INSERT OVERWRITE on an external Hive partitioned table with an upper-case dynamic partition key, an exception is thrown, for example:
```
org.apache.spark.SparkException: Dynamic partition key P1 is not among written partition paths.
```
The root cause is that the Hive metastore is not case preserving and keeps partition columns with lower-cased names, while the dynamic partition lookup is case sensitive. See the details in:

https://github.com/apache/spark/blob/ddd8d5f5a0b6db17babc201ba4b73f7df91df1a3/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala#L895-L901
https://github.com/apache/spark/blob/e28914095aa1fa7a4680b5e4fcf69e3ef64b3dbc/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala#L228-L234

In this PR, we convert the dynamic partition map to a case-insensitive map.
### Why are the changes needed?

To fix the failure when running INSERT OVERWRITE on an external Hive partitioned table with an upper-case dynamic partition key.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Unit tests (UT).

Closes #28765 from turboFei/SPARK-29295-follow-up.

Authored-by: turbofei <fwang12@ebay.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 717ec5e)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>