[SPARK-33453][SQL][TESTS] Unify v1 and v2 SHOW PARTITIONS tests by MaxGekk · Pull Request #30377 · apache/spark

MaxGekk · 2020-11-14T18:04:23Z

What changes were proposed in this pull request?

Move SHOW PARTITIONS parsing tests to ShowPartitionsParserSuite.
Move the Hive tests for SHOW PARTITIONS from HiveCommandSuite to the base test suite v1.ShowPartitionsSuiteBase. This allows running the tests with and without Hive.

The changes follow the approach of #30287.
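The idea behind the shared base suite can be sketched without Spark or ScalaTest. Shared test cases are written once in a base trait, and each concrete suite only supplies how SHOW PARTITIONS runs against its own implementation. `MiniSuite`, `showPartitions`, and the two stub suites are hypothetical stand-ins chosen for illustration, not the real Spark or ScalaTest classes.

```scala
import scala.collection.mutable.ListBuffer
import scala.util.Try

// Hypothetical minimal harness standing in for ScalaTest's `test(name) { ... }`.
trait MiniSuite {
  private val registered = ListBuffer.empty[(String, () => Unit)]

  // Registers a named test body; it is only executed later, by runAll().
  protected def test(name: String)(body: => Unit): Unit =
    registered += (name -> (() => body))

  // Runs every registered test and returns the names of the ones that passed.
  def runAll(): List[String] =
    registered.toList.collect { case (name, run) if Try(run()).isSuccess => name }
}

// Shared cases are written once here (a stand-in for v1.ShowPartitionsSuiteBase).
trait ShowPartitionsSuiteBase extends MiniSuite {
  // Each concrete suite decides how SHOW PARTITIONS is executed.
  protected def showPartitions(table: String): Seq[String]

  test("show partitions of a partitioned table") {
    assert(showPartitions("part_table") == Seq("year=2015/month=1", "year=2015/month=2"))
  }
}

// Hypothetical stub suites: one per flavour, e.g. without and with Hive.
class InMemoryV1Suite extends ShowPartitionsSuiteBase {
  override protected def showPartitions(table: String): Seq[String] =
    Seq("year=2015/month=1", "year=2015/month=2")
}

class InMemoryHiveSuite extends ShowPartitionsSuiteBase {
  // Produces rows in another order and sorts them: only the hook differs,
  // while the assertion itself is shared with the other suite.
  override protected def showPartitions(table: String): Seq[String] =
    Seq("year=2015/month=2", "year=2015/month=1").sorted
}
```

For example, `new InMemoryHiveSuite().runAll()` returns `List("show partitions of a partitioned table")`. Adding a new flavour means adding one small subclass; every shared case runs against it automatically.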

Why are the changes needed?

The unification allows running the common SHOW PARTITIONS tests for DSv1, Hive DSv1, and DSv2.
This way, we can detect missing features and differences between the DSv1 and DSv2 implementations.
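A small sketch of why running one set of checks against several implementations surfaces gaps: a check that passes for one implementation but fails for another points to a missing feature or a behavioural difference. The two implementations below are hypothetical in-memory models (the "v2" one deliberately ignores the partition filter); they do not describe the actual state of Spark's DSv1 or DSv2 code.

```scala
object PartitionChecks {
  // Hypothetical model: an implementation maps a partition filter
  // (column -> value) to the partitions it reports.
  type ShowPartitions = Map[String, String] => Seq[String]

  val all: Seq[String] = Seq("year=2015/month=1", "year=2015/month=2", "year=2016/month=1")

  // "v1" honours the filter; "v2" ignores it, simulating a missing feature.
  val v1: ShowPartitions = spec =>
    all.filter(p => spec.forall { case (k, v) => p.split("/").contains(s"$k=$v") })
  val v2: ShowPartitions = _ => all

  val implementations: Map[String, ShowPartitions] = Map("v1" -> v1, "v2" -> v2)

  // Shared checks, written once and applied to every implementation.
  val checks: Map[String, ShowPartitions => Boolean] = Map(
    "list all partitions" -> ((f: ShowPartitions) => f(Map.empty) == all),
    "filter by partition spec" -> ((f: ShowPartitions) => f(Map("year" -> "2015")) == all.take(2))
  )

  // For each check, the sorted names of the implementations that fail it.
  def differences: Map[String, Seq[String]] =
    checks.map { case (check, run) =>
      check -> implementations.collect { case (impl, f) if !run(f) => impl }.toSeq.sorted
    }
}
```

Here `PartitionChecks.differences("filter by partition spec")` is `Seq("v2")`, while "list all partitions" has no failing implementation.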

Does this PR introduce any user-facing change?

No

How was this patch tested?

By running:

the new test suites: build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *ShowPartitionsSuite"
and the old one: build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly org.apache.spark.sql.hive.execution.HiveCommandSuite"

MaxGekk · 2020-11-14T18:05:08Z

@cloud-fan @HyukjinKwon May I ask you to take a look at this PR, please.

SparkQA · 2020-11-14T18:52:40Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35698/

SparkQA · 2020-11-14T19:22:37Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35698/

janekdb · 2020-11-14T22:29:41Z

+    }
+  }
+
+  test("show partitions of not partitioned table") {


"non-partitioned" sounds a bit more natural.

Thank you. I will address your comment together with the others. @HyukjinKwon @cloud-fan Do you have any comments on this PR?

There are a few other places that use "not partitioned":

$ find . -name '*.scala' -print0|xargs -0 grep -i -n 'not partitioned'
./core/src/test/scala/org/apache/spark/rdd/SortingSuite.scala:138: test("get a range of elements in an array not partitioned by a range partitioner") {
./mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala:925: * Note that the term "rating block" is a bit of a misnomer, as the ratings are not partitioned by
./streaming/src/main/scala/org/apache/spark/streaming/dstream/MapWithStateDStream.scala:134: // If the RDD is not partitioned the right way, let us repartition it using the
./sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala:556: // One side of join is not partitioned in the desired way. Need to shuffle one side.
./sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala:590: // One side of join is not partitioned in the desired way. Since the number of partitions of
./sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala:591: // the side that has already partitioned is smaller than the side that is not partitioned,
./sql/core/src/test/scala/org/apache/spark/sql/DataFrameWriterV2Suite.scala:308: test("OverwritePartitions: overwrite all rows if not partitioned") {
./sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowPartitionsSuite.scala:130: test("show partitions of not partitioned table") {
./sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowPartitionsSuite.scala:137: assert(errMsg.contains("not allowed on a table that is not partitioned"))
./sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala:1982: // not supported since the table is not partitioned
./sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileIndex.scala:73: /** Schema of the partitioning columns, or the empty schema if the table is not partitioned. */
./sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeTableExec.scala:79: rows += toCatalystRow("Not partitioned", "", "")
./sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala:533: failAnalysis(s"Insert into a partition is not allowed because $l is not partitioned.")
./sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala:149: // This dataset is not partitioned.
./sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala:462: s"for tables that are not partitioned: $tableIdentWithDB")
./sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala:989: * 1. If the table is not partitioned.
./sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala:999: s"SHOW PARTITIONS is not allowed on a table that is not partitioned: $tableIdentWithDB")
./sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala:1755: // not supported since the table is not partitioned

I will change the title of this test, but I am not sure about the other places, so I will leave them as is for now. @janekdb If you want, you can open a PR and fix them where it makes sense.
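The renamed test expects SHOW PARTITIONS to fail on a table without partition columns, and the grep output above shows the message it matches (tables.scala:999, checked in ShowPartitionsSuite.scala:137). A dependency-free sketch of that check follows; `TableInfo`, `NonPartitionedCheck`, and the use of `IllegalArgumentException` are hypothetical stand-ins for Spark's command and the exception it actually throws.

```scala
import scala.util.Try

// Hypothetical table model: only the name, partition columns and partitions matter.
final case class TableInfo(name: String, partitionColumns: Seq[String], partitions: Seq[String])

object NonPartitionedCheck {
  // Stand-in for the command: rejects tables without partition columns, using
  // the message text quoted from tables.scala:999 in the grep output.
  def showPartitions(t: TableInfo): Seq[String] = {
    if (t.partitionColumns.isEmpty) {
      throw new IllegalArgumentException(
        s"SHOW PARTITIONS is not allowed on a table that is not partitioned: ${t.name}")
    }
    t.partitions
  }

  // What the shared test inspects: the error message, if the command failed.
  def errorMessage(t: TableInfo): Option[String] =
    Try(showPartitions(t)).failed.toOption.map(_.getMessage)
}
```

Asserting on a stable fragment of the message, as the real test does with `contains`, keeps the check independent of how the table identifier is rendered.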

SparkQA · 2020-11-14T22:34:13Z

Test build #131095 has finished for PR 30377 at commit 8fe17a0.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2020-11-16T08:32:31Z

+        .partitionBy("a")
+        .format("parquet")
+        .mode(SaveMode.Overwrite)
+        .saveAsTable("part_datasrc")


This seems to test the DataFrameWriter API, not the SHOW PARTITIONS command.

Ah, the test was already there. Let's keep it, then.

cloud-fan · 2020-11-16T08:34:57Z

+
+  override protected def createDateTable(table: String): Unit = {
+    sql(s"""
+      |CREATE TABLE $table (price int, qty int)


CREATE TABLE ... USING hive PARTITIONED BY (...) doesn't work?

Let me check that. I just didn't want to change the original test.

I removed the overridden table-creation functions from the Hive suite.
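The outcome of this thread is that the shared base suite can own the DDL and only the provider clause varies per suite, so the Hive suite no longer needs its own table-creation override. A sketch of that shape, assuming hypothetical names (`DateTableDdl`, `defaultUsing`, `createDateTableSql`) and assuming `year`/`month` partition columns next to the `price`/`qty` columns from the snippet above:

```scala
// Hypothetical base: builds the CREATE TABLE statement once, parameterised
// only by the provider clause that each suite supplies.
trait DateTableDdl {
  protected def defaultUsing: String

  def createDateTableSql(table: String): String =
    s"""CREATE TABLE $table (price int, qty int, year int, month int)
       |$defaultUsing
       |PARTITIONED BY (year, month)""".stripMargin
}

// Data source (v1) flavour: a file-based provider.
class ParquetDateTable extends DateTableDdl {
  override protected def defaultUsing: String = "USING parquet"
}

// Hive flavour: the same statement, only the provider differs.
class HiveDateTable extends DateTableDdl {
  override protected def defaultUsing: String = "USING HIVE"
}
```

Keeping the statement in one place means a syntax difference between providers shows up as a failing shared test instead of being hidden behind a per-suite override.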

SparkQA · 2020-11-16T09:36:04Z

Test build #131148 has finished for PR 30377 at commit 47925f2.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.


SparkQA · 2020-11-16T11:01:09Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35755/

SparkQA · 2020-11-16T11:24:56Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35755/

SparkQA · 2020-11-16T15:24:55Z

Test build #131152 has finished for PR 30377 at commit 59f2b38.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2020-11-16T15:33:09Z

thanks, merging to master!

MaxGekk added 14 commits November 13, 2020 19:33

Create ShowPartitionsParserSuite

29e5bae

Move tests to ShowPartitionsParserSuite

d7bf651

Add v1/v2 ShowPartitionsSuite

c23048e

Move a view test

87e86b5

Add hive.execution.command.ShowPartitionsSuite

851929b

Move tests from HiveCommandSuite

82432c8

Move "filter by partitions" to v1 ShowPartitionsSuite

2351b64

Move the test "show partitions from a datasource"

cc89024

de-dup code

a9bcdbb

Move the test "non-partitioning columns"

38d3c67

Fix "show partitions of not partitioned table"

f86f159

Move the test "show partitions of a view"

76e6399

Fix v1/ShowPartitionsSuite

8707d3e

Add TODO

8fe17a0

github-actions Bot added the SQL label Nov 14, 2020

janekdb reviewed Nov 14, 2020


cloud-fan reviewed Nov 16, 2020


MaxGekk added 2 commits November 16, 2020 12:13

not partitioned -> non-partitioned

cd04107

Don't override table creation in Hive: USING HIVE

47925f2

MaxGekk added 2 commits November 16, 2020 12:55

Merge remote-tracking branch 'origin/master' into unify-dsv1_v2-show-partitions-tests

e3cd5e1

Fix ShowPartitionsParserSuite

59f2b38

cloud-fan approved these changes Nov 16, 2020


cloud-fan closed this in 6883f29 Nov 16, 2020

MaxGekk mentioned this pull request Nov 25, 2020

[WIP][SPARK-33558][SQL][TESTS] Unify v1 and v2 ALTER TABLE .. PARTITION tests #30499

Closed

MaxGekk deleted the unify-dsv1_v2-show-partitions-tests branch February 19, 2021 15:03

4 participants