Skip to content

[SPARK-28495][SQL] Introduce ANSI store assignment policy for table insertion#25581

Closed
gengliangwang wants to merge 6 commits into
apache:masterfrom
gengliangwang:ANSImode
Closed

[SPARK-28495][SQL] Introduce ANSI store assignment policy for table insertion#25581
gengliangwang wants to merge 6 commits into
apache:masterfrom
gengliangwang:ANSImode

Conversation

@gengliangwang

Copy link
Copy Markdown
Member

What changes were proposed in this pull request?

Introduce ANSI store assignment policy for table insertion.
With ANSI policy, Spark performs the type coercion of table insertion as per ANSI SQL.

Why are the changes needed?

In Spark version 2.4 and earlier, when inserting into a table, Spark will cast the data type of input query to the data type of target table by coercion. This can be super confusing, e.g. users make a mistake and write string values to an int column.

In data source V2, by default, only upcasting is allowed when inserting data into a table. E.g. int -> long and int -> string are allowed, while decimal -> double or long -> int are not allowed. The rules of UpCast was originally created for Dataset type coercion. They are quite strict and different from the behavior of all existing popular DBMS. This is breaking change. It is possible that existing queries are broken after 3.0 releases.

Following ANSI SQL standard makes Spark consistent with the table insertion behaviors of popular DBMS like PostgreSQL/Oracle/Mysql.

Does this PR introduce any user-facing change?

A new optional mode for table insertion.

How was this patch tested?

Unit test

@SparkQA

SparkQA commented Aug 26, 2019

Copy link
Copy Markdown

Test build #109731 has finished for PR 25581 at commit 39c331e.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Comment thread sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala Outdated
byName: Boolean,
resolver: Resolver,
context: String,
storeAssignmentPolicy: StoreAssignmentPolicy.Value,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe pass in a boolean flag isStrict?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think keeping the original policy is also fine. Otherwise, it is hard to tell that we are using ANSI mode if isStrict is false.

@SparkQA

SparkQA commented Aug 26, 2019

Copy link
Copy Markdown

Test build #109738 has finished for PR 25581 at commit e2b3754.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class StrictDataTypeWriteCompatibilitySuite extends DataTypeWriteCompatibilityBaseSuite
  • class ANSIDataTypeWriteCompatibilitySuite extends DataTypeWriteCompatibilityBaseSuite
  • abstract class DataTypeWriteCompatibilityBaseSuite extends SparkFunSuite

@SparkQA

SparkQA commented Aug 26, 2019

Copy link
Copy Markdown

Test build #109739 has finished for PR 25581 at commit 68da9cc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 26, 2019

Copy link
Copy Markdown

Test build #109743 has finished for PR 25581 at commit fcc68dc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


protected def storeAssignmentPolicy: StoreAssignmentPolicy.Value

def assertAllowed(

@cloud-fan cloud-fan Aug 27, 2019

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks to me that, we can make the code simpler if this method takes the policy parameter. e.g.

test("Check atomic types: write allowed only when casting is safe") {
  atomicTypes.foreach { w =>
    atomicTypes.foreach { r =>
      if (Cast.canUpCast(w, r)) {
        assertAllowed(Policy.STRICT, ...)
      } else {
        assertSingleError(Policy.STRICT, ...)
      }
      if (Cast.canANSIStoreAssign(w, r)) ...
    } 
  }
}

@SparkQA

SparkQA commented Aug 27, 2019

Copy link
Copy Markdown

Test build #109808 has finished for PR 25581 at commit af0f739.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan

Copy link
Copy Markdown
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 2b24a71 Aug 27, 2019
cloud-fan pushed a commit that referenced this pull request Aug 29, 2019
…and long in ASNI mode

### What changes were proposed in this pull request?

Disallow conversions between `timestamp` type and `long` type in table insertion with ANSI store assignment policy.

### Why are the changes needed?

In the PR #25581, timestamp type is allowed to be converted to long type, since timestamp type is represented by long type internally, and both legacy mode and strict mode allows the conversion.

After reconsideration, I think we should disallow it. As per ANSI SQL section "4.4.2 Characteristics of numbers":
> A number is assignable only to sites of numeric type.

In PostgreSQL, the conversion between timestamp and long is also disallowed.

### Does this PR introduce any user-facing change?

Conversion between timestamp and long is disallowed in table insertion with ANSI store assignment policy.

### How was this patch tested?

Unit test

Closes #25615 from gengliangwang/disallowTimeStampToLong.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants