Skip to content

[SPARK-47099][SQL] Use ordinalNumber to uniformly set the value of paramIndex for the error class UNEXPECTED_INPUT_TYPE#45177

Closed
panbingkun wants to merge 7 commits into
apache:masterfrom
panbingkun:SPARK-47099
Closed

[SPARK-47099][SQL] Use ordinalNumber to uniformly set the value of paramIndex for the error class UNEXPECTED_INPUT_TYPE#45177
panbingkun wants to merge 7 commits into
apache:masterfrom
panbingkun:SPARK-47099

Conversation

@panbingkun

@panbingkun panbingkun commented Feb 20, 2024

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

The pr aims to use ordinalNumber to uniformly set the value of paramIndex for the error class UNEXPECTED_INPUT_TYPE.

Why are the changes needed?

When I was reviewing the spark code, I found that:

We should unify it and avoid understanding differences.

Does this PR introduce any user-facing change?

Yes, the value of 'paramIndex' for the error class UNEXPECTED-INPUT-TYPE is uniformly set by ordinalNumber.

How was this patch tested?

  • Updated existed UT.
  • Pass GA.

Was this patch authored or co-authored using generative AI tooling?

No.

…ass `UNEXPECTED_INPUT_TYPE` should be `1`1
@panbingkun panbingkun changed the title [SPARK-47099][SQL] The start value of paramIndex for the error class UNEXPECTED_INPUT_TYPE should be 1 [WIP][SPARK-47099][SQL] The start value of paramIndex for the error class UNEXPECTED_INPUT_TYPE should be 1 Feb 20, 2024
@github-actions github-actions Bot added the SQL label Feb 20, 2024
@panbingkun panbingkun changed the title [WIP][SPARK-47099][SQL] The start value of paramIndex for the error class UNEXPECTED_INPUT_TYPE should be 1 [SPARK-47099][SQL] The start value of paramIndex for the error class UNEXPECTED_INPUT_TYPE should be 1 Feb 20, 2024
@panbingkun panbingkun marked this pull request as ready for review February 20, 2024 06:25
@panbingkun

Copy link
Copy Markdown
Contributor Author

cc @cloud-fan @MaxGekk

@MaxGekk MaxGekk left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should unify it and avoid understanding differences.

I agree we should unify. BTW, did we document somewhere that param indexes are 1-based?

@cloud-fan

Copy link
Copy Markdown
Contributor

Shall we make the error message more explicit?

The <paramIndex>th parameter requires the <requiredType> type, ...

or something more accurate in English. cc @srielau

@panbingkun

Copy link
Copy Markdown
Contributor Author

We should unify it and avoid understanding differences.

I agree we should unify. BTW, did we document somewhere that param indexes are 1-based?

How about this file?
https://github.com/apache/spark/blob/f0f35c8b1c8f3b1d7f7c2b79945eb29ffb3c8f9a/common/utils/src/main/resources/error/README.md?plain=1

Add the following content at the end of this file?

## Extra
1.The parameter paramIndex of error class  `DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE` is 1-based.

@panbingkun

Copy link
Copy Markdown
Contributor Author

Shall we make the error message more explicit?

The <paramIndex>th parameter requires the <requiredType> type, ...

or something more accurate in English. cc @srielau

How about the following error message?

The <paramIndex>-th parameter of function <functionName> requires the <requiredType> type, however <inputSql> has the type <inputType>.

I found a reference:
https://github.com/alibaba/sentinel-golang/blob/08071855bc67423777353c6e02f1390b419172b1/core/hotspot/rule.go#L79-L80
image

@MaxGekk

MaxGekk commented Feb 20, 2024

Copy link
Copy Markdown
Member

This <paramIndex>th seems oddly: 1th instead of 1st, 2th instead of 2nd.

How about to reuse the function:

def ordinalNumber(i: Int): String = i match {
case 0 => "first"
case 1 => "second"
case 2 => "third"
case i => s"${i + 1}th"

in the error message.

@panbingkun

Copy link
Copy Markdown
Contributor Author

This <paramIndex>th seems oddly: 1th instead of 1st, 2th instead of 2nd.

How about to reuse the function:

def ordinalNumber(i: Int): String = i match {
case 0 => "first"
case 1 => "second"
case 2 => "third"
case i => s"${i + 1}th"

in the error message.

Okay, I think it's great, let me do it.

@panbingkun panbingkun changed the title [SPARK-47099][SQL] The start value of paramIndex for the error class UNEXPECTED_INPUT_TYPE should be 1 [SPARK-47099][SQL] Use ordinalNumber to uniformly set the value of paramIndex for the error class UNEXPECTED_INPUT_TYPE Feb 21, 2024
…`paramIndex` for the error class `UNEXPECTED_INPUT_TYPE`
@github-actions github-actions Bot added the DOCS label Feb 21, 2024
@panbingkun

Copy link
Copy Markdown
Contributor Author

Considering that we have already used ordinalNumber to uniformly set the value of paramIndex for the error class UNEXPECTED_INPUT_TYPE, it is more reasonable for us to use 0-based here.


## Extra

- Use `ordinalNumber` to uniformly set the value of `paramIndex`(0-based) for the error class `UNEXPECTED_INPUT_TYPE`.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we still need this? now people will see first, second, ... nth, etc.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, let me remove it.

parameters = Map(
"sqlExpr" -> "\"bitmap_count(a)\"",
"paramIndex" -> "0",
"paramIndex" -> "1",

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be first?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, two were missed and have been fixed.

parameters = Map(
"sqlExpr" -> "\"bitmap_or_agg(a)\"",
"paramIndex" -> "0",
"paramIndex" -> "1",

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@MaxGekk

MaxGekk commented Feb 21, 2024

Copy link
Copy Markdown
Member

+1, LGTM. Merging to master.
Thank you, @panbingkun and @cloud-fan for review.

@MaxGekk MaxGekk closed this in 9b53b80 Feb 21, 2024
HyukjinKwon added a commit that referenced this pull request Feb 23, 2024
…TYPE

### What changes were proposed in this pull request?

This PR is a followup of #45177 that fixes some leftovers missed.

### Why are the changes needed?

For consistency. Also, I think this fixes the Maven build failure: https://github.com/apache/spark/actions/runs/8005710953/job/21865798408

### Does this PR introduce _any_ user-facing change?

Yes, the value of 'paramIndex' for the error class `UNEXPECTED-INPUT-TYPE` is uniformly set by `ordinalNumber`.

### How was this patch tested?

CI in this PR.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45225 from HyukjinKwon/SPARK-47099-followup.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

@dongjoon-hyun dongjoon-hyun left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @panbingkun and all.

This seems to break Java 21 environment too.

This PR missed the following while updating try_arithmetic.sql.out.

  • sql/core/src/test/resources/sql-tests/results/try_arithmetic.sql.out.java21

Here is a follow-up.

dongjoon-hyun added a commit that referenced this pull request Feb 23, 2024
### What changes were proposed in this pull request?

This is a follow-up of #45177

### Why are the changes needed?

To recover Java 21 Daily CI.
- https://github.com/apache/spark/actions/workflows/build_maven_java21.yml
  - https://github.com/apache/spark/actions/runs/8020316098/job/21917446965

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually with Java 21.

```
$ build/sbt "sql/testOnly *.SQLQueryTestSuite -- -z try_arithmetic"
Using /Users/dongjoon/.jenv/versions/21 as default JAVA_HOME.
Note, this will be overridden by -java-home if it is set.
Using SPARK_LOCAL_IP=localhost
[info] welcome to sbt 1.9.3 (Apple Inc. Java 21.0.2)
...
[info] SQLQueryTestSuite:
12:33:19.091 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[info] - ansi/try_arithmetic.sql (1 second, 37 milliseconds)
[info] - ansi/try_arithmetic.sql_analyzer_test (114 milliseconds)
[info] - try_arithmetic.sql (688 milliseconds)
[info] - try_arithmetic.sql_analyzer_test (93 milliseconds)
12:33:24.980 WARN org.apache.spark.sql.SQLQueryTestSuite:
=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs: 75794
Total time: 0.335704411 seconds

Rule                                                                                  Effective Time / Total Time                     Effective Runs / Total Runs

org.apache.spark.sql.catalyst.optimizer.ConstantFolding                               59501877 / 60217667                             108 / 336
org.apache.spark.sql.catalyst.analysis.An...
12:33:24.981 WARN org.apache.spark.sql.SQLQueryTestSuite:
=== Metrics of Whole-stage Codegen ===
Total code generation time: 0.051986919 seconds
Total compile time: 0.116905665 seconds

12:33:25.012 WARN org.apache.spark.sql.SQLQueryTestSuite:

===== POSSIBLE THREAD LEAK IN SUITE o.a.s.sql.SQLQueryTestSuite, threads: QueryStageCreator-0 (daemon=true), Idle Worker Monitor for python3 (daemon=true), rpc-boss-3-1 (daemon=true), process reaper (pid 15500) (daemon=true), ForkJoinPool.commonPool-worker-3 (daemon=true), ForkJoinPool.commonPool-worker-2 (daemon=true), shuffle-boss-6-1 (daemon=true), ForkJoinPool.commonPool-worker-1 (daemon=true) =====
[info] Run completed in 8 seconds, 79 milliseconds.
[info] Total number of tests run: 4
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 4, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 118 s (01:58), completed Feb 23, 2024, 12:33:25 PM
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45235 from dongjoon-hyun/SPARK-47099.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
ericm-db pushed a commit to ericm-db/spark that referenced this pull request Mar 5, 2024
…`paramIndex` for the error class `UNEXPECTED_INPUT_TYPE`

### What changes were proposed in this pull request?
The pr aims to use `ordinalNumber` to `uniformly` set the value of `paramIndex` for the error class `UNEXPECTED_INPUT_TYPE`.

### Why are the changes needed?
When I was reviewing the spark code, I found that:
- Some expressions may have a starting value of 1 when throwing error `DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE`, eg:
https://github.com/apache/spark/blob/b0aad59f123581b66515c864873f46ea4ec4e762/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala#L300-L302

- Some are 0, eg:
https://github.com/apache/spark/blob/b0aad59f123581b66515c864873f46ea4ec4e762/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/bitmapExpressions.scala#L117-L119

**We should `unify` it and avoid understanding differences.**

### Does this PR introduce _any_ user-facing change?
Yes, the value of 'paramIndex' for the error class `UNEXPECTED-INPUT-TYPE` is uniformly set by `ordinalNumber`.

### How was this patch tested?
- Updated existed UT.
- Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#45177 from panbingkun/SPARK-47099.

Authored-by: panbingkun <panbingkun@baidu.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
ericm-db pushed a commit to ericm-db/spark that referenced this pull request Mar 5, 2024
…TYPE

### What changes were proposed in this pull request?

This PR is a followup of apache#45177 that fixes some leftovers missed.

### Why are the changes needed?

For consistency. Also, I think this fixes the Maven build failure: https://github.com/apache/spark/actions/runs/8005710953/job/21865798408

### Does this PR introduce _any_ user-facing change?

Yes, the value of 'paramIndex' for the error class `UNEXPECTED-INPUT-TYPE` is uniformly set by `ordinalNumber`.

### How was this patch tested?

CI in this PR.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#45225 from HyukjinKwon/SPARK-47099-followup.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
ericm-db pushed a commit to ericm-db/spark that referenced this pull request Mar 5, 2024
### What changes were proposed in this pull request?

This is a follow-up of apache#45177

### Why are the changes needed?

To recover Java 21 Daily CI.
- https://github.com/apache/spark/actions/workflows/build_maven_java21.yml
  - https://github.com/apache/spark/actions/runs/8020316098/job/21917446965

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually with Java 21.

```
$ build/sbt "sql/testOnly *.SQLQueryTestSuite -- -z try_arithmetic"
Using /Users/dongjoon/.jenv/versions/21 as default JAVA_HOME.
Note, this will be overridden by -java-home if it is set.
Using SPARK_LOCAL_IP=localhost
[info] welcome to sbt 1.9.3 (Apple Inc. Java 21.0.2)
...
[info] SQLQueryTestSuite:
12:33:19.091 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[info] - ansi/try_arithmetic.sql (1 second, 37 milliseconds)
[info] - ansi/try_arithmetic.sql_analyzer_test (114 milliseconds)
[info] - try_arithmetic.sql (688 milliseconds)
[info] - try_arithmetic.sql_analyzer_test (93 milliseconds)
12:33:24.980 WARN org.apache.spark.sql.SQLQueryTestSuite:
=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs: 75794
Total time: 0.335704411 seconds

Rule                                                                                  Effective Time / Total Time                     Effective Runs / Total Runs

org.apache.spark.sql.catalyst.optimizer.ConstantFolding                               59501877 / 60217667                             108 / 336
org.apache.spark.sql.catalyst.analysis.An...
12:33:24.981 WARN org.apache.spark.sql.SQLQueryTestSuite:
=== Metrics of Whole-stage Codegen ===
Total code generation time: 0.051986919 seconds
Total compile time: 0.116905665 seconds

12:33:25.012 WARN org.apache.spark.sql.SQLQueryTestSuite:

===== POSSIBLE THREAD LEAK IN SUITE o.a.s.sql.SQLQueryTestSuite, threads: QueryStageCreator-0 (daemon=true), Idle Worker Monitor for python3 (daemon=true), rpc-boss-3-1 (daemon=true), process reaper (pid 15500) (daemon=true), ForkJoinPool.commonPool-worker-3 (daemon=true), ForkJoinPool.commonPool-worker-2 (daemon=true), shuffle-boss-6-1 (daemon=true), ForkJoinPool.commonPool-worker-1 (daemon=true) =====
[info] Run completed in 8 seconds, 79 milliseconds.
[info] Total number of tests run: 4
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 4, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 118 s (01:58), completed Feb 23, 2024, 12:33:25 PM
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#45235 from dongjoon-hyun/SPARK-47099.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants