[SPARK-47567][SQL] Support LOCATE function to work with collated strings #45791

Closed
miland-db wants to merge 32 commits into apache:master from miland-db:miland-db/string-locate

@miland-db miland-db commented Apr 1, 2024

Contributor

What changes were proposed in this pull request?

Extend the built-in string function `locate` to support non-binary, non-lowercase collations.

Why are the changes needed?

Update collation support for built-in string functions in Spark.

Does this PR introduce any user-facing change?

Yes, users should now be able to use COLLATE within the arguments of the built-in string function LOCATE in Spark SQL queries, using non-binary collations such as UNICODE_CI.
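
To make the user-facing behavior concrete, here is a minimal, self-contained Java sketch of what a collation-aware LOCATE computes. It is not the code from this PR: it uses the JDK's `java.text.Collator` at `SECONDARY` strength as a rough stand-in for a case-insensitive collation such as UNICODE_CI, and both the class name and the `locate` helper are hypothetical. It also assumes that a match has the same length (in UTF-16 code units) as the search string, which does not hold for all Unicode text. The 1-based positions and the 0 result for "not found" follow the documented LOCATE contract; returning 0 for a start position below 1 is an assumption.

```java
import java.text.Collator;
import java.util.Locale;

// Illustrative sketch only; NOT Spark's implementation of collated LOCATE.
final class CollatedLocateSketch {

    // Hypothetical helper: returns the 1-based position of the first window of
    // `str`, starting at 1-based `pos`, that the collator treats as equal to
    // `substr`. Returns 0 when there is no such window.
    static int locate(String substr, String str, int pos, Collator collator) {
        if (pos < 1) {
            return 0; // assumption: non-positive start positions never match
        }
        int n = substr.length();
        // Simplification: only windows with the same length as `substr` are tried.
        for (int i = pos - 1; i + n <= str.length(); i++) {
            if (collator.compare(str.substring(i, i + n), substr) == 0) {
                return i + 1; // convert 0-based index to 1-based position
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // SECONDARY strength ignores case differences but keeps accent differences.
        Collator caseInsensitive = Collator.getInstance(Locale.ROOT);
        caseInsensitive.setStrength(Collator.SECONDARY);

        // IDENTICAL strength distinguishes case, similar in spirit to a binary collation.
        Collator caseSensitive = Collator.getInstance(Locale.ROOT);
        caseSensitive.setStrength(Collator.IDENTICAL);

        System.out.println(locate("BAR", "foobarbar", 1, caseInsensitive)); // 4
        System.out.println(locate("BAR", "foobarbar", 5, caseInsensitive)); // 7
        System.out.println(locate("BAR", "foobarbar", 1, caseSensitive));   // 0
    }
}
```

In Spark SQL, the corresponding query would look roughly like `SELECT locate('BAR' COLLATE UNICODE_CI, 'foobarbar' COLLATE UNICODE_CI)`; the exact accepted syntax depends on the Spark version.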

How was this patch tested?

Unit tests for queries using StringLocate (CollationStringExpressionsSuite.scala).

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions Bot added the SQL label Apr 1, 2024
@miland-db

Contributor Author

Adding @uros-db

Comment thread common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java Outdated
@miland-db

Contributor Author

Adding Belgrade collation crew: @stefankandic @mihailom-db @nikolamand-db @dbatomic
Adding: @cloud-fan & @MaxGekk

Comment thread common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java Outdated
@miland-db miland-db requested a review from uros-db April 4, 2024 08:51
uros-db commented Apr 11, 2024

Contributor

heads up: we’ve done some major code restructuring in #45978, so please sync these changes before moving on

@miland-db you’ll likely need to rewrite the code in this PR, so please follow the guidelines outlined in https://issues.apache.org/jira/browse/SPARK-47410

# Conflicts:
#	common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java
#	sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala
# Conflicts:
#	sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala
# Conflicts:
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala
#	sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala
# Conflicts:
#	common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java

@uros-db uros-db left a comment

Contributor

just flagging this PR will likely need a fix for the ICU implementation

# Conflicts:
#	common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java
#	common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala
#	sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala
# Conflicts:
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala
# Conflicts:
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala
# Conflicts:
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala
@miland-db miland-db requested review from cloud-fan and uros-db April 26, 2024 14:06
# Conflicts:
#	common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java
#	common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java

@uros-db uros-db left a comment

Contributor

lgtm, @cloud-fan please review

@cloud-fan

Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 7b1147a Apr 29, 2024

6 participants