[SPARK-47413][SQL] - add support to substr/left/right for collations #45738

Closed
GideonPotok wants to merge 7 commits into apache:master from GideonPotok:spark_collation_47413

GideonPotok (Contributor) commented Mar 27, 2024

https://issues.apache.org/jira/browse/SPARK-46830

What changes were proposed in this pull request?

Add collation support to the return types of substr, left, and right when they are passed arguments with an explicit, implicit, or session-specified collation. Add tests to validate this behavior.

Why are the changes needed?

We are incrementally adding collation support to built-in string functions in Spark. These functions are intended to be supported for collated types.

Does this PR introduce any user-facing change?

These SQL functions will no longer throw errors when passed collated string types. Instead, they return the correct value, typed with the collation of the input (or with the default collation).
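As a rough illustration of the intended behavior, here is a pure-Python sketch (not Spark code). It assumes substr follows the rules of Spark's UTF8String.substringSQL (1-based positions counted in characters, position 0 treated like 1, negative positions counted from the end), and models left(str, len) as substr(str, 1, len) and right(str, len) as an empty string for len <= 0, otherwise substr(str, -len). CollatedString is a hypothetical stand-in for a value plus its collation name; the only point is that each result keeps the input's collation.

```python
from dataclasses import dataclass

INT_MAX = 2**31 - 1  # default substring length (Integer.MAX_VALUE in Spark)


@dataclass(frozen=True)
class CollatedString:
    """Hypothetical stand-in for a string value plus its collation name."""
    value: str
    collation: str = "UTF8_BINARY"  # assumed session default collation


def substr(s: CollatedString, pos: int, length: int = INT_MAX) -> CollatedString:
    """SQL-style substr: 1-based, counted in characters (code points)."""
    n = len(s.value)
    # pos > 0 is 1-based, pos < 0 counts from the end, pos == 0 acts like 1
    start = pos - 1 if pos > 0 else (n + pos if pos < 0 else 0)
    end = start + length  # computed before clamping start, as in substringSQL
    start = max(start, 0)
    value = s.value[start:end] if start < end else ""
    # The result keeps the input's collation instead of falling back to a default.
    return CollatedString(value, s.collation)


def left(s: CollatedString, n: int) -> CollatedString:
    # Modeled as substr(str, 1, len)
    return substr(s, 1, n)


def right(s: CollatedString, n: int) -> CollatedString:
    # Modeled as: empty string when len <= 0, otherwise substr(str, -len)
    if n <= 0:
        return CollatedString("", s.collation)
    return substr(s, -n)


# Same input as the PySpark session in this thread, once with the default
# collation and once with an explicit UTF8_BINARY_LCASE collation.
plain = CollatedString("fWuOlqjSthNLbHQUuxGizXuKX")
lcase = CollatedString(plain.value, "UTF8_BINARY_LCASE")

assert substr(plain, 5, 5) == CollatedString("lqjSt", "UTF8_BINARY")
assert left(plain, 5).value == "fWuOl"
assert right(plain, 5).value == "zXuKX"
assert right(lcase, 5) == CollatedString("zXuKX", "UTF8_BINARY_LCASE")
assert left(lcase, 0) == CollatedString("", "UTF8_BINARY_LCASE")
```

The asserted values match what the PySpark shell printed for my_substring, my_leftstring and my_rightstring.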

How was this patch tested?

Unit tests, plus ad-hoc interactions in spark-shell and the PySpark shell.

Was this patch authored or co-authored using generative AI tooling?

No.

github-actions bot added the SQL label Mar 27, 2024

GideonPotok (Contributor Author)

@uros-db

I have written some tests, and the substring implementation is basically passing them.
However, my reimplemented Left and Right are failing their tests. They currently throw the following exception in my new test cases in CollationSuite.scala:

[COMPLEX_EXPRESSION_UNSUPPORTED_INPUT.MISMATCHED_TYPES] Cannot process input data types for the expression: "(IF((1 <= 0), , substring(collate(klm), (- 1), 2147483647)))". All input types must be the same except nullable, containsNull, valueContainsNull flags, but found the input types ["STRING", "STRING COLLATE UTF8_BINARY_LCASE"]. SQLSTATE: 42K09

I think the cause is the overridden implementation of replacement within Left and Right in stringExpressions.scala, which needs additional work. In the replacement, the second parameter to Literal is a DataType rather than an AbstractDataType, so the literal does not pick up the input's collation, and the two branches of the IF end up with different types. I think that is what is causing these tests to fail. What do you think? Please advise on how to debug this issue.

uros-db (Contributor) commented Apr 1, 2024

Since the "COMPLEX_EXPRESSION_UNSUPPORTED_INPUT.MISMATCHED_TYPES" problem seems to be in the Right.replacement expression, where it does Literal(value, dataType), did you try something like Literal(UTF8String.EMPTY_UTF8, str.dataType) so that these data types match?
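The reviewer's diagnosis can be illustrated with a toy Python model. It is not Catalyst code: StringType, Literal, Substring and if_result_type below are hypothetical stand-ins that only echo Catalyst's names. It assumes, as the error message in this thread states, that both branches of an IF must have the same type, and shows why declaring the empty-string branch with the input's type (in the spirit of Literal(UTF8String.EMPTY_UTF8, str.dataType)) removes the mismatch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StringType:
    """Toy string type tagged with a collation name (not Spark's StringType)."""
    collation: str = "UTF8_BINARY"  # assumed default collation


@dataclass(frozen=True)
class Literal:
    """Toy literal: a value plus the data type it is declared with."""
    value: str
    data_type: StringType


@dataclass(frozen=True)
class Substring:
    """Toy substring node whose result type is its input's type."""
    input_type: StringType

    @property
    def data_type(self) -> StringType:
        return self.input_type


def if_result_type(true_branch, false_branch) -> StringType:
    """Toy IF type check: both branches must have exactly the same type."""
    if true_branch.data_type != false_branch.data_type:
        raise TypeError(
            "MISMATCHED_TYPES: "
            f"{true_branch.data_type.collation} vs {false_branch.data_type.collation}"
        )
    return true_branch.data_type


# Type of the collated input, e.g. collate('klm', 'UTF8_BINARY_LCASE').
str_type = StringType("UTF8_BINARY_LCASE")

# Empty-string branch declared with the default string type: the IF is rejected.
try:
    if_result_type(Literal("", StringType()), Substring(str_type))
except TypeError as err:
    print("rejected:", err)

# Empty-string branch declared with the input's type: both branches now agree.
print("accepted:", if_result_type(Literal("", str_type), Substring(str_type)))
```

The same reasoning applies to any replacement expression that mixes a freshly built string literal with a branch derived from the collated input.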

GideonPotok (Contributor Author)

@uros-db I think that did the trick! I will let you know when to review again. Thanks!

GideonPotok marked this pull request as ready for review April 3, 2024 14:23

GideonPotok (Contributor Author)

@uros-db What do you think of it? Is there anything else you want me to test? I also did some ad-hoc testing of these functions in spark-shell and the PySpark shell, and everything looks good.

      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
      /_/
         
Using Scala version 2.13.13 (OpenJDK 64-Bit Server VM, Java 17.0.10)
scala>  import scala.util.Random
     |  import org.apache.spark.sql._
     |  import org.apache.spark.sql.functions._
     |  import org.apache.spark.unsafe.types.UTF8String
     |  val df = spark.range(1000 * 1000).toDF("id")
     | 
     |  val dff = df.withColumn("random_string", lit(UTF8String.fromString(Random.nextString(25))))
     |  dff.show(1, 200, true)
     |  val dfff = dff.withColumn("my_substring", substring(col("random_string"), 5, 5))
     |  .withColumn("my_leftstring", left(col("random_string"), lit(5)))
     |  .withColumn("my_rightstring", right(col("random_string"), lit(5)))
     |     .withColumn("my_concat", concat(col("my_substring"), col("my_leftstring"), col("my_rightstring")))
     | 
     |     dfff.show(1, 50, true)
-RECORD 0----------------------------------------------------
 id            | 0                                           
 random_string | ᯻箲䨂䡶栣䌳벣ሰ‸ꠂ昊룃쉳믠䭐沂各䬙j穯೑顆첤鞅ల 
only showing top 1 rows

-RECORD 0-----------------------------------------------------
 id             | 0                                           
 random_string  | ᯻箲䨂䡶栣䌳벣ሰ‸ꠂ昊룃쉳믠䭐沂各䬙j穯೑顆첤鞅ల 
 my_substring   | 栣䌳벣ሰ‸                                    
 my_leftstring  | ᯻箲䨂䡶栣                                   
 my_rightstring | ೑顆첤鞅ల                                    
 my_concat      | 栣䌳벣ሰ‸᯻箲䨂䡶栣೑顆첤鞅ల                                
only showing top 1 rows

import scala.util.Random
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
import org.apache.spark.unsafe.types.UTF8String
val df: org.apache.spark.sql.DataFrame = [id: bigint]
val dff: org.apache.spark.sql.DataFrame = [id: bigint, random_string: string]
val dfff: org.apache.spark.sql.DataFrame = [id: bigint, random_string: string ... 4 more fields]

scala>    dfff.show(1, 50, true)
     |     dfff.registerTempTable("mytable")
     |     spark.sql("select * from mytable").show(1)
     |     spark.sqlContext.setConf("spark.sql.collation.enabled", "true")
     |     spark.sql("select COLLATION( random_string), COLLATION( my_substring), COLLATION( my_leftstring), COLLATION( my_rightstring), COLLATION( my_concat) from mytable").show(1)
warning: 1 deprecation (since 2.0.0); for details, enable `:setting -deprecation` or `:replay -deprecation`
-RECORD 0-----------------------------------------------------
 id             | 0                                           
 random_string  | ᯻箲䨂䡶栣䌳벣ሰ‸ꠂ昊룃쉳믠䭐沂各䬙j穯೑顆첤鞅ల 
 my_substring   | 栣䌳벣ሰ‸                                    
 my_leftstring  | ᯻箲䨂䡶栣                                   
 my_rightstring | ೑顆첤鞅ల                                    
 my_concat      | 栣䌳벣ሰ‸᯻箲䨂䡶栣೑顆첤鞅ల                                    
only showing top 1 rows

+---+---------------------------------+------------+-------------+--------------+-------------------------+
| id|                    random_string|my_substring|my_leftstring|my_rightstring|                my_concat|
+---+---------------------------------+------------+-------------+--------------+-------------------------+
|  0|᯻箲䨂䡶栣䌳벣ሰ‸ꠂ昊룃쉳믠䭐沂各...|    栣䌳벣ሰ‸|    ᯻箲䨂䡶栣|      ೑顆첤鞅ల|栣䌳벣ሰ‸᯻箲䨂䡶栣೑顆첤鞅ల|
+---+---------------------------------+------------+-------------+--------------+-------------------------+
only showing top 1 rows

+------------------------+-----------------------+------------------------+-------------------------+--------------------+
|collation(random_string)|collation(my_substring)|collation(my_leftstring)|collation(my_rightstring)|collation(my_concat)|
+------------------------+-----------------------+------------------------+-------------------------+--------------------+
|             UTF8_BINARY|            UTF8_BINARY|             UTF8_BINARY|              UTF8_BINARY|         UTF8_BINARY|
+------------------------+-----------------------+------------------------+-------------------------+--------------------+
only showing top 1 rows


scala>     spark.sql("select collate(my_concat, 'unicode') as unicode_myconcat, collate(my_concat, 'utf8_binary') as utf8_binary_myconcat, collate(my_concat, 'utf8_binary_lcase') as utf8_binary_lcase_myconcat, collate(my_concat, 'unicode_ci') as unicode_ci_myconcat from mytable").show(1)
     | 
+-------------------------+-------------------------+--------------------------+-------------------------+
|         unicode_myconcat|     utf8_binary_myconcat|utf8_binary_lcase_myconcat|      unicode_ci_myconcat|
+-------------------------+-------------------------+--------------------------+-------------------------+
|栣䌳벣ሰ‸᯻箲䨂䡶栣೑顆첤鞅ల|栣䌳벣ሰ‸᯻箲䨂䡶栣೑顆첤鞅ల| 栣䌳벣ሰ‸᯻箲䨂䡶栣೑顆첤鞅ల|栣䌳벣ሰ‸᯻箲䨂䡶栣೑顆첤鞅ల|
+-------------------------+-------------------------+--------------------------+-------------------------+
only showing top 1 rows


scala>     spark.sql("select left(collate(my_concat, 'unicode'), 5) as unicode_myconcat, right(collate(my_concat, 'utf8_binary'), 5) as utf8_binary_myconcat, substr(collate(my_concat, 'utf8_binary_lcase'), 5, 5) as utf8_binary_lcase_myconcat, left(collate(my_concat, 'unicode_ci'), 1) as unicode_ci_myconcat from mytable").show(1)
     | 
+----------------+--------------------+--------------------------+-------------------+
|unicode_myconcat|utf8_binary_myconcat|utf8_binary_lcase_myconcat|unicode_ci_myconcat|
+----------------+--------------------+--------------------------+-------------------+
|        栣䌳벣ሰ‸|            ೑顆첤鞅ల|                  ‸᯻箲䨂䡶|                 栣|
+----------------+--------------------+--------------------------+-------------------+
only showing top 1 rows


scala>      spark.sql("select COLLATION(left(collate(my_concat, 'unicode'), 5)) as unicode_myconcat, COLLATION(right(collate(my_concat, 'utf8_binary'), 5)) as utf8_binary_myconcat, COLLATION(substr(collate(my_concat, 'utf8_binary_lcase'), 5, 5)) as utf8_binary_lcase_myconcat, COLLATION(left(collate(my_concat, 'unicode_ci'), 1)) as unicode_ci_myconcat from mytable").show(1)
     | 
+----------------+--------------------+--------------------------+-------------------+
|unicode_myconcat|utf8_binary_myconcat|utf8_binary_lcase_myconcat|unicode_ci_myconcat|
+----------------+--------------------+--------------------------+-------------------+
|         UNICODE|         UTF8_BINARY|         UTF8_BINARY_LCASE|         UNICODE_CI|
+----------------+--------------------+--------------------------+-------------------+
only showing top 1 rows


scala>     spark.sql("select COLLATION(left(collate(my_concat, 'unicode'), 5)) as unicode_myconcat, COLLATION(right(collate(my_concat, 'utf8_binary'), 5)) as utf8_binary_myconcat, COLLATION(substr(collate(my_concat, 'utf8_binary_lcase'), 5, 5)) as utf8_binary_lcase_myconcat, COLLATION(left(collate(my_concat, 'unicode_ci'), 1)) as unicode_ci_myconcat from mytable").write
val res5: org.apache.spark.sql.DataFrameWriter[org.apache.spark.sql.Row] = org.apache.spark.sql.DataFrameWriter@6dd31bab

scala>     spark.sql("select left(collate(my_concat, 'unicode'), 5) as unicode_myconcat, right(collate(my_concat, 'utf8_binary'), 5) as utf8_binary_myconcat, substr(collate(my_concat, 'utf8_binary_lcase'), 5, 5) as utf8_binary_lcase_myconcat, left(collate(my_concat, 'unicode_ci'), 1) as unicode_ci_myconcat from mytable")
     |     .write.mode("overwrite").parquet("mytable.parquet")

24/04/02 10:10:24 WARN MemoryManager: Total allocation exceeds 95.00% (1,020,054,720 bytes) of heap memory 
scala> 

scala>     val readdf = spark.read.parquet("mytable.parquet")
     | 
val readdf: org.apache.spark.sql.DataFrame = [unicode_myconcat: string collate UNICODE, utf8_binary_myconcat: string ... 2 more fields]

scala>     readdf.show(1, 50, true)
     | 
-RECORD 0------------------------------
 unicode_myconcat           | 栣䌳벣ሰ‸ 
 utf8_binary_myconcat       | ೑顆첤鞅ల 
 utf8_binary_lcase_myconcat | ‸᯻箲䨂䡶 
 unicode_ci_myconcat        | 栣    
only showing top 1 rows


scala>     readdf.createOrReplaceTempView("mytable2")
     | 

scala>         spark.sql("select COLLATION(unicode_myconcat), COLLATION(utf8_binary_myconcat), COLLATION(utf8_binary_lcase_myconcat), COLLATION(unicode_ci_myconcat) from mytable2").show(1)
     | 
+---------------------------+-------------------------------+-------------------------------------+------------------------------+
|collation(unicode_myconcat)|collation(utf8_binary_myconcat)|collation(utf8_binary_lcase_myconcat)|collation(unicode_ci_myconcat)|
+---------------------------+-------------------------------+-------------------------------------+------------------------------+
|                    UNICODE|                    UTF8_BINARY|                    UTF8_BINARY_LCASE|                    UNICODE_CI|
+---------------------------+-------------------------------+-------------------------------------+------------------------------+
only showing top 1 rows


scala> 
:quit
gideon@Gideon's MacBook Pro spark % ./bin/pyspark 
Python 3.9.6 (default, Feb  3 2024, 15:58:27) 
[Clang 15.0.0 (clang-1500.3.9.4)] on darwin
>>> df = spark.range(1000 * 1000).toDF("id")
>>> import pyspark.sql.functions as F
>>> import random
>>> import string
>>> dff = df.withColumn("random_string", F.lit("".join([random.choice(string.ascii_letters) for _ in range(25)])))
>>> dff.show(1, 100, True)
-RECORD 0----------------------------------
 id            | 0                         
 random_string | fWuOlqjSthNLbHQUuxGizXuKX 
only showing top 1 rows

>>> dfff = dff.withColumn("my_substring", F.substring(F.col("random_string"), 5, 5)) \
...     .withColumn("my_leftstring", F.left(F.col("random_string"),  F.lit(5))) \
...     .withColumn("my_rightstring", F.right(F.col("random_string"), F.lit(5))) \
...     .withColumn("my_concat", F.concat(F.col("my_substring"), F.col("my_leftstring"), F.col("my_rightstring")))
>>> 
>>> dfff.show(10, 100, True)
-RECORD 0-----------------------------------
 id             | 0                         
 random_string  | fWuOlqjSthNLbHQUuxGizXuKX 
 my_substring   | lqjSt                     
 my_leftstring  | fWuOl                     
 my_rightstring | zXuKX                     
 my_concat      | lqjStfWuOlzXuKX           
only showing top 1 rows

>>> dfff.createOrReplaceTempView("mytable")
>>> spark.sql("select * from mytable").show(1)
+---+--------------------+------------+-------------+--------------+---------------+
| id|       random_string|my_substring|my_leftstring|my_rightstring|      my_concat|
+---+--------------------+------------+-------------+--------------+---------------+
|  0|fWuOlqjSthNLbHQUu...|       lqjSt|        fWuOl|         zXuKX|lqjStfWuOlzXuKX|
+---+--------------------+------------+-------------+--------------+---------------+
only showing top 1 rows

>>> spark.sql("select COLLATION( random_string), COLLATION( my_substring), COLLATION( my_leftstring), COLLATION( my_rightstring), COLLATION( my_concat) from mytable").show()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/gideon/repos/spark/python/pyspark/sql/session.py", line 1711, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery, litArgs), self)
  File "/Users/gideon/repos/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
  File "/Users/gideon/repos/spark/python/pyspark/errors/exceptions/captured.py", line 221, in deco
    raise converted from None
pyspark.errors.exceptions.captured.AnalysisException: [UNSUPPORTED_FEATURE.COLLATION] The feature is not supported: Collation is not yet supported. SQLSTATE: 0A000;
Project [collation(random_string#4) AS collation(random_string)#107, collation(my_substring#29) AS collation(my_substring)#108, collation(my_leftstring#33) AS collation(my_leftstring)#109, collation(my_rightstring#38) AS collation(my_rightstring)#110, collation(my_concat#44) AS collation(my_concat)#111]
+- SubqueryAlias mytable
   +- View (`mytable`, [id#2L, random_string#4, my_substring#29, my_leftstring#33, my_rightstring#38, my_concat#44])
      +- Project [id#2L, random_string#4, my_substring#29, my_leftstring#33, my_rightstring#38, concat(my_substring#29, my_leftstring#33, my_rightstring#38) AS my_concat#44]
         +- Project [id#2L, random_string#4, my_substring#29, my_leftstring#33, right(random_string#4, 5) AS my_rightstring#38]
            +- Project [id#2L, random_string#4, my_substring#29, left(random_string#4, 5) AS my_leftstring#33]
               +- Project [id#2L, random_string#4, substring(random_string#4, 5, 5) AS my_substring#29]
                  +- Project [id#2L, fWuOlqjSthNLbHQUuxGizXuKX AS random_string#4]
                     +- Project [id#0L AS id#2L]
                        +- Range (0, 1000000, step=1, splits=Some(16))    
>>> spark.conf.set("spark.sql.collation.enabled", "true")
>>> spark.sql("select COLLATION( random_string), COLLATION( my_substring), COLLATION( my_leftstring), COLLATION( my_rightstring), COLLATION( my_concat) from mytable").show(1)
+------------------------+-----------------------+------------------------+-------------------------+--------------------+
|collation(random_string)|collation(my_substring)|collation(my_leftstring)|collation(my_rightstring)|collation(my_concat)|
+------------------------+-----------------------+------------------------+-------------------------+--------------------+
|             UTF8_BINARY|            UTF8_BINARY|             UTF8_BINARY|              UTF8_BINARY|         UTF8_BINARY|
+------------------------+-----------------------+------------------------+-------------------------+--------------------+
only showing top 1 rows

Contributor

Try covering some more edge cases in these tests, such as empty strings, mixed uppercase and lowercase, characters of different byte lengths, etc.

A good example would be this PR: #45749
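The suggested edge cases can be pinned down as a small table-driven check. This is a plain-Python sketch rather than the Scala suite: substr is a hypothetical helper following SQL substr's 1-based, character-counted rules, and the expected values follow from those rules. Since these functions select characters purely by position, the expected values should not change with the collation; a collation-aware suite would presumably run each case per collation and also check the result's collation.

```python
INT_MAX = 2**31 - 1


def substr(s: str, pos: int, length: int = INT_MAX) -> str:
    """Hypothetical helper mimicking SQL substr: 1-based, counted in characters."""
    n = len(s)  # Python counts code points, not UTF-8 bytes
    start = pos - 1 if pos > 0 else (n + pos if pos < 0 else 0)
    end = start + length
    start = max(start, 0)
    return s[start:end] if start < end else ""


MIXED_WIDTH = "aΩ€😀"  # 1-, 2-, 3- and 4-byte characters in UTF-8
assert len(MIXED_WIDTH) == 4 and len(MIXED_WIDTH.encode("utf-8")) == 10

# (input, pos, len, expected value)
CASES = [
    ("", 1, 3, ""),                 # empty input
    ("abc", 1, 0, ""),              # zero length
    ("abc", 0, 2, "ab"),            # pos 0 behaves like pos 1
    ("abc", -2, 5, "bc"),           # negative pos counts from the end
    ("abc", -5, 3, "a"),            # window starting before the string
    ("aBcD", 2, 2, "Bc"),           # mixed case is returned as-is
    (MIXED_WIDTH, 2, 2, "Ω€"),      # slices by character, not by byte
    (MIXED_WIDTH, -1, 1, "😀"),     # 4-byte character at the end
]

for text, pos, length, expected in CASES:
    actual = substr(text, pos, length)
    assert actual == expected, (text, pos, length, actual)
```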

Contributor Author

@uros-db Please see what I have added since, and let me know whether to keep going or whether we have enough!

Contributor

I think this will do it, thanks! Now just move all tests over to CollationStringExpressionsSuite, re-run any failing CI checks, and modify the PR title so we can open this up for review by Spark committers.

Contributor Author

Sorry @uros-db, I didn't realize you wanted me to add the additional edge cases to this suite specifically. Are you sure you want me to move all the new tests to CollationStringExpressionsSuite?

Contributor

Yeah, let's put all these tests that are specific to string functions into the separate suite CollationStringExpressionsSuite.

This way, CollationSuite won't get cluttered as we keep adding support for more string expressions.

GideonPotok force-pushed the spark_collation_47413 branch from 9a6f2b9 to 0108768 April 5, 2024 14:28
GideonPotok requested a review from uros-db April 5, 2024 16:51
GideonPotok force-pushed the spark_collation_47413 branch from a065adf to 63385ed April 5, 2024 17:41
GideonPotok changed the title [SPARK-47413][SQL] [WIP - DONT REVIEW] - add support to substr for collations to [SPARK-47413][SQL] [WIP - DONT REVIEW] - add support to substr/left/right for collations Apr 5, 2024
GideonPotok force-pushed the spark_collation_47413 branch from 9b49d7d to 5270c0a April 8, 2024 06:11
GideonPotok changed the title [SPARK-47413][SQL] [WIP - DONT REVIEW] - add support to substr/left/right for collations to [SPARK-47413][SQL] - add support to substr/left/right for collations Apr 8, 2024
GideonPotok force-pushed the spark_collation_47413 branch from 67ff2da to f25e8a9 April 9, 2024 13:30
GideonPotok (Contributor Author)

@uros-db This would be ready for review, but the Spark SQL unit tests repeatedly fail in CSVLegacyTimeParserSuite, which I did not modify. I have rerun them four times in total. Any idea how to get this check to pass?

uros-db (Contributor) commented Apr 10, 2024

did you try to run this suite locally and investigate any potential issues?

GideonPotok (Contributor Author) commented Apr 10, 2024

> did you try to run this suite locally and investigate any potential issues?

@uros-db I would love to. First, though, I need to fix my setup: there is an issue I have been putting off where any unit test in the Spark codebase that uses the local file system seems to hang indefinitely when run locally. That includes tests that use a noop write, tests that use CREATE TABLE ... USING PARQUET, and this suite in particular (org.apache.spark.sql.execution.datasources.csv.CSVLegacyTimeParserSuite). How would you recommend I go about finding the root cause, so I can fix my local setup?

I usually run tests from sbt directly. I can provide additional setup details, e.g. my jvmopts and sbt JVM opts. I have also done some cursory JVM performance profiling and can share some of those measurements. I can also tell you what I have tried (the fix I was most optimistic about was chmod go+rw /Users/gideon/repos/spark/target/tmp).

Let me know what to look into first. Thanks so much.

uros-db (Contributor) commented Apr 10, 2024

I don't think CSVLegacyTimeParserSuite is related to your changes, but it would probably be a very good idea to set up Maven so that you can run/debug all tests locally in general.

GideonPotok (Contributor Author)

> I don't think CSVLegacyTimeParserSuite is related to your changes, but it would probably be a very good idea to set up Maven so that you can run/debug all tests locally in general.

I see the same issue running tests locally with Maven. I always build with Maven, and when running tests with it (e.g. build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.sql.execution.datasources.csv.CSVLegacyTimeParserSuite test), I hit the same hang on anything that writes to the file system. I will keep looking into it.

I agree that my changes are probably not what is causing org.apache.spark.sql.execution.datasources.csv.CSVLegacyTimeParserSuite to fail in GHA.

GideonPotok (Contributor Author)

@uros-db I got the file-writing tests to work locally by simply exporting SPARK_HOME=/Users/gideon/repos/spark before running my Maven tests.

More importantly, all GitHub Actions checks are passing, so this is ready for reviewers to begin reviewing.

GideonPotok (Contributor Author)

@uros-db this is ready for review.

uros-db (Contributor) commented Apr 11, 2024

@GideonPotok nice work, thanks!

Heads up though: we will soon be finishing some code refactoring related to collation-aware string expression support (SPARK-47410), and will likely need to rewrite this PR a bit in order to comply with the new design before proceeding to final review.

Let's put this PR on hold for now, and I'll ping you when we're ready to move on.

GideonPotok (Contributor Author)

@uros-db No problem at all.

If I understand your refactor correctly, my changes will basically either stay in the same place or move to the new common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java, right?

In the meantime, I will take care of implementing https://issues.apache.org/jira/browse/SPARK-47412 and then put that on hold as well until the refactor is merged. Does that sound good?

GideonPotok (Contributor Author)

PS: Do you think changes such as these, which only touch the implementations of inputTypes and replacement and do not rely on calling UTF8String or CollationFactory, will need to be modified? Even if they won't, it makes sense for me to hold off, but I would still like to know how you see this changing.

Thank you, kind sir :)

uros-db (Contributor) commented Apr 11, 2024

@GideonPotok You are correct: this refactor should not greatly affect your current PR in particular. I expect you'll only need to refactor the tests a bit (shouldn't be too much work).

Feel free to move over to the next ticket, and I'll let you know when and how to consolidate your code/design so we can move to final review and sign off on this PR

uros-db (Contributor) left a comment

We’ve done some major code restructuring in #45978, so please sync these changes before moving on.

@GideonPotok you’ll likely need to rewrite the code in this PR a bit, so please follow the guidelines outlined in https://issues.apache.org/jira/browse/SPARK-47410

GideonPotok (Contributor Author) commented Apr 14, 2024

@uros-db superseding this with #46040
