We've encountered a scenario where Iceberg throws an IllegalArgumentException when filtering a DataFrame on a column of type BinaryType.
Here is the stack trace:
java.lang.IllegalArgumentException: Cannot convert bytes to SQL literal: java.nio.HeapByteBuffer[pos=0 lim=16 cap=16]
at org.apache.iceberg.spark.Spark3Util$DescribeExpressionVisitor.sqlString(Spark3Util.java:654)
at org.apache.iceberg.spark.Spark3Util$DescribeExpressionVisitor.predicate(Spark3Util.java:628)
at org.apache.iceberg.spark.Spark3Util$DescribeExpressionVisitor.predicate(Spark3Util.java:576)
at org.apache.iceberg.expressions.ExpressionVisitors.visit(ExpressionVisitors.java:308)
To reproduce the issue, clone this repo: https://github.com/cccs-br/spark-iceberg-issue
git@github.com:cccs-br/spark-iceberg-issue.git
The test case reproducing the exception is here:
https://github.com/cccs-br/spark-iceberg-issue/blob/f7811250608df8618fe264cbf340b5d40effba0d/src/test/java/IcebergTests.java#L59
This class contains three more test cases that perform very similar filtering and succeed.
Of the 4 test cases in this class, only one fails, where you would expect all of them to succeed.
The anomaly is that filtering works when using a literal expression, but fails when using:
Java
spark.sql("select * from my_table").where(col("binary_column_name").$greater(my_byte_array))
Scala
spark.sql("select * from my_table").where(col("binary_column_name") > my_byte_array)
Two of the tests reproduce the same two cases, except that the DataFrame is created by reading the Parquet files generated by Iceberg directly. This is just to show that both filtering methods work in that situation.
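For anyone hitting the same exception, here is a minimal sketch of how the literal-expression form could be built from a byte array. It assumes the literal is written as a Spark SQL hexadecimal binary literal (X'...'); the test cases in the repo may express it differently. The class and method names (BinaryLiteral, toHexLiteral, greaterThanFilter) are hypothetical and not part of Spark or Iceberg.

```java
public final class BinaryLiteral {
    private BinaryLiteral() {}

    /** Formats bytes as a hexadecimal binary literal, e.g. X'0AFF'. */
    public static String toHexLiteral(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2 + 3);
        sb.append("X'");
        for (byte b : bytes) {
            // Mask to an unsigned value so negative bytes print as two hex digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.append('\'').toString();
    }

    /** Builds a filter string such as: binary_column_name > X'0AFF' (hypothetical helper). */
    public static String greaterThanFilter(String column, byte[] bytes) {
        return column + " > " + toHexLiteral(bytes);
    }

    public static void main(String[] args) {
        byte[] myByteArray = {0x0A, (byte) 0xFF};
        // Prints the filter expression that would be handed to Spark as a string.
        System.out.println(greaterThanFilter("binary_column_name", myByteArray));
    }
}
```

The resulting string could then be passed to the string overload of where, for example spark.sql("select * from my_table").where(BinaryLiteral.greaterThanFilter("binary_column_name", my_byte_array)). We have not verified that this particular form avoids the exception with iceberg-spark3-runtime 0.11.1; it only illustrates the literal-expression style that works in our tests.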
As far as I can tell, this should be working. But I could be wrong. If so, can you advise?
We're using Spark 3.1.2 with:
<dependency>
<groupId>org.apache.iceberg</groupId>
<artifactId>iceberg-spark3-runtime</artifactId>
<version>0.11.1</version>
<scope>test</scope>
</dependency>