IllegalArgumentException when filtering on BinaryType column #2934

Description

@cccs-br

We've encountered a scenario where Iceberg throws an IllegalArgumentException when filtering a DataFrame on a BinaryType column.

Here is the stack trace:

java.lang.IllegalArgumentException: Cannot convert bytes to SQL literal: java.nio.HeapByteBuffer[pos=0 lim=16 cap=16]

	at org.apache.iceberg.spark.Spark3Util$DescribeExpressionVisitor.sqlString(Spark3Util.java:654)
	at org.apache.iceberg.spark.Spark3Util$DescribeExpressionVisitor.predicate(Spark3Util.java:628)
	at org.apache.iceberg.spark.Spark3Util$DescribeExpressionVisitor.predicate(Spark3Util.java:576)
	at org.apache.iceberg.expressions.ExpressionVisitors.visit(ExpressionVisitors.java:308)

In order to reproduce the issue, you can simply clone this repo https://github.com/cccs-br/spark-iceberg-issue

git@github.com:cccs-br/spark-iceberg-issue.git

The test case reproducing the exception is here:

https://github.com/cccs-br/spark-iceberg-issue/blob/f7811250608df8618fe264cbf340b5d40effba0d/src/test/java/IcebergTests.java#L59

This class contains three more test cases that perform very similar filtering and pass.

Only one of the four test cases fails, where you would expect all of them to succeed.

The anomaly here is that the filtering works when using a literal expression, but fails when using:

Java

spark.sql("select * from my_table").where(col("binary_column_name").$greater(my_byte_array))

Scala

spark.sql("select * from my_table").where(col("binary_column_name") > my_byte_array)

Two of the tests reproduce these same two cases, except that the DataFrame is created by reading the Parquet files generated by Iceberg directly. They are there to show that both filtering methods work in that setup.
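For reference, here is a minimal sketch of how a byte array can be written as a Spark SQL hexadecimal binary literal (`X'...'`), which is one way to express the comparison as a literal in a filter string. This is an illustration only: `BinaryLiteralSketch`, `toHexLiteral`, and `greaterThanFilter` are hypothetical names, not Iceberg or Spark code, and this is not necessarily the form of literal expression used in the test class.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; not part of Iceberg or Spark.
public class BinaryLiteralSketch {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Renders the remaining bytes of the buffer as a hex binary literal, e.g. X'0AFF'.
    static String toHexLiteral(ByteBuffer buffer) {
        ByteBuffer view = buffer.duplicate(); // leave the caller's position untouched
        StringBuilder sb = new StringBuilder("X'");
        while (view.hasRemaining()) {
            int b = view.get() & 0xFF; // treat the byte as unsigned
            sb.append(HEX[b >>> 4]).append(HEX[b & 0x0F]);
        }
        return sb.append('\'').toString();
    }

    // Builds a filter string such as: binary_column_name > X'0AFF00'
    static String greaterThanFilter(String column, byte[] value) {
        return column + " > " + toHexLiteral(ByteBuffer.wrap(value));
    }

    public static void main(String[] args) {
        byte[] myByteArray = {0x0A, (byte) 0xFF, 0x00};
        System.out.println(greaterThanFilter("binary_column_name", myByteArray));
        // prints: binary_column_name > X'0AFF00'
    }
}
```

The resulting string could be passed to `Dataset.where(String)`, which accepts a SQL condition expression. Whether that path avoids the exception above has not been verified here.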

As far as I can tell, this should be working. But I could be wrong. If so, can you advise?

We're using Spark 3.1.2 with:

        <dependency>
            <groupId>org.apache.iceberg</groupId>
            <artifactId>iceberg-spark3-runtime</artifactId>
            <version>0.11.1</version>
            <scope>test</scope>
        </dependency>
