[Feature] Support Spark expression: subtract_dates #3094

@andygrove

What is the problem the feature request solves?

Note: This issue was generated with AI assistance. The specification details have been extracted from Spark documentation and may need verification.

Comet does not currently support the Spark SubtractDates expression (date1 - date2), so queries that subtract one date from another fall back to Spark's JVM execution instead of running natively on DataFusion.

SubtractDates is a binary expression that calculates the difference between two date values. It supports both legacy interval output (CalendarIntervalType) and the newer day-time interval format (DayTimeIntervalType) depending on configuration settings.

Supporting this expression would allow more Spark workloads to benefit from Comet's native acceleration.

Describe the potential solution

Spark Specification

Syntax:

date1 - date2

Arguments:

| Argument | Type | Description |
|----------|------|-------------|
| left | Expression | The left date operand (minuend) |
| right | Expression | The right date operand (subtrahend) |
| legacyInterval | Boolean | Flag determining output format (defaults to SQLConf.legacyIntervalEnabled) |

Return Type:

  • CalendarIntervalType: When legacyInterval is true, returns a calendar interval holding the calendar period between the two dates as months and days (computed by DateTimeUtils.subtractDates), with microseconds=0
  • DayTimeIntervalType(DAY): When legacyInterval is false, returns a day-time interval whose value is the day difference expressed in microseconds
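
To make the non-legacy return value concrete, here is a minimal Rust sketch of the day-to-microsecond conversion. It assumes both dates arrive as days since 1970-01-01 (the physical form of a DateType value). The name day_time_interval_micros is hypothetical and not part of Comet or DataFusion, and overflow checks are left out on purpose so that only the unit conversion is shown.

```rust
/// Microseconds in one day (24 * 60 * 60 * 1_000_000).
const MICROS_PER_DAY: i64 = 86_400_000_000;

/// Hypothetical helper: the non-legacy result for `left - right`,
/// where both inputs are days since 1970-01-01.
/// No overflow checks here; this only illustrates the unit conversion.
fn day_time_interval_micros(left_days: i32, right_days: i32) -> i64 {
    // Widen to i64 first so the day difference itself cannot wrap.
    let diff_days = i64::from(left_days) - i64::from(right_days);
    diff_days * MICROS_PER_DAY
}
```

For DATE '2023-01-15' - DATE '2023-01-10', the inputs are day 19372 and day 19367, so the difference is 5 days and the stored value is 432,000,000,000 microseconds.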

Supported Data Types:

  • Input types: DateType for both left and right operands
  • Implicit casting is supported through the ImplicitCastInputTypes trait

Edge Cases:

  • Null handling: Returns null if either input is null (null-intolerant behavior)
  • Overflow behavior: The non-legacy path uses Math.subtractExact() and Math.multiplyExact(), which throw ArithmeticException on overflow
  • Empty input: Not applicable as this is a binary expression requiring two operands
  • Negative results: Produced when the left date is earlier than the right date
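
The edge cases above can be sketched for a single row of the non-legacy path as follows. This is an illustration under stated assumptions, not Comet's implementation: subtract_dates_checked is a hypothetical name, None stands in for SQL NULL, and a String error stands in for Spark's ArithmeticException.

```rust
/// Microseconds in one day (24 * 60 * 60 * 1_000_000).
const MICROS_PER_DAY: i64 = 86_400_000_000;

/// Hypothetical single-row kernel for `left - right` on the non-legacy path.
/// Inputs are days since 1970-01-01; `None` models SQL NULL.
/// Returns Ok(None) when an operand is null and Err on arithmetic overflow.
fn subtract_dates_checked(
    left: Option<i32>,
    right: Option<i32>,
) -> Result<Option<i64>, String> {
    // Null-intolerant: any null operand yields a null result.
    let (l, r) = match (left, right) {
        (Some(l), Some(r)) => (l, r),
        _ => return Ok(None),
    };
    // 32-bit checked subtraction, analogous to Math.subtractExact on ints.
    let days = l
        .checked_sub(r)
        .ok_or_else(|| "integer overflow".to_string())?;
    // 64-bit checked multiplication, analogous to Math.multiplyExact on longs.
    let micros = i64::from(days)
        .checked_mul(MICROS_PER_DAY)
        .ok_or_else(|| "long overflow".to_string())?;
    Ok(Some(micros))
}
```

Both overflow points are reachable: the 32-bit subtraction overflows for operands such as i32::MAX and -1, and the 64-bit multiplication overflows once the day difference is large enough (200,000,000 days already exceeds the i64 range in microseconds).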

Examples:

-- Basic date subtraction
SELECT DATE '2023-01-15' - DATE '2023-01-10' AS diff;

-- Using with table columns (later date on the left gives a positive result)
SELECT ship_date - order_date AS processing_days FROM orders;

// DataFrame API usage
import org.apache.spark.sql.functions.col

df.select((col("end_date") - col("start_date")).as("duration"))

// Direct expression construction
import java.sql.Date
import org.apache.spark.sql.catalyst.expressions.{Literal, SubtractDates}

val leftExpr = Literal(Date.valueOf("2023-01-15"))
val rightExpr = Literal(Date.valueOf("2023-01-10"))
val subtractExpr = SubtractDates(leftExpr, rightExpr)

Implementation Approach

See the Comet guide on adding new expressions for detailed instructions.

  1. Scala Serde: Add expression handler in spark/src/main/scala/org/apache/comet/serde/
  2. Register: Add to appropriate map in QueryPlanSerde.scala
  3. Protobuf: Add message type in native/proto/src/proto/expr.proto if needed
  4. Rust: Implement in native/spark-expr/src/ (check if DataFusion has built-in support first)

Additional context

Difficulty: Medium
Spark Expression Class: org.apache.spark.sql.catalyst.expressions.SubtractDates

Related:

  • AddMonths: For adding months to dates
  • DateDiff: Alternative date difference calculation
  • CalendarIntervalType: Legacy interval data type
  • DayTimeIntervalType: Modern interval data type

This issue was auto-generated from Spark reference documentation.
