What is the problem the feature request solves?
Note: This issue was generated with AI assistance. The specification details have been extracted from Spark documentation and may need verification.
Comet does not currently support the Spark subtract_dates function, causing queries using this function to fall back to Spark's JVM execution instead of running natively on DataFusion.
SubtractDates is a binary expression that calculates the difference between two date values. It supports both legacy interval output (CalendarIntervalType) and the newer day-time interval format (DayTimeIntervalType) depending on configuration settings.
Supporting this expression would allow more Spark workloads to benefit from Comet's native acceleration.
Describe the potential solution
Spark Specification
Syntax:
date1 - date2
Arguments:
| Argument | Type | Description |
|---|---|---|
| left | Expression | The left date operand (minuend) |
| right | Expression | The right date operand (subtrahend) |
| legacyInterval | Boolean | Flag determining output format (defaults to SQLConf.legacyIntervalEnabled) |
Return Type:
- CalendarIntervalType: When legacyInterval is true, returns a calendar interval with months=0, days=difference, microseconds=0
- DayTimeIntervalType(DAY): When legacyInterval is false, returns a day-time interval representing the difference in microseconds
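To make the two return shapes concrete, here is a minimal Scala sketch. It follows the description above as written and has not been checked against Spark's source, so treat it as an illustration rather than Spark's implementation. It assumes dates are represented as days since 1970-01-01; the names `SubtractDatesSketch`, `LegacyInterval`, `legacyResult`, and `dayTimeResult` are hypothetical.

```scala
import java.time.LocalDate

object SubtractDatesSketch {
  // Microseconds in one day: 24 * 60 * 60 * 1,000,000.
  val MicrosPerDay: Long = 86400000000L

  // Hypothetical stand-in for a calendar interval (months, days, microseconds).
  final case class LegacyInterval(months: Int, days: Int, microseconds: Long)

  // Convert a calendar date to days since 1970-01-01 (assumed internal form).
  def toEpochDays(d: LocalDate): Int = Math.toIntExact(d.toEpochDay)

  // legacyInterval = true: months=0, days=difference, microseconds=0.
  def legacyResult(leftDays: Int, rightDays: Int): LegacyInterval =
    LegacyInterval(0, Math.subtractExact(leftDays, rightDays), 0L)

  // legacyInterval = false: the day difference expressed in microseconds.
  def dayTimeResult(leftDays: Int, rightDays: Int): Long =
    Math.multiplyExact(Math.subtractExact(leftDays, rightDays).toLong, MicrosPerDay)
}
```

Under these assumptions, `DATE '2023-01-15' - DATE '2023-01-10'` is a 5-day difference, so `legacyResult` yields `LegacyInterval(0, 5, 0)` and `dayTimeResult` yields 432,000,000,000 microseconds.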
Supported Data Types:
- Input types: DateType for both left and right operands
- Implicit casting is supported through the ImplicitCastInputTypes trait
Edge Cases:
- Null handling: Returns null if either input is null (null-intolerant behavior)
- Overflow behavior: Uses Math.subtractExact() and Math.multiplyExact(), which throw ArithmeticException on overflow
- Empty input: Not applicable as this is a binary expression requiring two operands
- Negative results: Supported when left date is earlier than right date
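The edge cases listed above can be sketched in plain Scala as follows. This is an illustrative model, not Comet or Spark code: `Option[Int]` stands in for a nullable date (days since 1970-01-01), and `SubtractDatesEdgeCases`, `subtractDays`, and `toMicros` are hypothetical names.

```scala
object SubtractDatesEdgeCases {
  // Microseconds in one day: 24 * 60 * 60 * 1,000,000.
  val MicrosPerDay: Long = 86400000000L

  // Null-intolerant subtraction: None models SQL NULL, so the result is
  // None whenever either operand is None. Math.subtractExact throws
  // ArithmeticException if the Int difference overflows.
  def subtractDays(left: Option[Int], right: Option[Int]): Option[Int] =
    for (l <- left; r <- right) yield Math.subtractExact(l, r)

  // Convert a day difference to microseconds. Math.multiplyExact throws
  // ArithmeticException if the Long product overflows.
  def toMicros(days: Int): Long =
    Math.multiplyExact(days.toLong, MicrosPerDay)
}
```

For example, `subtractDays(Some(10), None)` is `None`, `subtractDays(Some(10), Some(15))` is `Some(-5)`, and `toMicros(Int.MaxValue)` throws `ArithmeticException` because the product does not fit in a `Long`. A native implementation would need to reproduce the same null and overflow behavior to stay compatible.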
Examples:
-- Basic date subtraction
SELECT DATE '2023-01-15' - DATE '2023-01-10' AS diff;
-- Using with table columns
SELECT ship_date - order_date AS processing_days FROM orders;
// DataFrame API usage
import org.apache.spark.sql.functions._
df.select(col("end_date") - col("start_date") as "duration")
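// Direct expression construction
import java.sql.Date
import org.apache.spark.sql.catalyst.expressions.{Literal, SubtractDates}
// Literal infers DateType from java.sql.Date values
val leftExpr = Literal(Date.valueOf("2023-01-15"))
val rightExpr = Literal(Date.valueOf("2023-01-10"))
// Two-argument form: legacyInterval defaults to SQLConf.legacyIntervalEnabled
val subtractExpr = SubtractDates(leftExpr, rightExpr)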
Implementation Approach
See the Comet guide on adding new expressions for detailed instructions.
- Scala Serde: Add expression handler in spark/src/main/scala/org/apache/comet/serde/
- Register: Add to appropriate map in QueryPlanSerde.scala
- Protobuf: Add message type in native/proto/src/proto/expr.proto if needed
- Rust: Implement in native/spark-expr/src/ (check if DataFusion has built-in support first)
Additional context
Difficulty: Medium
Spark Expression Class: org.apache.spark.sql.catalyst.expressions.SubtractDates
Related:
- AddMonths: For adding months to dates
- DateDiff: Alternative date difference calculation
- CalendarIntervalType: Legacy interval data type
- DayTimeIntervalType: Modern interval data type
This issue was auto-generated from Spark reference documentation.