What is the problem the feature request solves?
Note: This issue was generated with AI assistance. The specification details have been extracted from Spark documentation and may need verification.
Comet does not currently support the Spark time_diff function, causing queries using this function to fall back to Spark's JVM execution instead of running natively on DataFusion.
The TimeDiff expression calculates the difference between two time-based values (timestamps, dates, or time intervals) in a specified unit. It is a ternary expression that takes a unit string and two temporal expressions, returning the numeric difference as a long integer.
Supporting this expression would allow more Spark workloads to benefit from Comet's native acceleration.
Describe the potential solution
Spark Specification
Syntax:
TIME_DIFF(unit, start_time, end_time)
// DataFrame API usage
import org.apache.spark.sql.functions._
df.select(expr("time_diff('SECOND', start_col, end_col)"))
Arguments:
| Argument | Type | Description |
|---|---|---|
| unit | StringType | The unit of time difference to calculate (e.g., 'SECOND', 'MINUTE', 'HOUR', 'DAY') |
| start | AnyTimeType | The starting timestamp, date, or time value |
| end | AnyTimeType | The ending timestamp, date, or time value |
Return Type: LongType - Returns a long integer representing the time difference in the specified unit.
Supported Data Types:
- unit: String type with collation support (trim collation supported)
- start: Any time-related type (TimestampType, DateType, etc.)
- end: Any time-related type (TimestampType, DateType, etc.)
Edge Cases:
- Null handling: Returns null if any of the three arguments (unit, start, end) is null
- Invalid unit: Throws a runtime exception when the unit string is not a recognized time unit
- Type mismatch: Implicit casting is applied to make start and end compatible time types
- Overflow: May overflow for extremely large time differences that exceed Long.MAX_VALUE
- Timezone handling: Results depend on session timezone settings for timestamp calculations
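The null, invalid-unit, and overflow rules above can be sketched as a scalar function. This is a minimal illustration, not Comet's or Spark's implementation: the names `TimeUnit` and `time_diff` are hypothetical, inputs are assumed to be microsecond counts, only the four units named in the argument table are recognized, and case-insensitive parsing, truncation toward zero, and reporting overflow as an error are assumptions that should be checked against Spark's `TimeDiff` source.

```rust
/// Units recognized by this sketch. The full set Spark accepts is not
/// stated in the issue, so only the four listed examples are included.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TimeUnit {
    Second,
    Minute,
    Hour,
    Day,
}

impl TimeUnit {
    /// Parse a unit string. Trimming and case-insensitivity are assumptions.
    fn parse(s: &str) -> Result<TimeUnit, String> {
        match s.trim().to_ascii_uppercase().as_str() {
            "SECOND" => Ok(TimeUnit::Second),
            "MINUTE" => Ok(TimeUnit::Minute),
            "HOUR" => Ok(TimeUnit::Hour),
            "DAY" => Ok(TimeUnit::Day),
            other => Err(format!("unrecognized time unit: {}", other)),
        }
    }

    /// Number of microseconds in one unit.
    fn micros(self) -> i64 {
        match self {
            TimeUnit::Second => 1_000_000,
            TimeUnit::Minute => 60 * 1_000_000,
            TimeUnit::Hour => 3_600 * 1_000_000,
            TimeUnit::Day => 86_400 * 1_000_000,
        }
    }
}

/// Hypothetical scalar kernel: `start` and `end` are microsecond counts.
/// Returns Ok(None) if any argument is null, Err for an unknown unit or
/// when the subtraction overflows i64.
fn time_diff(
    unit: Option<&str>,
    start: Option<i64>,
    end: Option<i64>,
) -> Result<Option<i64>, String> {
    // Null handling: any null argument yields a null result.
    let (unit, start, end) = match (unit, start, end) {
        (Some(u), Some(s), Some(e)) => (u, s, e),
        _ => return Ok(None),
    };
    let unit = TimeUnit::parse(unit)?;
    // Overflow: surface it as an error instead of wrapping silently.
    let delta = end
        .checked_sub(start)
        .ok_or_else(|| "time difference overflows i64".to_string())?;
    // Integer division truncates toward zero (an assumption about semantics).
    Ok(Some(delta / unit.micros()))
}
```

With these assumptions, `time_diff(Some("SECOND"), Some(0), Some(330_000_000))` yields `Ok(Some(330))`, which corresponds to a five-and-a-half-minute gap expressed in seconds.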
Examples:
-- Calculate difference in seconds
SELECT TIME_DIFF('SECOND', '2023-01-01 10:00:00', '2023-01-01 10:05:30') AS diff_seconds;
-- Returns: 330
-- Calculate difference in days
SELECT TIME_DIFF('DAY', '2023-01-01', '2023-01-15') AS diff_days;
-- Returns: 14
// DataFrame API usage
import org.apache.spark.sql.functions._
val df = spark.sql("SELECT '2023-01-01 10:00:00' as start_time, '2023-01-01 12:00:00' as end_time")
df.select(expr("time_diff('HOUR', start_time, end_time)").alias("hour_diff")).show()
// Using with column references
df.withColumn("minute_diff", expr("time_diff('MINUTE', start_time, end_time)"))
Implementation Approach
See the Comet guide on adding new expressions for detailed instructions.
- Scala Serde: Add expression handler in spark/src/main/scala/org/apache/comet/serde/
- Register: Add to appropriate map in QueryPlanSerde.scala
- Protobuf: Add message type in native/proto/src/proto/expr.proto if needed
- Rust: Implement in native/spark-expr/src/ (check if DataFusion has built-in support first)
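For the Rust step, the rough shape of a batch-oriented kernel might look like the sketch below. It deliberately avoids Arrow and DataFusion APIs: a plain `Vec<Option<i64>>` stands in for a nullable 64-bit column of microsecond values, and `unit_to_micros` and `time_diff_column` are hypothetical names. It also assumes the unit is a literal that can be resolved once per batch; a real implementation would need to handle a per-row unit column if Spark permits one.

```rust
/// Stand-in for a nullable Int64 column (one Option<i64> per row).
type Column = Vec<Option<i64>>;

/// Map a unit name to its length in microseconds. Only the units listed
/// as examples in the issue are covered; anything else is an error.
fn unit_to_micros(unit: &str) -> Result<i64, String> {
    match unit.trim().to_ascii_uppercase().as_str() {
        "SECOND" => Ok(1_000_000),
        "MINUTE" => Ok(60 * 1_000_000),
        "HOUR" => Ok(3_600 * 1_000_000),
        "DAY" => Ok(86_400 * 1_000_000),
        other => Err(format!("unrecognized time unit: {}", other)),
    }
}

/// Hypothetical columnar kernel: computes (end - start) / unit row by row.
/// Null in either input produces a null output row.
fn time_diff_column(
    unit: &str,
    start: &[Option<i64>],
    end: &[Option<i64>],
) -> Result<Column, String> {
    if start.len() != end.len() {
        return Err("start and end columns differ in length".to_string());
    }
    // Resolve the unit once for the whole batch, not once per row.
    let per_unit = unit_to_micros(unit)?;
    start
        .iter()
        .zip(end.iter())
        .map(|(s, e)| match (s, e) {
            (Some(s), Some(e)) => e
                .checked_sub(*s)
                .map(|d| Some(d / per_unit))
                .ok_or_else(|| "time difference overflows i64".to_string()),
            _ => Ok(None),
        })
        .collect()
}
```

Validating the unit before touching any rows means an invalid unit fails the whole batch up front, which keeps the per-row loop free of string handling.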
Additional context
Difficulty: Medium
Spark Expression Class: org.apache.spark.sql.catalyst.expressions.TimeDiff
Related:
- DateDiff - For date-only differences
- DateAdd - For adding time intervals
- DateSub - For subtracting time intervals
- Extract - For extracting specific time components
This issue was auto-generated from Spark reference documentation.