[Feature] Support Spark expression: timestamp_add #3113

@andygrove

Description

What is the problem the feature request solves?

Note: This issue was generated with AI assistance. The specification details have been extracted from Spark documentation and may need verification.

Comet does not currently support the Spark timestamp_add function, causing queries using this function to fall back to Spark's JVM execution instead of running natively on DataFusion.

The TimestampAdd expression adds a specified quantity of time units to a timestamp value. It supports various time units (like days, hours, minutes, seconds) and is timezone-aware, returning a timestamp of the same type as the input timestamp.

Supporting this expression would allow more Spark workloads to benefit from Comet's native acceleration.

Describe the potential solution

Spark Specification

Syntax:

TIMESTAMPADD(unit, quantity, timestamp)

Arguments:

Argument    Type              Description
unit        String            The time unit to add (e.g., YEAR, MONTH, DAY, HOUR, MINUTE, SECOND); written as an unquoted keyword in SQL
quantity    Long              The number of units to add to the timestamp
timestamp   AnyTimestampType  The base timestamp to which the quantity is added
timeZoneId  Option[String]    Optional time zone identifier for timezone-aware calculations (a field of the expression class, not a SQL argument)

Return Type: Returns the same data type as the input timestamp parameter (preserves whether it's TimestampType or TimestampNTZType).

Supported Data Types:

  • quantity: LongType only
  • timestamp: Any timestamp type (TimestampType or TimestampNTZType)
  • unit: One of the valid time unit names (an unquoted keyword in SQL, carried as a String in the expression)

Edge Cases:

  • Null handling: Returns null if any input parameter (quantity or timestamp) is null (nullIntolerant = true)
  • Timezone handling: Automatically selects appropriate timezone based on timestamp data type
  • Unit validation: Invalid unit strings are handled during expression conversion phase
  • Overflow: Large quantity values may overflow the timestamp; the resulting behavior depends on the underlying DateTimeUtils implementation
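
The null-handling and overflow cases above can be sketched in Rust for the units that have a fixed length. This is only an illustration under stated assumptions, not Comet's or Spark's implementation: timestamps are taken to be microseconds since the epoch, `add_fixed_units` and `TimestampOverflow` are hypothetical names, and overflow is reported as an error, which should be checked against what DateTimeUtils actually does. Calendar units such as DAY, MONTH, and YEAR are left out because their length can depend on the calendar and time zone.

```rust
// Fixed-length unit sizes, in microseconds.
const MICROS_PER_SECOND: i64 = 1_000_000;
const MICROS_PER_MINUTE: i64 = 60 * MICROS_PER_SECOND;
const MICROS_PER_HOUR: i64 = 60 * MICROS_PER_MINUTE;

/// Hypothetical error type for results that do not fit in an i64.
#[derive(Debug, PartialEq)]
struct TimestampOverflow;

/// Adds `quantity` units of `micros_per_unit` microseconds to `ts_micros`.
/// A null (None) in either input yields a null result; i64 overflow is an
/// error (an assumption, see the lead-in).
fn add_fixed_units(
    ts_micros: Option<i64>,
    quantity: Option<i64>,
    micros_per_unit: i64,
) -> Result<Option<i64>, TimestampOverflow> {
    // Null-intolerant: propagate null without doing any arithmetic.
    let (ts, q) = match (ts_micros, quantity) {
        (Some(ts), Some(q)) => (ts, q),
        _ => return Ok(None),
    };
    // Checked arithmetic so that overflow is detected instead of wrapping.
    let delta = q.checked_mul(micros_per_unit).ok_or(TimestampOverflow)?;
    ts.checked_add(delta).map(Some).ok_or(TimestampOverflow)
}
```

For instance, `add_fixed_units(Some(0), Some(3), MICROS_PER_HOUR)` adds three hours to the epoch. The sketch only detects i64 overflow; it does not model any narrower timestamp range that Spark may enforce.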

Examples:

-- Add 5 days to a timestamp
SELECT TIMESTAMPADD(DAY, 5, TIMESTAMP '2010-01-01 01:02:03.123456');

-- Add 3 hours to the current timestamp
SELECT TIMESTAMPADD(HOUR, 3, current_timestamp());

-- Subtract time by using a negative quantity
SELECT TIMESTAMPADD(MINUTE, -30, TIMESTAMP '2010-01-01 12:00:00');

// DataFrame API usage (df is an existing DataFrame with timestamp_col and quantity_col columns)
import org.apache.spark.sql.functions._

// Add 7 days to a timestamp column
df.select(expr("TIMESTAMPADD(DAY, 7, timestamp_col)"))

// Use a column reference for the quantity
df.select(expr("TIMESTAMPADD(HOUR, quantity_col, timestamp_col)"))

Implementation Approach

See the Comet guide on adding new expressions for detailed instructions.

  1. Scala Serde: Add expression handler in spark/src/main/scala/org/apache/comet/serde/
  2. Register: Add to appropriate map in QueryPlanSerde.scala
  3. Protobuf: Add message type in native/proto/src/proto/expr.proto if needed
  4. Rust: Implement in native/spark-expr/src/ (check if DataFusion has built-in support first)
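
As a starting point for step 4, the native side would probably need to turn the unit string into something it can match on. The sketch below is one hypothetical way to do that; `TimeUnit`, `parse`, and `fixed_micros` are illustrative names, not existing Comet or DataFusion APIs. The unit list and the case-insensitive matching reflect my reading of Spark and should be verified against the targeted Spark versions.

```rust
/// Hypothetical enum of the units TIMESTAMPADD is assumed to accept.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TimeUnit {
    Year,
    Quarter,
    Month,
    Week,
    Day,
    DayOfYear,
    Hour,
    Minute,
    Second,
    Millisecond,
    Microsecond,
}

impl TimeUnit {
    /// Parses a unit name, ignoring ASCII case. Returns None for unknown
    /// units so the caller can decide to fall back to Spark.
    fn parse(name: &str) -> Option<Self> {
        match name.to_ascii_uppercase().as_str() {
            "YEAR" => Some(Self::Year),
            "QUARTER" => Some(Self::Quarter),
            "MONTH" => Some(Self::Month),
            "WEEK" => Some(Self::Week),
            "DAY" => Some(Self::Day),
            "DAYOFYEAR" => Some(Self::DayOfYear),
            "HOUR" => Some(Self::Hour),
            "MINUTE" => Some(Self::Minute),
            "SECOND" => Some(Self::Second),
            "MILLISECOND" => Some(Self::Millisecond),
            "MICROSECOND" => Some(Self::Microsecond),
            _ => None,
        }
    }

    /// Length in microseconds for units shorter than a day. Units of a day
    /// or longer return None because they need calendar-aware (and possibly
    /// time-zone-aware) arithmetic rather than a fixed offset.
    fn fixed_micros(self) -> Option<i64> {
        match self {
            Self::Hour => Some(3_600_000_000),
            Self::Minute => Some(60_000_000),
            Self::Second => Some(1_000_000),
            Self::Millisecond => Some(1_000),
            Self::Microsecond => Some(1),
            _ => None,
        }
    }
}
```

Separating the two groups this way would let a first version accelerate only the fixed-length units and fall back to Spark for the calendar units until their time zone behavior has been verified.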

Additional context

Difficulty: Medium
Spark Expression Class: org.apache.spark.sql.catalyst.expressions.TimestampAdd

Related:

  • DateAdd - For date-only arithmetic
  • DateSub - For subtracting from dates
  • Interval expressions for duration-based calculations
  • TimeZoneAwareExpression trait for timezone handling

This issue was auto-generated from Spark reference documentation.
