[Feature] Support Spark expression: timestamp_add_ym_interval #3115

@andygrove

What is the problem the feature request solves?

Note: This issue was generated with AI assistance. The specification details have been extracted from Spark documentation and may need verification.

Comet does not currently support Spark's timestamp_add_ym_interval expression (TimestampAddYMInterval), which is reached by adding a year-month interval to a timestamp with the + operator. Queries that use it fall back to Spark's JVM execution instead of running natively on DataFusion.

The TimestampAddYMInterval expression adds a year-month interval to a timestamp value. This operation is timezone-aware and handles both TimestampType and TimestampNTZType inputs while preserving the original timestamp data type.

Supporting this expression would allow more Spark workloads to benefit from Comet's native acceleration.

Describe the potential solution

Spark Specification

Syntax:

-- SQL usage
timestamp_column + INTERVAL 'value' YEAR TO MONTH

// DataFrame API usage
col("timestamp_column") + expr("INTERVAL '2-3' YEAR TO MONTH")

Arguments:

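| Argument   | Type           | Description                                                |
|------------|----------------|------------------------------------------------------------|
| timestamp  | Expression     | The timestamp expression to add the interval to            |
| interval   | Expression     | The year-month interval expression to add                  |
| timeZoneId | Option[String] | Optional timezone identifier for timezone-aware operations |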

Return Type: Returns the same data type as the input timestamp expression (TimestampType or TimestampNTZType).

Supported Data Types:

  • Input timestamp: AnyTimestampType (TimestampType or TimestampNTZType)
  • Input interval: YearMonthIntervalType

Edge Cases:

  • Null handling: Returns null if either timestamp or interval input is null (null intolerant)
  • Timezone handling: Uses session timezone for TimestampType and UTC for TimestampNTZType
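  • Month overflow: Month arithmetic carries into the year, in both directions (for example, November 15th + 3 months lands on February 15th of the following year)
  • Day adjustment: When the target month is shorter than the source day of month, the day is clamped to the last day of the target month (for example, January 31st + 1 month gives February 28th, or February 29th in a leap year)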
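
The month overflow and day adjustment cases can be illustrated on plain calendar fields. The following is a minimal Rust sketch, not Comet or DataFusion code: the names `days_in_month` and `add_months` are hypothetical, and it assumes the proleptic Gregorian calendar with clamping to the last valid day of the target month.

```rust
/// Number of days in the given month (1-12) of a proleptic Gregorian year.
fn days_in_month(year: i32, month: u32) -> u32 {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        _ => {
            let leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
            if leap { 29 } else { 28 }
        }
    }
}

/// Adds a signed month count to a calendar date (hypothetical helper).
/// The day is clamped to the last valid day of the target month.
fn add_months(year: i32, month: u32, day: u32, months: i32) -> (i32, u32, u32) {
    // Use a zero-based month index so the year carry is a plain Euclidean division,
    // which also behaves correctly for negative intervals.
    let total = year as i64 * 12 + (month as i64 - 1) + months as i64;
    let new_year = total.div_euclid(12) as i32;
    let new_month = total.rem_euclid(12) as u32 + 1;
    let new_day = day.min(days_in_month(new_year, new_month));
    (new_year, new_month, new_day)
}
```

With this sketch, `add_months(2024, 1, 31, 1)` gives `(2024, 2, 29)`, `add_months(2023, 1, 31, 1)` gives `(2023, 2, 28)`, and `add_months(2023, 11, 15, 3)` gives `(2024, 2, 15)`. An interval such as '2-3' corresponds to 27 months.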

Examples:

-- Add 2 years and 3 months to a timestamp
SELECT timestamp_col + INTERVAL '2-3' YEAR TO MONTH FROM events;

-- Add 1 year to current timestamp
SELECT current_timestamp() + INTERVAL '1' YEAR;

// DataFrame API usage
import org.apache.spark.sql.functions._

df.select(col("created_at") + expr("INTERVAL '1-6' YEAR TO MONTH"))

// Using interval column
df.select(col("timestamp_col") + col("interval_col"))

Implementation Approach

See the Comet guide on adding new expressions for detailed instructions.

  1. Scala Serde: Add expression handler in spark/src/main/scala/org/apache/comet/serde/
  2. Register: Add to appropriate map in QueryPlanSerde.scala
  3. Protobuf: Add message type in native/proto/src/proto/expr.proto if needed
  4. Rust: Implement in native/spark-expr/src/ (check if DataFusion has built-in support first)
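
As a starting point for the Rust step, here is a hedged sketch of what a scalar kernel might compute for the TimestampNTZType case, where the value is microseconds since the epoch interpreted in UTC. All function names are hypothetical and none of them are Comet, Arrow, or DataFusion APIs. The day/date conversions follow Howard Hinnant's well-known civil-date algorithms. The sketch does not cover TimestampType with a session timezone (that needs a timezone database), and it does not check for arithmetic overflow on extreme values.

```rust
const MICROS_PER_DAY: i64 = 86_400_000_000;

/// Days since 1970-01-01 to (year, month, day) in the proleptic Gregorian calendar.
fn civil_from_days(days: i64) -> (i64, i64, i64) {
    let z = days + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z.rem_euclid(146_097); // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // [0, 365], year starts in March
    let mp = (5 * doy + 2) / 153; // [0, 11], March = 0
    let d = doy - (153 * mp + 2) / 5 + 1;
    let m = if mp < 10 { mp + 3 } else { mp - 9 };
    let y = yoe + era * 400 + if m <= 2 { 1 } else { 0 };
    (y, m, d)
}

/// (year, month, day) to days since 1970-01-01.
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400);
    let mp = if m > 2 { m - 3 } else { m + 9 };
    let doy = (153 * mp + 2) / 5 + d - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146_097 + doe - 719_468
}

/// Last day of a month, derived as the day before the first of the next month.
fn last_day_of_month(y: i64, m: i64) -> i64 {
    let (ny, nm) = if m == 12 { (y + 1, 1) } else { (y, m + 1) };
    civil_from_days(days_from_civil(ny, nm, 1) - 1).2
}

/// Null-intolerant scalar sketch: returns None if either input is None.
fn timestamp_ntz_add_months(micros: Option<i64>, months: Option<i32>) -> Option<i64> {
    let (micros, months) = (micros?, months?);
    // Split into whole days and time of day; Euclidean ops keep pre-epoch values correct.
    let days = micros.div_euclid(MICROS_PER_DAY);
    let time_of_day = micros.rem_euclid(MICROS_PER_DAY);
    let (y, m, d) = civil_from_days(days);
    let total = y * 12 + (m - 1) + i64::from(months);
    let (ny, nm) = (total.div_euclid(12), total.rem_euclid(12) + 1);
    let nd = d.min(last_day_of_month(ny, nm)); // clamp, e.g. Jan 31 + 1 month
    Some(days_from_civil(ny, nm, nd) * MICROS_PER_DAY + time_of_day)
}
```

The time of day is carried over unchanged, which is only valid because UTC has no offset transitions. A real implementation would operate on Arrow arrays and would need to be compared against Spark's results, including for the session-timezone path.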

Additional context

Difficulty: Medium
Spark Expression Class: org.apache.spark.sql.catalyst.expressions.TimestampAddYMInterval

Related:

  • DateAddYMInterval - Adds year-month intervals to date values
  • TimestampAddDTInterval - Adds day-time intervals to timestamps
  • DateTimeUtils.timestampAddMonths() - Underlying implementation method

This issue was auto-generated from Spark reference documentation.
