fix: handle `date_bin` negative subsecond and overflow cases by kumarUjjawal · Pull Request #22610 · apache/datafusion

kumarUjjawal · 2026-05-29T06:43:40Z

Which issue does this PR close?

Closes #22528 (date_bin scalar & array paths have different error behaviours)

Rationale for this change

date_bin had a few edge cases that could return the wrong result, return an error only on array inputs, or panic/wrap when scaling timestamp and time values to nanoseconds.

What changes are included in this PR?

Fix negative sub-second timestamp conversion before the epoch.
Make scalar and array paths return NULL consistently for per-row binning errors.
Use checked scaling when converting timestamp and time values to nanoseconds.
Return an error for invalid shared origin values that overflow during scaling.
Simplify duplicated stride and scale handling.

Are these changes tested?

Yes

Are there any user-facing changes?

No public API changes.

kosiew · 2026-06-08T06:58:24Z

@kumarUjjawal
#22315 has been merged.

kumarUjjawal · 2026-06-08T07:12:42Z

Thank you @kosiew I will resolve the issues now.

kosiew

@kumarUjjawal
Thanks for the update. I did not find any blocking issues. I left a couple of small suggestions that could help maintainability and future-proofing.

kosiew · 2026-06-08T10:56:48Z

-    }
-
    Ok(match array {
        ColumnarValue::Scalar(ScalarValue::TimestampNanosecond(v, tz_opt)) => {


Nice cleanup overall. One thought: the four scalar timestamp arms now have the same control flow and only differ by the ScalarValue variant and Arrow timestamp type. It might be worth introducing a small typed helper or macro here so the scalar path mirrors transform_array_with_stride. That would make it easier to keep the scalar and array paths in sync and reduce the chance of them drifting apart in a future change.
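The shape of the suggested helper can be sketched without any Arrow types. This is only an illustration of the idea, not the code in the PR: `TsScalar`, `bin_with_scale`, and `bin_scalar` are hypothetical names, and `TsScalar` stands in for the four `ScalarValue::Timestamp*` variants.

```rust
/// Hypothetical stand-in for the four `ScalarValue::Timestamp*` variants.
#[derive(Debug, PartialEq)]
enum TsScalar {
    Second(Option<i64>),
    Millisecond(Option<i64>),
    Microsecond(Option<i64>),
    Nanosecond(Option<i64>),
}

/// Shared control flow for every unit: scale up to nanoseconds, apply the
/// binning function, scale back down. Any failure becomes `None` (NULL).
fn bin_with_scale(
    value: Option<i64>,
    scale: i64,
    bin: impl Fn(i64) -> Option<i64>,
) -> Option<i64> {
    let nanos = value?.checked_mul(scale)?;
    Some(bin(nanos)?.div_euclid(scale))
}

/// Each arm differs only by the variant and its scale factor.
fn bin_scalar(s: TsScalar, bin: impl Fn(i64) -> Option<i64>) -> TsScalar {
    match s {
        TsScalar::Second(v) => TsScalar::Second(bin_with_scale(v, 1_000_000_000, bin)),
        TsScalar::Millisecond(v) => TsScalar::Millisecond(bin_with_scale(v, 1_000_000, bin)),
        TsScalar::Microsecond(v) => TsScalar::Microsecond(bin_with_scale(v, 1_000, bin)),
        TsScalar::Nanosecond(v) => TsScalar::Nanosecond(bin_with_scale(v, 1, bin)),
    }
}
```

With the per-unit differences reduced to a scale factor, a change to the shared flow only has to be made in one place.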

kosiew · 2026-06-08T10:56:48Z

+            TimestampSecondArray,
+        };
+
+        let return_field = &Arc::new(Field::new(


Small test suggestion: test_date_bin_scale_overflow_returns_null currently creates a single return_field with Timestamp(Second) while also exercising millisecond and microsecond source cases. The test passes today because the implementation does not use return_field, but it might be a little more robust to build the field from each case's expected_type. That way the test stays accurate if invoke_with_args ever starts validating or using the return field.

kumarUjjawal · 2026-06-08T12:08:01Z

Thanks @kosiew I have addressed the comments

Jefffrey

so do we only explicitly error on the origin check now? all other errors are converted to nulls?

Jefffrey · 2026-06-09T02:57:44Z

-    let nsec = (nanos % NANOS_PER_SEC) as u32;
+    // DateTime::from_timestamp requires a non-negative nanosecond part.
+    let secs = nanos.div_euclid(NANOS_PER_SEC);
+    let nsec = nanos.rem_euclid(NANOS_PER_SEC) as u32;


what's the significance of changing to div_euclid/rem_euclid here?

This fixes negative sub-second timestamps. Normal / and % can produce a negative nanosecond remainder, but DateTime::from_timestamp requires nanos to be non-negative. div_euclid / rem_euclid normalize the timestamp into valid seconds + nanos.

i think only rem_euclid does this? div_euclid just seems to decide which way to round up/down
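A small standalone sketch of the difference, using only `i64` methods from the standard library (the function names are made up for illustration). `rem_euclid` is what keeps the nanosecond part non-negative; `div_euclid` is needed alongside it so that `secs * NANOS_PER_SEC + nsec` still equals the original value.

```rust
const NANOS_PER_SEC: i64 = 1_000_000_000;

/// Truncating split: for negative inputs the remainder can be negative,
/// which is not a valid nanosecond-of-second value.
fn split_truncating(nanos: i64) -> (i64, i64) {
    (nanos / NANOS_PER_SEC, nanos % NANOS_PER_SEC)
}

/// Euclidean split: the remainder is always in 0..NANOS_PER_SEC, and the
/// quotient rounds toward negative infinity to stay consistent with it.
fn split_euclid(nanos: i64) -> (i64, u32) {
    let secs = nanos.div_euclid(NANOS_PER_SEC);
    let nsec = nanos.rem_euclid(NANOS_PER_SEC) as u32;
    (secs, nsec)
}
```

For one nanosecond before the epoch, `split_truncating(-1)` gives `(0, -1)`, while `split_euclid(-1)` gives `(-1, 999_999_999)`. Pairing plain `/` with `rem_euclid` would instead yield `(0, 999_999_999)`, which is almost a full second after the epoch.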

kumarUjjawal · 2026-06-09T07:46:57Z

> so do we only explicitly error on the origin check now? all other errors are converted to nulls?

row-level data problems -> NULL while query-level invalid inputs -> error.

Jefffrey

> row-level data problems -> NULL while query-level invalid inputs -> error.

this is a succinct way of putting it, thanks 👍
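The rule can be pictured with a standalone sketch, assuming second-resolution inputs and a bucket width given in nanoseconds. `bin_seconds_sketch` is a hypothetical function, not the PR's implementation, and the positive-stride guard is an assumption of this sketch only: the shared origin is checked once and fails the whole call, while each row degrades to `None` on its own.

```rust
const NANOS_PER_SEC: i64 = 1_000_000_000;

/// Hypothetical helper: bin second-resolution values into `stride_nanos`-wide
/// buckets measured from `origin_secs`, returning bucket starts in nanoseconds.
fn bin_seconds_sketch(
    values: &[Option<i64>],
    stride_nanos: i64,
    origin_secs: i64,
) -> Result<Vec<Option<i64>>, String> {
    // Guard added for this sketch only, so the remainder below is well defined.
    if stride_nanos <= 0 {
        return Err("stride must be positive".to_string());
    }
    // Query-level input: the origin is shared by every row, so overflow is an error.
    let origin = origin_secs.checked_mul(NANOS_PER_SEC).ok_or_else(|| {
        format!("origin {} overflows when scaled to nanoseconds", origin_secs)
    })?;

    // Row-level data: a NULL input or an overflow yields NULL for that row only.
    let binned: Vec<Option<i64>> = values
        .iter()
        .map(|v| {
            let nanos = (*v)?.checked_mul(NANOS_PER_SEC)?;
            let offset = nanos.checked_sub(origin)?.rem_euclid(stride_nanos);
            nanos.checked_sub(offset)
        })
        .collect();
    Ok(binned)
}
```

Calling it with `[Some(7), None, Some(i64::MAX)]`, a five-second stride, and origin `0` returns `Ok` with the first row binned and the other two rows as `None`; an origin of `i64::MAX` returns `Err` regardless of the rows.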

Jefffrey · 2026-06-10T06:20:09Z

+// Scale to nanoseconds and report overflow as a normal error.
+fn checked_scale_to_nanos(x: i64, scale: i64) -> Result<i64> {
+    x.checked_mul(scale).ok_or_else(|| {
+        ArrowError::InvalidArgumentError(format!(


doesn't need to be an ArrowError anymore, can just be a DataFusion error
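The checked-scaling pattern in the snippet above can be tried in isolation. In this sketch `ScaleError` is a hypothetical placeholder for whichever error type the project uses, and the message text is invented; only `i64::checked_mul` and `Option::ok_or_else` come from the standard library.

```rust
/// Hypothetical stand-in for the project's error type.
#[derive(Debug, PartialEq)]
struct ScaleError(String);

/// Scale to nanoseconds and report overflow as an ordinary error
/// instead of panicking or silently wrapping.
fn checked_scale_to_nanos(x: i64, scale: i64) -> Result<i64, ScaleError> {
    x.checked_mul(scale).ok_or_else(|| {
        ScaleError(format!(
            "value {} overflows i64 when scaled by {}",
            x, scale
        ))
    })
}
```

A plain `x * scale` would panic in debug builds and wrap in release builds on overflow, which matches the "panic/wrap" behaviour the PR description sets out to remove.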

alamb · 2026-06-11T18:35:50Z

run benchmark date_bin

adriangbot · 2026-06-11T18:39:17Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4683778323-541-lrs59 6.12.68+ #1 SMP Sat May 2 07:49:07 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing fix/date-bin-error-change (38c4c6a) to c83a981 (merge-base) diff using: date_bin
Results will be posted here when complete

adriangbot · 2026-06-11T18:44:24Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group            HEAD                                   fix_date-bin-error-change
-----            ----                                   -------------------------
date_bin_1000    1.37      9.4±0.07µs        ? ?/sec    1.00      6.9±0.10µs        ? ?/sec

Resource Usage

date_bin — base (merge-base)

Metric	Value
Wall time	150.0s
Peak memory	5.8 MiB
Avg memory	454.2 KiB
CPU user	9.7s
CPU sys	0.0s
Peak spill	0 B

date_bin — branch

Metric	Value
Wall time	145.0s
Peak memory	21.6 MiB
Avg memory	506.8 KiB
CPU user	10.6s
CPU sys	0.0s
Peak spill	0 B

Jefffrey · 2026-06-13T01:36:44Z

thanks all

github-actions bot added the sqllogictest (SQL Logic Tests (.slt)) and functions (Changes to functions implementation) labels May 29, 2026

kumarUjjawal requested a review from Jefffrey May 29, 2026 06:49

Jefffrey mentioned this pull request May 30, 2026

fix date_bin overflows scaling extreme Timestamp(Second) source #22315

Merged

kumarUjjawal added 2 commits June 8, 2026 15:31

fix: handle date_bin negative subsecond and overflow cases

46ee0bf

simplify tests

a20b061

kumarUjjawal force-pushed the fix/date-bin-error-change branch from 77ca1d3 to a20b061 Compare June 8, 2026 10:36

kosiew approved these changes Jun 8, 2026

chore: address date_bin review comments

72e327d

Jefffrey reviewed Jun 9, 2026

chore: simplify date_bin source overflow handling

e3472ae

Jefffrey approved these changes Jun 10, 2026

chore: address date_bin error feedback

38c4c6a

alamb changed the title from "fix: handle date_bin negative subsecond and overflow cases" to "fix: handle `date_bin` negative subsecond and overflow cases" Jun 11, 2026

Jefffrey added this pull request to the merge queue Jun 13, 2026

Merged via the queue into apache:main with commit f931728 Jun 13, 2026
36 checks passed

kosiew mentioned this pull request Jun 17, 2026

Centralize date_bin per-row mapping for scalar and array inputs #22987

Open


5 participants
