fix: handle date_bin negative subsecond and overflow cases#22610

Merged
Jefffrey merged 5 commits into
apache:mainfrom
kumarUjjawal:fix/date-bin-error-change
Jun 13, 2026

Conversation

@kumarUjjawal

Contributor

Which issue does this PR close?

Rationale for this change

date_bin had a few edge cases that could return the wrong result, return an error only on array inputs, or panic/wrap when scaling timestamp and time values to nanoseconds.

What changes are included in this PR?

  • Fix negative sub-second timestamp conversion before the epoch.
  • Make scalar and array paths return NULL consistently for per-row binning errors.
  • Use checked scaling when converting timestamp and time values to nanoseconds.
  • Return an error for invalid shared origin values that overflow during scaling.
  • Simplify duplicated stride and scale handling.

Are these changes tested?

Yes

Are there any user-facing changes?

No public API changes.

@github-actions github-actions Bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels May 29, 2026
@kumarUjjawal kumarUjjawal requested a review from Jefffrey May 29, 2026 06:49
@kosiew

kosiew commented Jun 8, 2026

Contributor

@kumarUjjawal
#22315 has been merged.

@kumarUjjawal

Contributor Author

Thank you @kosiew, I will resolve the issues now.

@kumarUjjawal kumarUjjawal force-pushed the fix/date-bin-error-change branch from 77ca1d3 to a20b061 Compare June 8, 2026 10:36

@kosiew kosiew left a comment

Contributor

@kumarUjjawal
Thanks for the update. I did not find any blocking issues. I left a couple of small suggestions that could help maintainability and future-proofing.

}

Ok(match array {
ColumnarValue::Scalar(ScalarValue::TimestampNanosecond(v, tz_opt)) => {

Contributor

Nice cleanup overall. One thought: the four scalar timestamp arms now have the same control flow and only differ by the ScalarValue variant and Arrow timestamp type. It might be worth introducing a small typed helper or macro here so the scalar path mirrors transform_array_with_stride. That would make it easier to keep the scalar and array paths in sync and reduce the chance of them drifting apart in a future change.
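One way the suggested macro could look, as a rough sketch only. The `Scalar` enum is a stand-in for the four timestamp `ScalarValue` variants, and `bin_nanos`, `bin_scalar_arm!`, and `bin_scalar` are hypothetical names that do not come from the PR. The sketch assumes a fixed stride in nanoseconds and ignores time zones and origins.

```rust
/// Stand-in for the four timestamp scalar variants (not DataFusion's `ScalarValue`).
#[derive(Debug, PartialEq)]
enum Scalar {
    Second(Option<i64>),
    Millisecond(Option<i64>),
    Microsecond(Option<i64>),
    Nanosecond(Option<i64>),
}

/// Hypothetical per-value binning in nanoseconds; `None` models a NULL result.
fn bin_nanos(ns: i64, stride_ns: i64) -> Option<i64> {
    if stride_ns <= 0 {
        return None;
    }
    // Floor to the start of the bin, also for values before the epoch.
    ns.checked_sub(ns.rem_euclid(stride_ns))
}

/// One arm body shared by every variant: scale up, bin, scale back down.
macro_rules! bin_scalar_arm {
    ($variant:ident, $scale:expr, $value:expr, $stride_ns:expr) => {
        Scalar::$variant($value.and_then(|v: i64| {
            // Overflow while scaling becomes `None` instead of wrapping.
            let ns = v.checked_mul($scale)?;
            bin_nanos(ns, $stride_ns).map(|b| b / $scale)
        }))
    };
}

/// The four arms differ only in the variant name and the scale factor.
fn bin_scalar(value: Scalar, stride_ns: i64) -> Scalar {
    match value {
        Scalar::Second(v) => bin_scalar_arm!(Second, 1_000_000_000, v, stride_ns),
        Scalar::Millisecond(v) => bin_scalar_arm!(Millisecond, 1_000_000, v, stride_ns),
        Scalar::Microsecond(v) => bin_scalar_arm!(Microsecond, 1_000, v, stride_ns),
        Scalar::Nanosecond(v) => bin_scalar_arm!(Nanosecond, 1, v, stride_ns),
    }
}
```

With a 60 second stride, `bin_scalar(Scalar::Second(Some(125)), 60_000_000_000)` gives `Scalar::Second(Some(120))`, and a value that overflows during scaling gives `Scalar::Second(None)`.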

TimestampSecondArray,
};

let return_field = &Arc::new(Field::new(

Contributor

Small test suggestion: test_date_bin_scale_overflow_returns_null currently creates a single return_field with Timestamp(Second) while also exercising millisecond and microsecond source cases. The test passes today because the implementation does not use return_field, but it might be a little more robust to build the field from each case's expected_type. That way the test stays accurate if invoke_with_args ever starts validating or using the return field.

@kumarUjjawal

Contributor Author

Thanks @kosiew, I have addressed the comments.

@Jefffrey Jefffrey left a comment

Contributor

so do we only explicitly error on the origin check now? all other errors are converted to nulls?

- let nsec = (nanos % NANOS_PER_SEC) as u32;
+ // DateTime::from_timestamp requires a non-negative nanosecond part.
+ let secs = nanos.div_euclid(NANOS_PER_SEC);
+ let nsec = nanos.rem_euclid(NANOS_PER_SEC) as u32;

Contributor

What's the significance of changing to div_euclid/rem_euclid here?

Contributor Author

This fixes negative sub-second timestamps. Plain / and % can produce a negative nanosecond remainder, but DateTime::from_timestamp requires the nanosecond part to be non-negative. div_euclid / rem_euclid normalize the timestamp into valid seconds + nanos.
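A small standalone illustration of the difference, using only the standard library. The function names `split_truncating` and `split_euclid` are made up for this sketch and are not from the PR.

```rust
const NANOS_PER_SEC: i64 = 1_000_000_000;

/// Split with truncating `/` and `%`: the remainder takes the sign of `nanos`,
/// so it is negative for timestamps before the epoch.
fn split_truncating(nanos: i64) -> (i64, i64) {
    (nanos / NANOS_PER_SEC, nanos % NANOS_PER_SEC)
}

/// Split with Euclidean division: the remainder is always in
/// `0..NANOS_PER_SEC`, so the cast to `u32` is lossless.
fn split_euclid(nanos: i64) -> (i64, u32) {
    (
        nanos.div_euclid(NANOS_PER_SEC),
        nanos.rem_euclid(NANOS_PER_SEC) as u32,
    )
}
```

For one nanosecond before the epoch, `split_truncating(-1)` returns `(0, -1)`, while `split_euclid(-1)` returns `(-1, 999_999_999)`. Casting the negative remainder with `as u32` would wrap to a value far above one billion, which is not a valid nanosecond part.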

Contributor

I think only rem_euclid does this? div_euclid just seems to decide which way the quotient rounds (up or down).
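On this point, a quick sketch suggests the two calls need to be used together: rem_euclid alone makes the nanosecond part non-negative, but the seconds part then has to round toward negative infinity for the pair to still describe the same instant. The helper names below are hypothetical.

```rust
const NANOS_PER_SEC: i64 = 1_000_000_000;

/// Recombine a (seconds, nanoseconds) pair into total nanoseconds.
fn recombine(secs: i64, nsec: u32) -> i64 {
    secs * NANOS_PER_SEC + nsec as i64
}

/// Mixed: truncating division paired with a Euclidean remainder.
fn split_mixed(nanos: i64) -> (i64, u32) {
    (nanos / NANOS_PER_SEC, nanos.rem_euclid(NANOS_PER_SEC) as u32)
}

/// Paired: Euclidean division with a Euclidean remainder.
fn split_paired(nanos: i64) -> (i64, u32) {
    (
        nanos.div_euclid(NANOS_PER_SEC),
        nanos.rem_euclid(NANOS_PER_SEC) as u32,
    )
}
```

For `nanos = -1`, the mixed split yields `(0, 999_999_999)`, which recombines to `999_999_999` and so lands almost a full second after the intended instant. The paired split yields `(-1, 999_999_999)`, which recombines to `-1`.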

Comment thread datafusion/functions/src/datetime/date_bin.rs Outdated
Comment thread datafusion/functions/src/datetime/date_bin.rs Outdated
Comment thread datafusion/functions/src/datetime/date_bin.rs Outdated
Comment thread datafusion/functions/src/datetime/date_bin.rs Outdated
@kumarUjjawal

Contributor Author

> so do we only explicitly error on the origin check now? all other errors are converted to nulls?

row-level data problems -> NULL while query-level invalid inputs -> error.
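A simplified model of that split, not the actual date_bin code: `bin_column` is a hypothetical function that takes a column of optional timestamps, a scale factor to nanoseconds, a fixed stride in nanoseconds, and a shared origin. The origin and stride are shared by every row, so a bad value there fails the whole call, while overflow on an individual row only produces a NULL (`None`) for that row.

```rust
/// Hypothetical binning over a column; results are returned in nanoseconds.
fn bin_column(
    values: &[Option<i64>],
    scale_factor: i64,
    stride_ns: i64,
    origin: i64,
) -> Result<Vec<Option<i64>>, String> {
    // Query-level input: an invalid stride is an error for the whole call.
    if stride_ns <= 0 {
        return Err(format!("stride must be positive, got {stride_ns}"));
    }
    // Query-level input: an origin that overflows when scaled is also an error.
    let origin_ns = origin
        .checked_mul(scale_factor)
        .ok_or_else(|| format!("origin {origin} overflows when scaled to nanoseconds"))?;

    Ok(values
        .iter()
        .map(|v| {
            // Row-level problems: any overflow below yields None for this row only.
            let ns = (*v)?.checked_mul(scale_factor)?;
            let diff = ns.checked_sub(origin_ns)?;
            // Floor the offset from the origin to the start of its bin.
            let binned = diff.checked_sub(diff.rem_euclid(stride_ns))?;
            binned.checked_add(origin_ns)
        })
        .collect())
}
```

Calling it with second-resolution input `[Some(125), Some(i64::MAX), None]`, a scale of `1_000_000_000`, a 60 second stride, and origin `0` returns `Ok` with `[Some(120_000_000_000), None, None]`; passing `i64::MAX` as the origin returns `Err`.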

@Jefffrey Jefffrey left a comment

Contributor

> row-level data problems -> NULL while query-level invalid inputs -> error.

this is a succinct way of putting it, thanks 👍

// Scale to nanoseconds and report overflow as a normal error.
fn checked_scale_to_nanos(x: i64, scale: i64) -> Result<i64> {
x.checked_mul(scale).ok_or_else(|| {
ArrowError::InvalidArgumentError(format!(

Contributor

This doesn't need to be an ArrowError anymore; it can just be a DataFusion error.
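A standalone sketch of the same checked-scaling idea with a non-Arrow error. `SketchError` is a stand-in defined only for this example so that it compiles without dependencies; it is not DataFusion's error type, and the message text is an assumption.

```rust
/// Stand-in for the project's error type (an assumption for this sketch).
#[derive(Debug, PartialEq)]
enum SketchError {
    Execution(String),
}

/// Scale to nanoseconds and report overflow as a normal error
/// instead of panicking or wrapping.
fn checked_scale_to_nanos(x: i64, scale: i64) -> Result<i64, SketchError> {
    x.checked_mul(scale).ok_or_else(|| {
        SketchError::Execution(format!(
            "scaling {x} by {scale} to nanoseconds overflows i64"
        ))
    })
}
```

`checked_scale_to_nanos(2, 1_000)` returns `Ok(2_000)`, while `checked_scale_to_nanos(i64::MAX, 1_000)` returns an `Err` rather than a wrapped value.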

@alamb alamb changed the title fix: handle date_bin negative subsecond and overflow cases fix: handle date_bin negative subsecond and overflow cases Jun 11, 2026
@alamb

alamb commented Jun 11, 2026

Contributor

run benchmark date_bin

@adriangbot


🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4683778323-541-lrs59 6.12.68+ #1 SMP Sat May 2 07:49:07 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing fix/date-bin-error-change (38c4c6a) to c83a981 (merge-base) diff using: date_bin
Results will be posted here when complete


@adriangbot


🤖 Benchmark completed (GKE) | trigger


group            HEAD                                   fix_date-bin-error-change
-----            ----                                   -------------------------
date_bin_1000    1.37      9.4±0.07µs        ? ?/sec    1.00      6.9±0.10µs        ? ?/sec

Resource Usage

date_bin — base (merge-base)

Metric         Value
------         -----
Wall time      150.0s
Peak memory    5.8 MiB
Avg memory     454.2 KiB
CPU user       9.7s
CPU sys        0.0s
Peak spill     0 B

date_bin — branch

Metric         Value
------         -----
Wall time      145.0s
Peak memory    21.6 MiB
Avg memory     506.8 KiB
CPU user       10.6s
CPU sys        0.0s
Peak spill     0 B

@Jefffrey Jefffrey added this pull request to the merge queue Jun 13, 2026
@Jefffrey

Contributor

thanks all

Merged via the queue into apache:main with commit f931728 Jun 13, 2026
36 checks passed

Labels

functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

date_bin scalar & array paths have different error behaviours

5 participants