Centralize DATE_BIN source scaling and binning through shared helper #22823

Open: kosiew wants to merge 14 commits into apache:main from kosiew:scaling-overflow-02-22685

kosiew commented Jun 8, 2026

Which issue does this PR close?

Rationale for this change

DATE_BIN still contained duplicated source-scaling logic across timestamp and TIME scalar/array code paths. The repeated checked multiplication and binning logic made the implementation harder to maintain and increased the risk of behavior drifting between source types.

This change centralizes source scaling to nanoseconds and binning through shared helpers while preserving existing overflow and error-handling semantics.

What changes are included in this PR?

  • Renamed the scaling helper parameter from x to value for clarity.

  • Added a new helper, checked_scale_and_bin_to_nanos_or_null, which:

    • Performs checked scaling to nanoseconds.
    • Applies the binning function.
    • Converts scaling or binning failures into None for source-value processing paths.
  • Updated timestamp source handling (scalar and array paths) to use the shared helper instead of inlined scaling and binning logic.

  • Updated TIME source handling (scalar and array paths) for:

    • Time32Millisecond
    • Time32Second
    • Time64Microsecond

    to use the shared helper instead of direct checked multiplication.

  • Kept Time64Nanosecond on a separate path since no scaling is required, and switched the array implementation to use try_unary so binning errors are propagated consistently through Arrow error handling.

  • Removed duplicated scaling code throughout DATE_BIN source conversion paths and centralized nanosecond scaling behavior.
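
The flow described for `checked_scale_and_bin_to_nanos_or_null` can be sketched roughly as follows. This is a minimal, self-contained illustration and not the PR's code: the signatures are simplified assumptions, errors are plain `String`s rather than DataFusion or Arrow error types, and `bin_to_stride` is a hypothetical stand-in for the real binning function.

```rust
/// Scale `value` to nanoseconds, failing instead of wrapping on overflow.
/// Simplified stand-in: the error type here is a plain String.
fn checked_scale_to_nanos(value: i64, scale: i64) -> Result<i64, String> {
    value.checked_mul(scale).ok_or_else(|| {
        format!("DATE_BIN value {value} cannot be represented in nanoseconds")
    })
}

/// Hypothetical binning function: floor `nanos` to a multiple of `stride`.
fn bin_to_stride(nanos: i64, stride: i64) -> Result<i64, String> {
    if stride <= 0 {
        return Err("stride must be positive".to_string());
    }
    nanos
        .checked_sub(nanos.rem_euclid(stride))
        .ok_or_else(|| "binning overflowed".to_string())
}

/// Scale, then bin. A failure in either step becomes `None`,
/// which a source-value path can surface as NULL.
fn checked_scale_and_bin_to_nanos_or_null<F>(value: i64, scale: i64, bin: F) -> Option<i64>
where
    F: Fn(i64) -> Result<i64, String>,
{
    checked_scale_to_nanos(value, scale)
        .ok()
        .and_then(|nanos| bin(nanos).ok())
}
```

Under these assumptions, a millisecond source value of 1_500 with a scale of 1_000_000 and a one-second stride bins to `Some(1_000_000_000)`, while a microsecond value of `i64::MAX` with a scale of 1_000 yields `None` because the scaling step overflows.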

Are these changes tested?

No new tests are included in this PR.

This refactor is intended to preserve existing behavior while consolidating implementation details. Existing DATE_BIN tests should continue to validate overflow and NULL/error handling behavior, including the targeted overflow scenarios described in the issue summary.
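
As a rough illustration of the overflow boundary such tests exercise, the sketch below shows why a microsecond value of `i64::MAX` (the reproducer named in this PR's commit notes) cannot be scaled to nanoseconds. The function and constant names are hypothetical and chosen only for this example.

```rust
/// Nanoseconds per microsecond: the scale applied to a microsecond source.
const NANOS_PER_MICRO: i64 = 1_000;

/// Returns None when the scaled value does not fit in an i64.
fn micros_to_nanos(micros: i64) -> Option<i64> {
    micros.checked_mul(NANOS_PER_MICRO)
}

/// Largest microsecond value whose nanosecond form still fits in an i64.
fn max_scalable_micros() -> i64 {
    i64::MAX / NANOS_PER_MICRO
}
```

Any value above `max_scalable_micros()` makes `checked_mul` return `None`, whereas an unchecked `*` would panic in a debug build or wrap in a release build.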

Are there any user-facing changes?

No.

This is an internal refactor intended to centralize DATE_BIN source scaling logic without changing SQL-visible behavior.

LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed.

kosiew added 4 commits June 8, 2026 19:40
…e64MicrosecondArray

- Introduced scalar Time64Microsecond(i64::MAX) overflow reproducer.
- Introduced array Time64MicrosecondArray(i64::MAX) overflow reproducer.
- Updated tests to catch current panic using catch_unwind.
…timize timestamp handling

- Added `value_to_nanos(value, scale)` function.
- Refactored to eliminate repeated timestamp scalar `checked_mul` blocks.
- Implemented helper for timestamp array scaling.
- Left TIME direct multiplies for SUB_ISSUE_03.
- Implemented value_to_nanos for TIME origin scaling.
- Updated TIME scalar source scaling to use value_to_nanos.
- Modified TIME array source scaling to include value_to_nanos and ArrowError::ComputeError mapping.
- Revised overflow repro tests to ensure no panic occurs and handle normal errors appropriately.
…ate_bin

- Renamed the overflow helper from `timestamp_scale_overflow_error` to `nanos_scale_overflow_error`.
- Updated error message to be more generic: "DATE_BIN value ... cannot be represented in nanoseconds".
- Added a new test helper: `invoke_time64_microsecond_date_bin(...)`.
- Simplified scalar and array overflow tests by using the new helper.
@github-actions github-actions Bot added the functions Changes to functions implementation label Jun 8, 2026
@kosiew kosiew marked this pull request as ready for review June 8, 2026 11:49
kosiew added 4 commits June 8, 2026 22:03
…dling

- Restored timestamp overflow message for DATE_BIN source timestamp.
- Retained generic TIME/value overflow message for DATE_BIN value.
- Updated value_to_nanos to take an error constructor.
- Revised timestamp paths to utilize timestamp_scale_overflow_error.
- Updated TIME paths to use nanos_scale_overflow_error.
- Removed timestamp_scale_overflow_error from date_bin.rs
- Updated value_to_nanos function to only use nanos_scale_overflow_error for scaling
- Modified expected DATE_BIN value in date_bin_errors.slt
@github-actions github-actions Bot added the sqllogictest SQL Logic Tests (.slt) label Jun 8, 2026
@kosiew kosiew marked this pull request as draft June 9, 2026 00:09
kosiew added 5 commits June 17, 2026 19:46
…anos for safety

This commit updates the date_bin_impl function to use `checked_scale_to_nanos` instead of direct integer multiplication for scaling timestamps. This change ensures safer handling of potential overflow scenarios while maintaining the intended functionality for date and time operations.
…g functions

- Renamed helper argument `x` to `value` for clarity
- Added `checked_scale_and_bin_to_nanos` function
- Unified source scaling and binning paths to share the helper flow
- Ensured no direct calls to `checked_mul(scale)` remain outside the helper
…or behavior

- Reverted Time64(Nanosecond) array branch to use try_unary(...)
- Maintained prior error behavior
- Kept out-of-scope branch unchanged semantically
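
The error-propagating behavior that `try_unary` provides can be approximated without the Arrow crate. The sketch below is a hypothetical stand-in operating on `Vec<Option<i64>>`, assuming the semantics of Arrow's `try_unary`: a fallible function is applied to each non-null value, nulls pass through, and the first error aborts the whole operation.

```rust
/// Apply a fallible `op` to every non-null value.
/// Nulls (None) are preserved; the first Err short-circuits the result.
fn try_unary_sketch<F, E>(values: &[Option<i64>], op: F) -> Result<Vec<Option<i64>>, E>
where
    F: Fn(i64) -> Result<i64, E>,
{
    values
        .iter()
        .copied()
        // Option<Result<_, E>> -> Result<Option<_>, E>, so errors can propagate.
        .map(|v| v.map(&op).transpose())
        .collect()
}
```

This differs from the `..._or_null` helper flow, where a failing element becomes NULL instead of failing the whole array.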
…to_nanos_or_null for clarity on NULL behavior
@github-actions github-actions Bot removed the sqllogictest SQL Logic Tests (.slt) label Jun 17, 2026
@kosiew kosiew changed the title Fix DATE_BIN TIME scaling overflows by centralizing nanosecond conversion Centralize DATE_BIN source scaling and binning through shared helper Jun 17, 2026
@kosiew kosiew marked this pull request as ready for review June 17, 2026 15:35
- Reformatted match statement in `checked_scale_to_nanos` for clarity.
- Split long function calls across multiple lines in `date_bin_impl` for better readability.
- Enhanced indentation and formatting consistency in error handling and mapping functions.
Comment on lines +456 to +457
checked_scale_to_nanos(value, scale)
.ok()


I don't really see the value this adds, considering we ignore the error from checked_scale_to_nanos by converting it to an Option, not to mention that this new wrapper function takes 5 arguments 🤔
