Fix partitioned backfill widening a sub-day window to the whole day #68718
Merged
a2bef10 to 0d94a54
phanikumv reviewed Jun 22, 2026
fd84879 to 9974f83
phanikumv approved these changes Jun 23, 2026
phanikumv left a comment (Contributor)
Fix looks good, thanks for addressing my review.
Member, Author
Thanks for your suggestion! I'll keep it open for one or two more hours in case someone else also wants to take a look.
uranusjr reviewed Jun 23, 2026
Backfilling a partitioned Dag for a window inside a single day (e.g. an hourly timetable for 08:00–09:00) created one run per cron tick of the entire day. iter_partition_dagrun_infos now honours the datetime window directly — both ends inclusive, at the timetable's own partition cadence — instead of rounding to whole calendar days; callers pass tz-aware datetimes.
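The inclusive-window walk described in this commit message can be sketched roughly as follows. This is a minimal illustration and not Airflow's implementation: `iter_partition_starts` is a hypothetical helper, and a fixed `timedelta` step stands in for the cron expression that the real timetable aligns to.

```python
from datetime import datetime, timedelta, timezone


def iter_partition_starts(earliest: datetime, latest: datetime, step: timedelta) -> list[datetime]:
    """Return partition starts in [earliest, latest], both ends inclusive.

    Hypothetical helper: a fixed step replaces the timetable's cron schedule.
    """
    if earliest.tzinfo is None or latest.tzinfo is None:
        raise ValueError("bounds must be tz-aware datetimes")
    # Align upward to the first tick at or after `earliest`, counting ticks from the epoch.
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    offset = (earliest - epoch) % step
    current = earliest if not offset else earliest + (step - offset)
    starts = []
    while current <= latest:  # inclusive end bound
        starts.append(current)
        current += step
    return starts


start = datetime(2026, 6, 22, 8, 0, tzinfo=timezone.utc)
end = datetime(2026, 6, 22, 9, 0, tzinfo=timezone.utc)
print(iter_partition_starts(start, end, timedelta(hours=1)))  # the 08:00 and 09:00 ticks only
```

Because both ends are inclusive, a window whose start and end are the same tick selects exactly one partition.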
…ndary Backfill bounds arrive attached to the core default_timezone (UTC), never the timetable timezone, so a non-UTC daily timetable lost its first partition and a sub-day window risked widening. The iteration now localizes each bound's wall-clock reading into the timetable timezone before aligning, keeping sub-day precision intact.
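The difference between converting a bound to the timetable timezone and localizing its wall-clock reading can be illustrated with the standard library. This is a sketch under assumptions: `localize_wall_clock` is a hypothetical name, and a fixed UTC+08:00 offset stands in for a real timetable timezone.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def localize_wall_clock(bound: datetime, target_tz: tzinfo) -> datetime:
    """Keep the wall-clock fields of `bound` and attach `target_tz` (hypothetical helper)."""
    return bound.replace(tzinfo=target_tz)


LOCAL = timezone(timedelta(hours=8))  # assumed timetable timezone, fixed offset for simplicity
bound = datetime(2026, 6, 22, 0, 0, tzinfo=timezone.utc)  # bound as received, attached to UTC

converted = bound.astimezone(LOCAL)            # same instant, now reads 08:00 local
localized = localize_wall_clock(bound, LOCAL)  # same reading, 00:00 local

print(converted.isoformat())  # 2026-06-22T08:00:00+08:00
print(localized.isoformat())  # 2026-06-22T00:00:00+08:00
```

In this toy setup, a daily partition that starts at local midnight falls before the converted bound but exactly on the localized one, which is one way a first partition could be skipped.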
9974f83 to 9d26fad
Contributor
Backport successfully created: v3-3-test
Note: As of Merging PRs targeted for Airflow 3.X In matter of doubt please ask in #release-management Slack channel.
github-actions Bot pushed a commit to aws-mwaa/upstream-to-airflow that referenced this pull request Jun 23, 2026
… whole day (apache#68718) (cherry picked from commit 15b9666) Co-authored-by: Wei Lee <weilee.rx@gmail.com>
Lee-W added a commit to astronomer/airflow that referenced this pull request Jun 23, 2026
`dags clear` and `partitions clear` passed user-supplied datetimes through `resolve_day_bound(.date())`, which stripped the time component and expanded any sub-day bound to local midnight. On an hourly partitioned Dag, `--partition-date-start 08:00 --partition-date-end 08:00` cleared all 24 partitions instead of just the 08:00 one. Adds `localize_partition_datetime` to the `Timetable` protocol (base: UTC pass-through; CronMixin: wall-clock re-interpreted in the timetable's local timezone, same logic as apache#68718). Removes the now-redundant private `_localize_wall_clock_to_timetable_timezone` from `CronPartitionTimetable`. Updates `apply_partition_date_window` to use the new method with an inclusive `<=` end bound instead of the old half-open `< next_midnight` form.
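The effect of moving from a day-expanded, half-open window to an inclusive datetime window can be sketched on a toy list of hourly partitions. The function names `old_window` and `new_window` are hypothetical, and the in-memory list stands in for the database filter that the real helper applies.

```python
from datetime import datetime, time, timedelta, timezone

UTC = timezone.utc
# One partition per hour of a single day (toy data).
partitions = [datetime(2026, 6, 22, h, tzinfo=UTC) for h in range(24)]


def old_window(start: datetime, end: datetime) -> list[datetime]:
    """Day-expanded form: drop the time of day, then use a half-open `< next_midnight` end."""
    lo = datetime.combine(start.date(), time.min, tzinfo=UTC)
    next_midnight = datetime.combine(end.date(), time.min, tzinfo=UTC) + timedelta(days=1)
    return [p for p in partitions if lo <= p < next_midnight]


def new_window(start: datetime, end: datetime) -> list[datetime]:
    """Sub-day form: honour the datetimes directly with an inclusive `<=` end."""
    return [p for p in partitions if start <= p <= end]


eight = datetime(2026, 6, 22, 8, 0, tzinfo=UTC)
print(len(old_window(eight, eight)))  # 24
print(len(new_window(eight, eight)))  # 1
```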
vatsrahul1001 added a commit that referenced this pull request Jun 23, 2026
* API: Add partition clear support to REST API to match the CLI
clearDagRuns now accepts partition_key / partition_date window selectors
as an alternative to an explicit run list. Add POST /dags/{dag_id}/clearPartitions
to reset partition_key/partition_date on matching runs, with optional
task-instance clear — REST parity with `airflow dags clear` / `airflow partitions clear`.
* API: Deduplicate partition selector fields across clear request bodies
Extract the shared partition_key / partition_date window fields and their date-order check into a PartitionSelectorMixin reused by BulkDAGRunClearBody and ClearPartitionsBody, and replace the repeated partition-selector presence checks with a has_partition_selectors property. No behavior change.
* Scope partition-clear task instance queries to the target dag
Add a dag_id filter to the task-instance lookups in both the REST clear_partition_fields service and the airflow partitions clear CLI so a run_id shared across dags no longer clears another dag's task instances, and collapse the per-run dry-run task-instance lookups into a single batched count query.
* Share the partition date-window filter across clear paths
Extract the resolve_day_bound partition_date window resolution duplicated across the REST clear_dag_runs route, the clear_partition_fields service, and the airflow partitions clear CLI into a single DagRun.apply_partition_date_window helper so the three cannot drift.
* Share the partition-clear core between the REST API and the CLI
Extract the partition column-reset, task-instance batching, and dry-run counting into a single DagRun.clear_partition_runs helper reused by the clearPartitions REST endpoint and the airflow partitions clear CLI, replacing the two parallel implementations. The CLI keeps its per-run output through an optional callback. No behavior change.
* Fix partition clear commands widening sub-day windows to the whole day
`dags clear` and `partitions clear` passed user-supplied datetimes
through `resolve_day_bound(.date())`, which stripped the time component
and expanded any sub-day bound to local midnight. On an hourly
partitioned Dag, `--partition-date-start 08:00 --partition-date-end
08:00` cleared all 24 partitions instead of just the 08:00 one.
Adds `localize_partition_datetime` to the `Timetable` protocol (base:
UTC pass-through; CronMixin: wall-clock re-interpreted in the
timetable's local timezone, same logic as #68718). Removes the
now-redundant private `_localize_wall_clock_to_timetable_timezone` from
`CronPartitionTimetable`. Updates `apply_partition_date_window` to use
the new method with an inclusive `<=` end bound instead of the old
half-open `< next_midnight` form.
* Update REST datamodel descriptions to reflect sub-day precision
* Share partition selection-mode validation across clear request bodies
BulkDAGRunClearBody and ClearPartitionsBody duplicated the same
"exactly one selection mode" rule, including the partition-window
definition and the selector-enumeration error message, which would
drift independently. Move the shared check onto PartitionSelectorMixin
so the partition-selector semantics live in one place.
* Drop sub-day-precision wording from partition clear CLI help
The "sub-day precision is preserved" phrasing framed the help against a
since-fixed truncation bug, which is meaningless to a reader seeing the
text fresh. The timezone re-interpretation note plus the date-only ->
midnight rule already convey that the time of day is honoured.
* Remove unused resolve_day_bound
* Refactor tests
* regen docs
* Fix test failure
* Fix ruff F402 and docs spelling failing CI on partition-clear branch
A loop variable shadowed the imported `task` decorator (ruff F402) and a
British-spelled word in a new docstring tripped the en_US docs spell-check.
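The "exactly one selection mode" rule mentioned in the list above could look something like the sketch below. Everything here is an assumption made for illustration: the class name `ClearSelectorSketch`, the field names, and the choice of three modes (explicit run list, partition key, partition date window) are not taken from the actual request models, and a plain dataclass replaces whatever validation framework the API uses.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass
class ClearSelectorSketch:
    """Toy stand-in for a clear request body; all field names are assumptions."""

    run_ids: list[str] | None = None
    partition_key: str | None = None
    partition_date_start: datetime | None = None
    partition_date_end: datetime | None = None

    def __post_init__(self) -> None:
        has_window = self.partition_date_start is not None or self.partition_date_end is not None
        modes = {
            "run_ids": bool(self.run_ids),
            "partition_key": self.partition_key is not None,
            "partition_date window": has_window,
        }
        chosen = [name for name, active in modes.items() if active]
        # Exactly one selection mode may be active.
        if len(chosen) != 1:
            raise ValueError(f"Specify exactly one of {', '.join(modes)}; got {chosen or 'none'}")
        # Date-order check for a fully specified window.
        start, end = self.partition_date_start, self.partition_date_end
        if start is not None and end is not None and start > end:
            raise ValueError("partition_date_start must not be after partition_date_end")


ClearSelectorSketch(partition_key="2026-06-22T08")  # accepted: one mode selected
```

Keeping the rule in one shared place means every request body that reuses it reports the same error for the same mistake.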
---------
Co-authored-by: Rahul Vats <rah.sharma11@gmail.com>
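The dag-scoping fix in the commit message above (a `run_id` shared across dags no longer matching another dag's task instances, with a single batched count) can be illustrated with an in-memory SQLite table. The three-column schema and the sample rows are toy assumptions and do not reflect Airflow's actual models or queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_instance (dag_id TEXT, run_id TEXT, task_id TEXT)")
conn.executemany(
    "INSERT INTO task_instance VALUES (?, ?, ?)",
    [
        ("dag_a", "backfill__2026-06-22", "extract"),
        ("dag_a", "backfill__2026-06-22", "load"),
        ("dag_b", "backfill__2026-06-22", "extract"),  # same run_id, different dag
    ],
)

run_ids = ["backfill__2026-06-22"]
placeholders = ",".join("?" for _ in run_ids)

# Filtering by run_id alone also matches dag_b's task instance.
unscoped = conn.execute(
    f"SELECT COUNT(*) FROM task_instance WHERE run_id IN ({placeholders})",
    run_ids,
).fetchone()[0]

# Adding the dag_id filter scopes the match; one query counts all runs at once.
scoped = conn.execute(
    f"SELECT COUNT(*) FROM task_instance WHERE dag_id = ? AND run_id IN ({placeholders})",
    ["dag_a", *run_ids],
).fetchone()[0]

print(unscoped, scoped)  # 3 2
```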
github-actions Bot pushed a commit to aws-mwaa/upstream-to-airflow that referenced this pull request Jun 23, 2026
…pache#68702) (cherry picked from commit a0805a8) Co-authored-by: Wei Lee <weilee.rx@gmail.com> Co-authored-by: Rahul Vats <rah.sharma11@gmail.com>
aws-airflow-bot pushed a commit to aws-mwaa/upstream-to-airflow that referenced this pull request Jun 23, 2026
…pache#68702) (cherry picked from commit a0805a8) Co-authored-by: Wei Lee <weilee.rx@gmail.com> Co-authored-by: Rahul Vats <rah.sharma11@gmail.com>
cetingokhan pushed a commit to cetingokhan/airflow that referenced this pull request Jun 24, 2026
cetingokhan pushed a commit to cetingokhan/airflow that referenced this pull request Jun 24, 2026
Why
Fix the buggy partition-backfill-by-range path introduced in #67537.
A partitioned Dag on an hourly timetable (e.g. `CronPartitionTimetable("0 * * * *")`) backfilled over a window inside a single day produced one Dag run per cron tick of the entire day. A user backfilling the hourly Dag `ingest_team_a_player_stats` for 08:00–09:00 got ~24 runs instead of the requested hour. `SerializedDAG.iter_dagrun_infos_between` truncated its bounds to calendar dates (`earliest.date()`), then `CronPartitionTimetable.iter_partition_dagrun_infos` snapped those to whole local days via `resolve_day_bound`, discarding the time of day.
What
* The `iter_partition_dagrun_infos` contract changed from `earliest_date`/`latest_date: datetime.date` to `earliest`/`latest: datetime.datetime`.
* The `iter_dagrun_infos_between` dispatch passes the full datetimes (no more `.date()` truncation).
* `CronPartitionTimetable.iter_partition_dagrun_infos` walks `_align_to_next(earliest)` while `current <= latest` at the cron cadence, instead of expanding to whole days.
* … `iter_dagrun_infos_between` path. Callers pass tz-aware datetimes (the backfill API and CLI already store `from_date`/`to_date` as `datetime`). The CLI clear-by-date commands using `resolve_day_bound` are unchanged (separate concern).
Was generative AI tooling used to co-author this PR?
Generated-by: [Claude] following the guidelines