Fix partitioned backfill widening a sub-day window to the whole day #68718

Merged
Lee-W merged 2 commits into apache:main from astronomer:partition-backfill-window
Jun 23, 2026
Conversation

@Lee-W

@Lee-W Lee-W commented Jun 18, 2026

Member

Why

Fixes the buggy partition-backfill-by-range path introduced in #67537.

  • A partitioned timetable (e.g. CronPartitionTimetable("0 * * * *")) backfilled over a window inside a single day produced one Dag run per cron tick of the entire day. A user backfilling the hourly Dag ingest_team_a_player_stats for 08:00–09:00 got ~24 runs instead of the requested hour.
  • Root cause: SerializedDAG.iter_dagrun_infos_between truncated its bounds to calendar dates (earliest.date()), then CronPartitionTimetable.iter_partition_dagrun_infos snapped those to whole local days via resolve_day_bound — discarding the time-of-day.
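The widening described above can be reproduced in a few lines. This is a minimal sketch, not the Airflow code: `widen_to_whole_days` and `hourly_ticks` are hypothetical helpers, a plain one-hour `timedelta` stands in for the `0 * * * *` cron cadence, and the dates are made up for illustration.

```python
from datetime import datetime, time, timedelta, timezone


def widen_to_whole_days(earliest, latest):
    """Mimic the buggy path: keep only the calendar dates, then expand to full days."""
    start = datetime.combine(earliest.date(), time.min, tzinfo=earliest.tzinfo)
    # The end becomes the midnight that closes latest's calendar day.
    end = datetime.combine(latest.date(), time.min, tzinfo=latest.tzinfo) + timedelta(days=1)
    return start, end


def hourly_ticks(start, end):
    """Hourly ticks in the half-open range [start, end)."""
    ticks = []
    current = start
    while current < end:
        ticks.append(current)
        current += timedelta(hours=1)
    return ticks


# Requested window: 08:00 to 09:00 on a single day.
earliest = datetime(2026, 6, 1, 8, 0, tzinfo=timezone.utc)
latest = datetime(2026, 6, 1, 9, 0, tzinfo=timezone.utc)

start, end = widen_to_whole_days(earliest, latest)
runs = hourly_ticks(start, end)
print(len(runs))  # 24: every hourly tick of the day, not just the requested window
```

Once the time-of-day is dropped by `.date()`, nothing downstream can recover it, which is why the whole day is scheduled.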

What

  • Partition iteration now honours the actual datetime window instead of rounding to whole calendar days; the backfill range follows the timetable's own partition cadence.
  • iter_partition_dagrun_infos contract changed from earliest_date/latest_date: datetime.date to earliest/latest: datetime.datetime.
  • iter_dagrun_infos_between dispatch passes the full datetimes (no more .date() truncation).
  • CronPartitionTimetable.iter_partition_dagrun_infos walks _align_to_next(earliest) while current <= latest at the cron cadence, instead of expanding to whole days.
  • Bounds are honoured as instants, both ends inclusive — matching the non-partitioned iter_dagrun_infos_between path. Callers pass tz-aware datetimes (the backfill API and CLI already store from_date/to_date as datetime). The CLI clear-by-date commands using resolve_day_bound are unchanged (separate concern).
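The walk described in the list above (align the lower bound to the next tick, then step while `current <= latest`) can be sketched as follows. This is an illustration under stated assumptions, not the merged implementation: `align_to_next_hour` and `iter_hourly_partitions` are hypothetical names, and a fixed one-hour step stands in for the timetable's cron cadence.

```python
from datetime import datetime, timedelta, timezone


def align_to_next_hour(moment):
    """Return moment if it already sits on the hour, else the next top of the hour."""
    floored = moment.replace(minute=0, second=0, microsecond=0)
    return floored if floored == moment else floored + timedelta(hours=1)


def iter_hourly_partitions(earliest, latest):
    """Yield hourly ticks inside [earliest, latest], both ends inclusive."""
    current = align_to_next_hour(earliest)
    while current <= latest:
        yield current
        current += timedelta(hours=1)


utc = timezone.utc
window = list(
    iter_hourly_partitions(
        datetime(2026, 6, 1, 8, 0, tzinfo=utc),
        datetime(2026, 6, 1, 9, 0, tzinfo=utc),
    )
)
print([t.strftime("%H:%M") for t in window])  # ['08:00', '09:00']

# A lower bound that falls between ticks is aligned forward, never backward.
partial = list(
    iter_hourly_partitions(
        datetime(2026, 6, 1, 8, 30, tzinfo=utc),
        datetime(2026, 6, 1, 9, 0, tzinfo=utc),
    )
)
print([t.strftime("%H:%M") for t in partial])  # ['09:00']
```

Because both ends are inclusive, a window whose bounds land exactly on ticks includes both of them; a caller that wants a single partition would pass the same instant as both bounds.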

Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

Generated-by: [Claude] following the guidelines


  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

@Lee-W Lee-W added the backport-to-v3-3-test Backport to v3-3-test label Jun 18, 2026
@Lee-W Lee-W force-pushed the partition-backfill-window branch from a2bef10 to 0d94a54 Compare June 18, 2026 15:35
@Lee-W Lee-W added this to the Airflow 3.3.0 milestone Jun 19, 2026
Comment thread airflow-core/src/airflow/timetables/trigger.py Outdated
@Lee-W Lee-W self-assigned this Jun 22, 2026
@Lee-W Lee-W force-pushed the partition-backfill-window branch 11 times, most recently from fd84879 to 9974f83 Compare June 22, 2026 14:29
@Lee-W Lee-W requested review from amoghrajesh and jason810496 June 22, 2026 14:34

@phanikumv phanikumv left a comment

Contributor

Fix looks good, thanks for addressing my review.

@Lee-W

Lee-W commented Jun 23, 2026

Member Author

Thanks for your suggestion! I'll keep it open for one or two more hours in case someone else wants to take a look.

Comment thread airflow-core/src/airflow/timetables/trigger.py Outdated
Lee-W added 2 commits June 23, 2026 13:07
Backfilling a partitioned Dag for a window inside a single day (e.g. an hourly
timetable for 08:00–09:00) created one run per cron tick of the entire day.
iter_partition_dagrun_infos now honours the datetime window directly — both ends
inclusive, at the timetable's own partition cadence — instead of rounding to whole
calendar days; callers pass tz-aware datetimes.
…ndary

Backfill bounds arrive attached to the core default_timezone (UTC), never
the timetable timezone, so a non-UTC daily timetable lost its first
partition and a sub-day window risked widening. The iteration now localizes
each bound's wall-clock reading into the timetable timezone before aligning,
keeping sub-day precision intact.
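The timezone issue in the second commit message can be illustrated with the standard library alone. This is a sketch under assumptions: `localize_wall_clock` is a hypothetical helper, and a fixed UTC+8 offset stands in for a real timetable timezone. It shows why a bound attached to UTC drops the first local-midnight partition, and how re-reading the same wall-clock time in the timetable timezone keeps it.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset stand-in for the timetable timezone (assumption for illustration).
TIMETABLE_TZ = timezone(timedelta(hours=8))


def localize_wall_clock(bound, tz):
    """Keep the wall-clock reading of `bound` but attach the timetable timezone."""
    return bound.replace(tzinfo=tz)


# The backfill bound arrives attached to UTC.
bound_utc = datetime(2026, 6, 1, 0, 0, tzinfo=timezone.utc)

# The first daily partition is local midnight, which is 2026-05-31 16:00 UTC.
first_partition = datetime(2026, 6, 1, 0, 0, tzinfo=TIMETABLE_TZ)

# Compared as instants, the partition falls before the UTC bound and is lost.
lost = first_partition < bound_utc
print(lost)  # True

# After localizing the wall-clock reading, the bound and the partition coincide.
localized = localize_wall_clock(bound_utc, TIMETABLE_TZ)
kept = first_partition >= localized
print(kept)  # True

# Sub-day precision survives: only the attached timezone changes.
sub_day = localize_wall_clock(datetime(2026, 6, 1, 8, 30, tzinfo=timezone.utc), TIMETABLE_TZ)
print(sub_day.isoformat())  # 2026-06-01T08:30:00+08:00
```

`datetime.replace(tzinfo=...)` changes the attached timezone without shifting the clock reading, which is the re-interpretation the commit message describes; `astimezone` would instead preserve the instant and change the reading.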
@Lee-W Lee-W force-pushed the partition-backfill-window branch from 9974f83 to 9d26fad Compare June 23, 2026 05:10
@Lee-W Lee-W merged commit 15b9666 into apache:main Jun 23, 2026
77 checks passed
@Lee-W Lee-W deleted the partition-backfill-window branch June 23, 2026 05:54
@github-actions

Contributor

Backport successfully created: v3-3-test

Note: As per "Merging PRs targeted for Airflow 3.X", the committer who merges the PR is responsible for backporting the PRs that are bug fixes (generally speaking) to the maintenance branches.

In case of doubt, please ask in the #release-management Slack channel.

Status Branch Result
v3-3-test PR Link

github-actions Bot pushed a commit to aws-mwaa/upstream-to-airflow that referenced this pull request Jun 23, 2026
… whole day (apache#68718)

(cherry picked from commit 15b9666)

Co-authored-by: Wei Lee <weilee.rx@gmail.com>
Lee-W added a commit that referenced this pull request Jun 23, 2026
… whole day (#68718) (#68881)

Co-authored-by: Wei Lee <weilee.rx@gmail.com>
Lee-W added a commit to astronomer/airflow that referenced this pull request Jun 23, 2026
`dags clear` and `partitions clear` passed user-supplied datetimes
through `resolve_day_bound(.date())`, which stripped the time component
and expanded any sub-day bound to local midnight. On an hourly
partitioned Dag, `--partition-date-start 08:00 --partition-date-end
08:00` cleared all 24 partitions instead of just the 08:00 one.

Adds `localize_partition_datetime` to the `Timetable` protocol (base:
UTC pass-through; CronMixin: wall-clock re-interpreted in the
timetable's local timezone, same logic as apache#68718). Removes the
now-redundant private `_localize_wall_clock_to_timetable_timezone` from
`CronPartitionTimetable`. Updates `apply_partition_date_window` to use
the new method with an inclusive `<=` end bound instead of the old
half-open `< next_midnight` form.
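The difference between the old half-open `< next_midnight` filter and the new inclusive `<=` bound in the commit message above can be sketched on an in-memory list of partition dates. Both selector functions are hypothetical stand-ins, not `apply_partition_date_window` itself, and the hourly partitions are made-up sample data.

```python
from datetime import datetime, timedelta, timezone

utc = timezone.utc
# One partition per hour of a single day (sample data).
partitions = [datetime(2026, 6, 1, hour, tzinfo=utc) for hour in range(24)]

# The user asks for exactly the 08:00 partition.
start = end = datetime(2026, 6, 1, 8, 0, tzinfo=utc)


def select_whole_day(partition_dates, start, end):
    """Old form: snap both bounds to midnight, half-open on the next midnight."""
    day_start = start.replace(hour=0, minute=0, second=0, microsecond=0)
    next_midnight = end.replace(hour=0, minute=0, second=0, microsecond=0) + timedelta(days=1)
    return [p for p in partition_dates if day_start <= p < next_midnight]


def select_inclusive(partition_dates, start, end):
    """New form: honour the datetimes as given, inclusive on both ends."""
    return [p for p in partition_dates if start <= p <= end]


print(len(select_whole_day(partitions, start, end)))  # 24
print(len(select_inclusive(partitions, start, end)))  # 1
```

With an inclusive end bound, passing the same instant for start and end selects exactly one partition, which matches the `08:00` to `08:00` example in the commit message.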
vatsrahul1001 added a commit that referenced this pull request Jun 23, 2026
* API: Add partition clear support to REST API to match the CLI

clearDagRuns now accepts partition_key / partition_date window selectors
as an alternative to an explicit run list. Add POST /dags/{dag_id}/clearPartitions
to reset partition_key/partition_date on matching runs, with optional
task-instance clear — REST parity with `airflow dags clear` / `airflow partitions clear`.

* API: Deduplicate partition selector fields across clear request bodies

Extract the shared partition_key / partition_date window fields and their date-order check into a PartitionSelectorMixin reused by BulkDAGRunClearBody and ClearPartitionsBody, and replace the repeated partition-selector presence checks with a has_partition_selectors property. No behavior change.

* Scope partition-clear task instance queries to the target dag

Add a dag_id filter to the task-instance lookups in both the REST clear_partition_fields service and the airflow partitions clear CLI so a run_id shared across dags no longer clears another dag's task instances, and collapse the per-run dry-run task-instance lookups into a single batched count query.

* Share the partition date-window filter across clear paths

Extract the resolve_day_bound partition_date window resolution duplicated across the REST clear_dag_runs route, the clear_partition_fields service, and the airflow partitions clear CLI into a single DagRun.apply_partition_date_window helper so the three cannot drift.

* Share the partition-clear core between the REST API and the CLI

Extract the partition column-reset, task-instance batching, and dry-run counting into a single DagRun.clear_partition_runs helper reused by the clearPartitions REST endpoint and the airflow partitions clear CLI, replacing the two parallel implementations. The CLI keeps its per-run output through an optional callback. No behavior change.

* Fix partition clear commands widening sub-day windows to the whole day

`dags clear` and `partitions clear` passed user-supplied datetimes
through `resolve_day_bound(.date())`, which stripped the time component
and expanded any sub-day bound to local midnight. On an hourly
partitioned Dag, `--partition-date-start 08:00 --partition-date-end
08:00` cleared all 24 partitions instead of just the 08:00 one.

Adds `localize_partition_datetime` to the `Timetable` protocol (base:
UTC pass-through; CronMixin: wall-clock re-interpreted in the
timetable's local timezone, same logic as #68718). Removes the
now-redundant private `_localize_wall_clock_to_timetable_timezone` from
`CronPartitionTimetable`. Updates `apply_partition_date_window` to use
the new method with an inclusive `<=` end bound instead of the old
half-open `< next_midnight` form.

* Update REST datamodel descriptions to reflect sub-day precision

* Share partition selection-mode validation across clear request bodies

BulkDAGRunClearBody and ClearPartitionsBody duplicated the same
"exactly one selection mode" rule, including the partition-window
definition and the selector-enumeration error message, which would
drift independently. Move the shared check onto PartitionSelectorMixin
so the partition-selector semantics live in one place.

* Drop sub-day-precision wording from partition clear CLI help

The "sub-day precision is preserved" phrasing framed the help against a
since-fixed truncation bug, which is meaningless to a reader seeing the
text fresh. The timezone re-interpretation note plus the date-only ->
midnight rule already convey that the time of day is honoured.

* Remove unused resolve_day_bound

* Refactor tests

* regen docs

* Fix test failure

* Fix ruff F402 and docs spelling failing CI on partition-clear branch

A loop variable shadowed the imported `task` decorator (ruff F402) and a
British-spelled word in a new docstring tripped the en_US docs spell-check.

---------

Co-authored-by: Rahul Vats <rah.sharma11@gmail.com>
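
The dag-scoping fix in the commit message above (a `run_id` shared across dags must not touch another dag's task instances, and per-run lookups collapse into one batched count) can be illustrated with an in-memory SQLite table. This is a sketch only: the table layout, the sample rows, and both helper functions are assumptions for illustration, not Airflow's models or queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_instance (dag_id TEXT, run_id TEXT, task_id TEXT)")
conn.executemany(
    "INSERT INTO task_instance VALUES (?, ?, ?)",
    [
        ("dag_a", "manual__2026-06-01", "extract"),
        ("dag_a", "manual__2026-06-01", "load"),
        # A different dag that happens to share the same run_id.
        ("dag_b", "manual__2026-06-01", "extract"),
    ],
)


def count_unscoped(run_ids):
    """Count task instances by run_id only; rows from other dags leak in."""
    placeholders = ",".join("?" for _ in run_ids)
    query = f"SELECT COUNT(*) FROM task_instance WHERE run_id IN ({placeholders})"
    return conn.execute(query, list(run_ids)).fetchone()[0]


def count_scoped(dag_id, run_ids):
    """Count with a dag_id filter, batched over all run_ids in one query."""
    placeholders = ",".join("?" for _ in run_ids)
    query = (
        "SELECT COUNT(*) FROM task_instance "
        f"WHERE dag_id = ? AND run_id IN ({placeholders})"
    )
    return conn.execute(query, [dag_id, *run_ids]).fetchone()[0]


run_ids = ["manual__2026-06-01"]
unscoped = count_unscoped(run_ids)
scoped = count_scoped("dag_a", run_ids)
print(unscoped)  # 3
print(scoped)    # 2
```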
github-actions Bot pushed a commit to aws-mwaa/upstream-to-airflow that referenced this pull request Jun 23, 2026
…pache#68702)

(cherry picked from commit a0805a8)

Co-authored-by: Wei Lee <weilee.rx@gmail.com>
Co-authored-by: Rahul Vats <rah.sharma11@gmail.com>
aws-airflow-bot pushed a commit to aws-mwaa/upstream-to-airflow that referenced this pull request Jun 23, 2026
…pache#68702)

(cherry picked from commit a0805a8)

Co-authored-by: Wei Lee <weilee.rx@gmail.com>
Co-authored-by: Rahul Vats <rah.sharma11@gmail.com>
cetingokhan pushed a commit to cetingokhan/airflow that referenced this pull request Jun 24, 2026
Co-authored-by: Rahul Vats <rah.sharma11@gmail.com>

3 participants