
Make deadline reads and serialization robust to dynamic/malformed intervals #68919

Open
seanghaeli wants to merge 1 commit into apache:main from aws-mwaa:feature/deadline-response-serialization


@seanghaeli seanghaeli commented Jun 23, 2026

Contributor

Why

This re-introduces the deadline read/serialization robustness slice of #66608 (fully reverted in #68909).

The motivating bug: DeadlineAlert.interval is a JSON column holding the Airflow-serialized interval (a fixed timedelta or a dynamic VariableInterval), not a plain number. The /ui/dags/{dag_id}/deadlineAlerts response declared interval: float, so any alert with a serialized-dict interval failed Pydantic validation and the endpoint returned 500, breaking the run-page deadline status badge.

What

  • Interval coercion (datamodels/ui/deadline.py): interval becomes float | None with a before validator that returns seconds for a fixed timedelta and None for a dynamic VariableInterval.
  • Sortable keys (routes/ui/deadlines.py): drop interval from order_by — sorting a JSON column sorts by structure, not duration.
  • Deserialization (serialization/decoders.py, serialization/definitions/deadline.py): route by __class_path ahead of reference_type (custom refs may share a builtin's class name); raise a clear error for a reference with no importable __class_path instead of an opaque KeyError.
  • __repr__ safety (models/deadline.py, models/deadline_alert.py): never raise — guard the dagrun relationship (FK can be set while the relationship is None post-cascade-delete) and handle the dict-shaped interval.
  • Prune guard (models/deadline.py): prune_deadlines explicitly excludes ~Deadline.missed so a missed deadline's queued callback is never cascade-deleted.
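
The interval coercion in the first bullet can be sketched without Pydantic as a plain function. This is a minimal illustration, not the PR's code: the helper name `coerce_interval` and the `__type`/`__var` envelope keys are assumptions made for the example, and the real serialized shape may differ.

```python
from __future__ import annotations

from datetime import timedelta
from typing import Any


def coerce_interval(value: Any) -> float | None:
    """Return seconds for a fixed interval, None for dynamic or unrecognized data."""
    # Already a plain number (bool is excluded because it is an int subclass).
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return float(value)
    if isinstance(value, timedelta):
        return value.total_seconds()
    # ASSUMPTION: a serialized fixed interval looks like
    # {"__type": "timedelta", "__var": <seconds>}; the real envelope may differ.
    if isinstance(value, dict) and value.get("__type") == "timedelta":
        seconds = value.get("__var")
        if isinstance(seconds, (int, float)) and not isinstance(seconds, bool):
            return float(seconds)
    # Dynamic (e.g. VariableInterval) or malformed data: no fixed duration to report.
    return None
```

In a Pydantic model, a function like this would typically be wired in as a "before" validator on the `interval: float | None` field, so the dict never reaches float validation.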

Tests

test_deadlines.py (interval coercion + order-by rejection), test_deadline_alert.py / test_deadline.py (repr never raises), test_deadline_reference_registry.py (__class_path precedence + missing-class-path error), test_prune_deadlines.py (missed deadlines survive prune).
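
A "repr never raises" check of the kind listed above could look roughly like the following. It is a self-contained sketch built on a stand-in class: `FakeDeadline` and its fields are hypothetical, not Airflow's model, and exist only to show the guard on a missing `dagrun` relationship.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(repr=False)
class FakeDeadline:
    """Hypothetical stand-in for the ORM model; only the fields the repr needs."""

    id: int
    dagrun_id: int | None = None
    dagrun: Any = None  # the relationship may be None even when dagrun_id is set

    def __repr__(self) -> str:
        # Guard the relationship instead of dereferencing it unconditionally.
        dag_id = getattr(self.dagrun, "dag_id", None)
        if dag_id is not None:
            target = f"dag_id={dag_id!r}"
        else:
            target = f"dagrun_id={self.dagrun_id!r}"
        return f"<FakeDeadline id={self.id} {target}>"


# The FK is set but the relationship is gone, as after a cascade delete.
orphan = FakeDeadline(id=1, dagrun_id=7, dagrun=None)
orphan_repr = repr(orphan)  # must not raise
```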

Verified locally in Breeze: 113 passed.

Generated-by: Claude Code (Opus via Claude Code) on behalf of Sean Ghaeli

@boring-cyborg boring-cyborg Bot added area:API Airflow's REST/HTTP API area:DAG-processing area:deadline-alerts AIP-86 (former AIP-57) labels Jun 23, 2026
@seanghaeli seanghaeli force-pushed the feature/deadline-response-serialization branch from 12a8e11 to 352702c Compare June 23, 2026 21:30
@seanghaeli seanghaeli marked this pull request as draft June 23, 2026 21:44
…ervals

Hardens the read and (de)serialization paths for deadline alerts so dynamic
(``VariableInterval``) and malformed stored data no longer break the UI/API.

- UI deadline-alert response: ``DeadlineAlert.interval`` is a JSON column holding
  the Airflow-serialized interval, not a plain number. Coerce it to seconds for a
  fixed ``timedelta`` and to ``None`` for a dynamic ``VariableInterval`` (resolved
  later by the scheduler), instead of letting Pydantic 500 on the dict. The
  ``interval`` field becomes ``float | None``.
- Drop ``interval`` from the sortable columns of the deadline-alerts endpoint:
  ordering by a JSON column sorts by structure/text, not duration, so the result
  was arbitrary and misleading.
- Deserialization: route by the encoder-stamped ``__class_path`` ahead of the
  ``reference_type`` name (a custom reference may share a class name with a
  builtin), and raise a clear error for a reference with no importable
  ``__class_path`` instead of an opaque ``KeyError``.
- ``Deadline.__repr__`` / ``DeadlineAlert.__repr__`` no longer raise: guard the
  ``dagrun`` relationship (the FK can be set while the relationship is None after a
  cascade delete) and handle the dict-shaped JSON interval. A ``__repr__`` must
  never raise.
- ``prune_deadlines`` explicitly excludes deadlines already marked ``missed`` so a
  missed deadline (whose callback is owned by the scheduler/triggerer) and its
  queued callback are never cascade-deleted.

Generated-by: Claude Code (Opus via Claude Code) on behalf of Sean Ghaeli
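
The routing order described in the deserialization bullet of the commit message can be illustrated with a small resolver. This is a sketch under assumptions: `BUILTIN_REFERENCES`, `resolve_reference_class`, and the `FixedDatetimeReference` stand-in are invented for the example and are not Airflow's actual decoder. Only the `__class_path` and `reference_type` keys come from the description.

```python
from __future__ import annotations

from importlib import import_module
from typing import Any


class FixedDatetimeReference:
    """Hypothetical builtin reference used only for this sketch."""


# Hypothetical registry of builtin references keyed by class name.
BUILTIN_REFERENCES: dict[str, type] = {"FixedDatetimeReference": FixedDatetimeReference}


def resolve_reference_class(data: dict[str, Any]) -> type:
    """Prefer the stamped import path, fall back to the builtin name, fail clearly."""
    class_path = data.get("__class_path")
    if class_path:
        module_name, _, class_name = class_path.rpartition(".")
        try:
            return getattr(import_module(module_name), class_name)
        except (ImportError, AttributeError, ValueError) as exc:
            raise ValueError(f"Cannot import deadline reference {class_path!r}") from exc
    name = data.get("reference_type")
    if name in BUILTIN_REFERENCES:
        return BUILTIN_REFERENCES[name]
    # A descriptive error instead of a bare KeyError from a registry lookup.
    raise ValueError(f"Unknown deadline reference {name!r} with no importable __class_path")
```

Checking `__class_path` first means a custom class that happens to share a builtin's name resolves to the custom class rather than being shadowed by the registry entry.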
@seanghaeli seanghaeli force-pushed the feature/deadline-response-serialization branch from 352702c to 31bd6d5 Compare June 23, 2026 22:11
@seanghaeli seanghaeli marked this pull request as ready for review June 23, 2026 23:00
@potiuk potiuk added the ready for maintainer review Set after triaging when all criteria pass. label Jun 25, 2026

@pierrejeambrun pierrejeambrun left a comment

Member


Thanks for the PR.

I would make the code less verbose and keep only the relevant comments/pieces. There are too many big comments: some of them need to be removed completely, and some need to be trimmed.

Overall looking good, just a few suggestions.

Another pair of eyes would be great on this.

Comment on lines +76 to +79
Without this coercion Pydantic cannot turn that dict into ``float`` and the
``/ui/dags/{dag_id}/deadlineAlerts`` endpoint raises a 500, which breaks the
run-page deadline status badge. Return the seconds for a fixed interval, or
``None`` for a dynamic one (resolved later by the scheduler).

Member


Suggested change
- Without this coercion Pydantic cannot turn that dict into ``float`` and the
- ``/ui/dags/{dag_id}/deadlineAlerts`` endpoint raises a 500, which breaks the
- run-page deadline status badge. Return the seconds for a fixed interval, or
- ``None`` for a dynamic one (resolved later by the scheduler).
+ Without this coercion Pydantic cannot turn that dict into ``float`` and the
+ ``/ui/dags/{dag_id}/deadlineAlerts`` endpoint raises a 500. Return the seconds for a fixed interval, or
+ ``None`` for a dynamic one (resolved later by the scheduler).

@@ -0,0 +1 @@
Fixed the deadline UI/API endpoints raising ``500`` errors on deadline alerts whose interval is stored as a serialized object rather than a plain number. ``DeadlineAlert.interval`` is a JSON column holding the Airflow-serialized interval (a fixed ``timedelta`` or a dynamic ``VariableInterval``); the ``/ui/dags/{dag_id}/deadlineAlerts`` response now coerces it to seconds for a fixed interval and to ``null`` for a dynamic one instead of failing to validate. Sorting deadline alerts by ``interval`` (which would sort by JSON structure, not duration) is no longer offered. Deserializing a corrupt or unrecognized deadline reference now raises a clear error instead of an opaque ``KeyError``, custom references that share a class name with a builtin are routed correctly via ``__class_path``, and ``Deadline.__repr__`` no longer raises when the ``dagrun`` relationship has been severed. ``prune_deadlines`` now explicitly skips deadlines already marked ``missed`` so their queued callbacks are never cascade-deleted.

Member


To remove.

@pierrejeambrun pierrejeambrun removed the ready for maintainer review Set after triaging when all criteria pass. label Jun 29, 2026
@pierrejeambrun pierrejeambrun added this to the Airflow 3.3.1 milestone Jun 29, 2026
@pierrejeambrun pierrejeambrun added the type:bug-fix Changelog: Bug Fixes label Jun 29, 2026

Labels

area:API Airflow's REST/HTTP API area:DAG-processing area:deadline-alerts AIP-86 (former AIP-57) type:bug-fix Changelog: Bug Fixes


3 participants