Merged
24 commits
03005d5
Fetch deadline callback context via Execution API at runtime
Jun 3, 2026
dbe8e66
Retrigger CI
Jun 3, 2026
a25a2e2
Retrigger CI after infra timeouts (kind/gradle download 504)
Jun 8, 2026
878eb0f
Add Jinja template rendering for deadline callback kwargs
Jun 6, 2026
f348575
Harden deadline callback execution against edge-case failures
Jun 15, 2026
ba943f3
Fix scheduled DagRuns never getting a deadline (UNEXPECTED COMMIT und…
Jun 16, 2026
69a85db
Render Jinja in async deadline-callback kwargs (match sync path)
Jun 17, 2026
e028ab9
Reformat: collapse wrapped isinstance to satisfy ruff-format
Jun 17, 2026
65ed4be
Address review: shared Jinja render helper for both paths + update docs
Jun 17, 2026
8cc7339
Address review: assemble context['deadline'] inside build_context_fro…
Jun 17, 2026
3dfdcc6
Retry executor deadline callbacks on transient context-fetch failures
Jun 17, 2026
cfee847
Trim deadline-callback tests to PR scope
Jun 17, 2026
f4d2787
Fix static checks: SDK-import noqa + coerce_to_timedelta typing
Jun 17, 2026
e55b832
Use fixed datetime instead of datetime.now() in deadline isolation tests
Jun 17, 2026
98193ce
Keep deadline test dates within MySQL TIMESTAMP range
Jun 17, 2026
0d81914
Drop XCom route changes — they belong to #66611, not this PR
Jun 17, 2026
e6813f7
Trim deadline-callback tests substantially toward Airflow's coverage …
Jun 17, 2026
7d96957
Re-trigger CI (transient boto3 inventory 403 in unrelated amazon-prov…
Jun 17, 2026
52e2421
Resolve VariableInterval through full secrets chain, not table-only
Jun 20, 2026
18d7f67
Re-trigger CI (flaky example_trigger_controller_dag e2e: dag_run uniq…
Jun 22, 2026
c67b2c2
Fix rebase fallout: guard trigger.callback access in _create_workload
Jun 22, 2026
ae4941a
Re-trigger CI
Jun 22, 2026
c99bfb5
Re-trigger CI
Jun 22, 2026
9920112
Override require_auth in in-process Execution API to fix test hangs
Jun 22, 2026
27 changes: 18 additions & 9 deletions airflow-core/docs/howto/deadline-alerts.rst
Original file line number Diff line number Diff line change
Expand Up @@ -58,7 +58,7 @@ Below is an example Dag implementation. If the Dag has not finished 15 minutes a

.. code-block:: python

from datetime import datetime, timedelta
from datetime import datetime, timedelta, timezone
from airflow.sdk import AsyncCallback, DAG, DeadlineAlert, DeadlineReference
from airflow.providers.slack.notifications.slack_webhook import SlackWebhookNotifier
from airflow.providers.standard.operators.empty import EmptyOperator
Expand Down Expand Up @@ -165,7 +165,9 @@ Here's an example using a fixed datetime:

.. code-block:: python

tomorrow_at_ten = datetime.combine(datetime.now().date() + timedelta(days=1), time(10, 0))
tomorrow_at_ten = datetime.combine(
datetime.now().date() + timedelta(days=1), time(10, 0), tzinfo=timezone.utc
)

with DAG(
dag_id="fixed_deadline_alert",
Expand Down Expand Up @@ -365,12 +367,19 @@ A **custom asynchronous callback** might look like this:
Templating and Context
^^^^^^^^^^^^^^^^^^^^^^

Currently, a relatively simple version of the Airflow context is passed to callables and Airflow does not run
:ref:`concepts:jinja-templating` on the kwargs. However, Notifiers already run templating with the
provided context as part of their execution. This means that templating can be used when using a Notifier
as long as the variables being templated are included in the simplified context. This currently includes the
ID and the calculated deadline time of the Deadline Alert as well as the data included in the ``GET`` REST API
response for Dag Run. Support for more comprehensive context and templating will be added in future versions.
A relatively simple version of the Airflow context is passed to callables, and Airflow runs
:ref:`concepts:jinja-templating` on string-valued callback ``kwargs`` using that context. String
kwargs that contain ``{{ ... }}`` are rendered before the callback runs; non-string kwargs and
strings without template markers are passed through untouched, and a template that fails to render
falls back to its raw value (logged at warning) rather than failing the callback. Templating works
identically on both the synchronous (executor) and asynchronous (triggerer) callback paths.

The variables available for templating are those in the simplified context: the ID and the
calculated deadline time of the Deadline Alert (``{{ deadline.id }}``, ``{{ deadline.deadline_time }}``),
plus the Dag Run fields included in the ``GET`` REST API response for Dag Run (e.g.
``{{ dag_run.run_id }}``, ``{{ run_id }}``, ``{{ logical_date }}``, ``{{ ds }}``, ``{{ ts }}``).
Notifiers continue to run their own templating as part of their execution. Support for a more
comprehensive context will be added in future versions.

Deadline Calculation
^^^^^^^^^^^^^^^^^^^^
Expand All @@ -383,7 +392,7 @@ In the following examples, ``notify_team`` is either a SyncCallback or AsyncCall

.. code-block:: python

next_meeting = datetime(2025, 6, 26, 9, 30)
next_meeting = datetime(2025, 6, 26, 9, 30, tzinfo=timezone.utc)

DeadlineAlert(
reference=DeadlineReference.FIXED_DATETIME(next_meeting),
Expand Down
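The documentation hunks above switch the example datetimes from naive to timezone-aware values. A minimal standard-library sketch of the difference follows; the helper `can_compare` and the variable names are illustrative and do not come from the PR.

```python
from datetime import datetime, time, timedelta, timezone

# Naive: no tzinfo attached, so the instant it refers to is ambiguous.
naive = datetime(2025, 6, 26, 9, 30)

# Aware: explicitly pinned to UTC, as in the updated docs.
aware = datetime(2025, 6, 26, 9, 30, tzinfo=timezone.utc)

# datetime.combine accepts tzinfo as a keyword argument, which is what the
# updated "tomorrow_at_ten" example relies on.
tomorrow_at_ten = datetime.combine(
    datetime.now(timezone.utc).date() + timedelta(days=1),
    time(10, 0),
    tzinfo=timezone.utc,
)


def can_compare(a: datetime, b: datetime) -> bool:
    """Return True if ordering a against b does not raise TypeError."""
    try:
        a < b
        return True
    except TypeError:
        return False
```

Ordering a naive value against an aware one raises `TypeError`, so `can_compare(naive, aware)` is false, while two aware values compare without error.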
Original file line number Diff line number Diff line change
Expand Up @@ -19,9 +19,10 @@

from collections.abc import Iterable
from datetime import datetime
from typing import Any
from uuid import UUID

from pydantic import AliasPath, Field
from pydantic import AliasPath, Field, field_validator

from airflow.api_fastapi.core_api.base import BaseModel

Expand Down Expand Up @@ -52,9 +53,42 @@ class DeadlineAlertResponse(BaseModel):
id: UUID
name: str | None = None
reference_type: str = Field(validation_alias=AliasPath("reference", "reference_type"))
interval: float = Field(description="Interval in seconds between deadline evaluations.")
interval: float | None = Field(
default=None,
description=(
"Interval in seconds between the reference time and the deadline. "
"Null for a dynamic interval (e.g. a VariableInterval) whose value is "
"only resolved at scheduler evaluation time."
),
)
created_at: datetime

@field_validator("interval", mode="before")
@classmethod
def coerce_interval_to_seconds(cls, value: Any) -> float | None:
"""
Coerce the stored ``interval`` into seconds.

``DeadlineAlert.interval`` is a JSON column holding the Airflow-serialized form
of the SDK interval, not a plain number. A fixed ``timedelta`` serializes to
``{"__classname__": "datetime.timedelta", "__data__": <seconds>}`` and a dynamic
``VariableInterval`` to ``{"__classname__": ".../VariableInterval", "__data__": {...}}``.
Without this coercion Pydantic cannot turn that dict into ``float`` and the
``/ui/dags/{dag_id}/deadlineAlerts`` endpoint raises a 500, which breaks the
run-page deadline status badge. Return the seconds for a fixed interval, or
``None`` for a dynamic one (resolved later by the scheduler).
"""
if value is None or isinstance(value, (int, float)):
return value
if isinstance(value, dict):
data = value.get("__data__")
# Fixed timedelta: __data__ is the total seconds as a number.
if isinstance(data, (int, float)):
return float(data)
# Dynamic interval (e.g. VariableInterval): no fixed seconds to report.
return None
return None


class DeadlineAlertCollectionResponse(BaseModel):
"""DeadlineAlert Collection serializer for responses."""
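The coercion rule described in the `coerce_interval_to_seconds` docstring can be tried outside Pydantic. The sketch below is a plain-function restatement of that logic, not the PR's validator itself; the `VariableInterval` payload shown (`{"key": "my_interval"}`) is a hypothetical placeholder, since the diff elides its real shape.

```python
from typing import Any, Optional


def coerce_interval_to_seconds(value: Any) -> Optional[float]:
    """Return seconds for a fixed interval, or None for a dynamic one."""
    # Already a plain number (or missing): pass it through unchanged.
    if value is None or isinstance(value, (int, float)):
        return value
    if isinstance(value, dict):
        data = value.get("__data__")
        # Fixed timedelta: __data__ holds the total seconds as a number.
        if isinstance(data, (int, float)):
            return float(data)
    # Dynamic interval or unrecognized shape: no fixed seconds to report.
    return None


fixed = {"__classname__": "datetime.timedelta", "__data__": 900}
# Hypothetical payload; the real VariableInterval serialization is not shown in the diff.
dynamic = {"__classname__": "VariableInterval", "__data__": {"key": "my_interval"}}

fixed_seconds = coerce_interval_to_seconds(fixed)
dynamic_seconds = coerce_interval_to_seconds(dynamic)
```

Here `fixed_seconds` is `900.0` and `dynamic_seconds` is `None`, which is why the response field becomes `float | None` instead of a required `float`.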
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -894,13 +894,12 @@ paths:
type: string
description: 'Attributes to order by, multi criteria sort is supported.
Prefix with `-` for descending order. Supported attributes: `id, created_at,
name, interval`'
name`'
default:
- created_at
title: Order By
description: 'Attributes to order by, multi criteria sort is supported. Prefix
with `-` for descending order. Supported attributes: `id, created_at, name,
interval`'
with `-` for descending order. Supported attributes: `id, created_at, name`'
responses:
'200':
description: Successful Response
Expand Down Expand Up @@ -2515,9 +2514,13 @@ components:
type: string
title: Reference Type
interval:
type: number
anyOf:
- type: number
- type: 'null'
title: Interval
description: Interval in seconds between deadline evaluations.
description: Interval in seconds between the reference time and the deadline.
Null for a dynamic interval (e.g. a VariableInterval) whose value is only
resolved at scheduler evaluation time.
created_at:
type: string
format: date-time
Expand All @@ -2526,7 +2529,6 @@ components:
required:
- id
- reference_type
- interval
- created_at
title: DeadlineAlertResponse
description: DeadlineAlert serializer for responses.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -165,8 +165,16 @@ def get_dag_deadline_alerts(
order_by: Annotated[
SortParam,
Depends(
# NOTE: ``interval`` is intentionally NOT a sortable key. ``DeadlineAlert.interval`` is a
# JSON column holding the Airflow-serialized interval — a dict such as
# ``{"__classname__": "datetime.timedelta", "__data__": 300.0}`` for a fixed interval, or a
# structurally different dict for a ``VariableInterval``. Ordering by it at the DB level
# sorts by the JSON text/structure, not the duration, so the result is arbitrary and
# misleading (e.g. a dynamic VariableInterval sorts before/after fixed intervals by shape,
# and "300" vs "3600" compare lexicographically). Meaningful sorting would need a computed
# seconds column. Allow only columns that sort correctly.
SortParam(
["id", "created_at", "name", "interval"],
["id", "created_at", "name"],
DeadlineAlert,
).dynamic_depends(default="created_at")
),
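The NOTE above argues that ordering by the serialized JSON does not order by duration. A small sketch of that effect follows, using Python string comparison on `json.dumps` output as a stand-in for text-based ordering; how a real database orders a JSON column is backend-specific, so this is an illustration under that assumption rather than a reproduction of the query.

```python
import json

intervals = [
    {"__classname__": "datetime.timedelta", "__data__": 3600.0},
    {"__classname__": "datetime.timedelta", "__data__": 45.0},
    {"__classname__": "datetime.timedelta", "__data__": 300.0},
]

# Ordering by the serialized text compares character by character.
by_text = [v["__data__"] for v in sorted(intervals, key=json.dumps)]

# Ordering by the actual duration needs the numeric seconds.
by_seconds = [v["__data__"] for v in sorted(intervals, key=lambda v: v["__data__"])]
```

`by_text` comes out as `[300.0, 3600.0, 45.0]` while `by_seconds` is `[45.0, 300.0, 3600.0]`, so a text-based sort places the 45-second interval last.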
Expand Down
3 changes: 2 additions & 1 deletion airflow-core/src/airflow/api_fastapi/execution_api/app.py
Original file line number Diff line number Diff line change
Expand Up @@ -386,7 +386,7 @@ def app(self):
from airflow.api_fastapi.execution_api.routes.connections import has_connection_access
from airflow.api_fastapi.execution_api.routes.variables import has_variable_access
from airflow.api_fastapi.execution_api.routes.xcoms import has_xcom_access
from airflow.api_fastapi.execution_api.security import _jwt_bearer
from airflow.api_fastapi.execution_api.security import _jwt_bearer, require_auth

self._app = create_task_execution_api_app()

Expand All @@ -403,6 +403,7 @@ async def always_allow(request: Request):
return TIToken(id=ti_id, claims=claims)

self._app.dependency_overrides[_jwt_bearer] = always_allow
self._app.dependency_overrides[require_auth] = always_allow
self._app.dependency_overrides[has_connection_access] = always_allow
self._app.dependency_overrides[has_variable_access] = always_allow
self._app.dependency_overrides[has_xcom_access] = always_allow
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -20,10 +20,15 @@
import logging
from typing import Annotated

from fastapi import APIRouter, Depends, HTTPException, Path, status
from fastapi import APIRouter, Depends, HTTPException, Path, Security, status

from airflow.api_fastapi.execution_api.datamodels.connection import ConnectionResponse
from airflow.api_fastapi.execution_api.security import CurrentTIToken, get_team_name_dep
from airflow.api_fastapi.execution_api.security import (
CurrentTIToken,
ExecutionAPIRoute,
get_team_name_dep,
require_auth,
)
from airflow.exceptions import AirflowNotFoundException
from airflow.models.connection import Connection

Expand All @@ -49,15 +54,19 @@ async def has_connection_access(


router = APIRouter(
route_class=ExecutionAPIRoute,
responses={status.HTTP_404_NOT_FOUND: {"description": "Connection not found"}},
dependencies=[Depends(has_connection_access)],
)

log = logging.getLogger(__name__)


@router.get(
"/{connection_id}",
dependencies=[
Security(require_auth, scopes=["token:execution", "token:workload"]),

Member

@seanghaeli I'm sorry I didn't catch this earlier, but this change is a security regression, so this part at least (if not the entire PR?) needs reverting.

token:workload is essentially used for long-lived tokens (~24 hrs) while the TI is in the queued state, between the executor queueing the task and a worker calling the TI /run endpoint to exchange it for a short-lived (5-10 mins) token that has more permissions.

The primary driver for this change was to make the tokens that are visible via workers (either in the Celery message bus, or in the KE pod spec itself) usable only once (i.e. they can't be replayed, which is handled by the TI state transition requirements) and for a single purpose (just calling the run endpoint).

Contributor

What's the right fix here? Between this comment and Kaxil's, is there a way to make this work within the existing token types, or do we need an entirely new token type instead of trying to shoe-horn what he is trying to do into a system that wasn't built for it?

Also, I clearly need to brush up on the tokens and their intended uses. I didn't know about that intentional down-scoping.

@ferruzzi ferruzzi Jun 23, 2026

Contributor

Also, maybe this is a hint from the universe that we need tests around this to prevent accidental (or at least un-discussed) token scope changes? Let me see what I can come up with and I'll tag you in it for a review.

Contributor

Do we need a new token for this kind of exchange @ashb? Also, the code that should invalidate the workload token after its first use (since we only intend it to be used once, for the exchange for the short-lived token) doesn't seem to be running; otherwise testing would have caught that here? Anyway, I think we should circle back on this one and regroup a bit.

Shall we revert this one? @seanghaeli @ferruzzi?

Contributor

Yes, we should revert to unblock 3.3.0b2, and we can figure out a solution for this in the meantime.

Depends(has_connection_access),
],
responses={
status.HTTP_401_UNAUTHORIZED: {"description": "Unauthorized"},
status.HTTP_403_FORBIDDEN: {"description": "Task does not have access to the connection"},
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@
from typing import Annotated

from cadwyn import VersionedAPIRouter
from fastapi import HTTPException, Query, status
from fastapi import HTTPException, Query, Security, status
from sqlalchemy import func, select
from sqlalchemy.exc import NoResultFound

Expand All @@ -33,15 +33,15 @@
from airflow.api_fastapi.execution_api.datamodels.dagrun import DagRunStateResponse, TriggerDAGRunPayload
from airflow.api_fastapi.execution_api.datamodels.taskinstance import DagRun
from airflow.api_fastapi.execution_api.datamodels.token import TIToken
from airflow.api_fastapi.execution_api.security import CurrentTIToken
from airflow.api_fastapi.execution_api.security import CurrentTIToken, ExecutionAPIRoute, require_auth
from airflow.exceptions import DagNotPartitionedError, DagRunAlreadyExists, InvalidPartitionKeyError
from airflow.models.dag import DagModel
from airflow.models.dagrun import DagRun as DagRunModel
from airflow.models.taskinstance import TaskInstance
from airflow.utils.state import DagRunState
from airflow.utils.types import DagRunTriggeredByType, DagRunType

router = VersionedAPIRouter()
router = VersionedAPIRouter(route_class=ExecutionAPIRoute)

log = logging.getLogger(__name__)

Expand All @@ -66,6 +66,7 @@ def get_previous_dagrun_compat(

@router.get(
"/{dag_id}/{run_id}",
dependencies=[Security(require_auth, scopes=["token:execution", "token:workload"])],
responses={status.HTTP_404_NOT_FOUND: {"description": "Dag run not found"}},
)
def get_dag_run(dag_id: str, run_id: str, session: SessionDep) -> DagRun:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@
import logging
from typing import Annotated

from fastapi import APIRouter, Depends, HTTPException, Path, Query, Request, status
from fastapi import APIRouter, Depends, HTTPException, Path, Query, Request, Security, status
from sqlalchemy import func, select

from airflow.api_fastapi.common.db.common import SessionDep
Expand All @@ -29,7 +29,12 @@
VariablePostBody,
VariableResponse,
)
from airflow.api_fastapi.execution_api.security import CurrentTIToken, get_team_name_dep
from airflow.api_fastapi.execution_api.security import (
CurrentTIToken,
ExecutionAPIRoute,
get_team_name_dep,
require_auth,
)
from airflow.models.variable import Variable


Expand Down Expand Up @@ -57,7 +62,7 @@ async def has_variable_access(
return True


router = APIRouter()
router = APIRouter(route_class=ExecutionAPIRoute)

log = logging.getLogger(__name__)

Expand All @@ -68,6 +73,7 @@ async def has_variable_access(
# it requires a variable_key path parameter that /keys does not have.
@router.get(
"/keys",
dependencies=[Security(require_auth, scopes=["token:execution", "token:workload"])],
responses={
status.HTTP_401_UNAUTHORIZED: {"description": "Unauthorized"},
},
Expand Down Expand Up @@ -103,7 +109,10 @@ def get_variable_keys(

@router.get(
"/{variable_key:path}",
dependencies=[Depends(has_variable_access)],
dependencies=[
Security(require_auth, scopes=["token:execution", "token:workload"]),
Depends(has_variable_access),
],
responses={
status.HTTP_404_NOT_FOUND: {"description": "Variable not found"},
status.HTTP_401_UNAUTHORIZED: {"description": "Unauthorized"},
Expand Down Expand Up @@ -131,7 +140,10 @@ def get_variable(

@router.put(
"/{variable_key:path}",
dependencies=[Depends(has_variable_access)],
dependencies=[
Security(require_auth, scopes=["token:execution"]),
Depends(has_variable_access),
],
status_code=status.HTTP_201_CREATED,
responses={
status.HTTP_404_NOT_FOUND: {"description": "Variable not found"},
Expand All @@ -151,7 +163,10 @@ def put_variable(

@router.delete(
"/{variable_key:path}",
dependencies=[Depends(has_variable_access)],
dependencies=[
Security(require_auth, scopes=["token:execution"]),
Depends(has_variable_access),
],
status_code=status.HTTP_204_NO_CONTENT,
responses={
status.HTTP_404_NOT_FOUND: {"description": "Variable not found"},
Expand Down
27 changes: 25 additions & 2 deletions airflow-core/src/airflow/api_fastapi/execution_api/security.py
Original file line number Diff line number Diff line change
Expand Up @@ -234,7 +234,7 @@ def __init__(self, *args: Any, **kwargs: Any) -> None:


async def get_team_name_dep(token=CurrentTIToken) -> str | None:
"""Return the team name associated to the task (if any)."""
"""Return the team name associated to the task or callback (if any)."""
from airflow.configuration import conf

if not conf.getboolean("core", "multi_team"):
Expand All @@ -243,7 +243,12 @@ async def get_team_name_dep(token=CurrentTIToken) -> str | None:
from airflow.utils.session import create_session_async

async with create_session_async() as session:
return await session.scalar(_team_name_for_ti_stmt(token.id))
team_name = await session.scalar(_team_name_for_ti_stmt(token.id))
if team_name is not None:
return team_name
# Workload tokens use the callback UUID as sub; fall back to the
# Callback → dag_id → Team path for deadline callback subprocesses.
return await session.scalar(_team_name_for_callback_stmt(token.id))


def get_team_name_for_ti(ti_id, session) -> str | None:
Expand Down Expand Up @@ -289,3 +294,21 @@ def _team_name_for_dag_stmt(dag_id):
.join(DagBundleModel.teams)
.where(DagModel.dag_id == dag_id)
)


def _team_name_for_callback_stmt(callback_id):
"""Build the select statement resolving ``Callback.id -> dag_id -> Team.name``."""
from airflow.models import DagModel
from airflow.models.callback import Callback
from airflow.models.dagbundle import DagBundleModel
from airflow.models.team import Team

# Callbacks store dag_id as a JSON key in data; join via the dag_id value.
return (
select(Team.name)
.select_from(Callback)
.join(DagModel, DagModel.dag_id == Callback.data["dag_id"].as_string())
.join(DagBundleModel, DagBundleModel.name == DagModel.bundle_name)
.join(DagBundleModel.teams)
.where(Callback.id == callback_id)
)