perf(scheduler): chunk tuple IN list in _create_dag_runs to bound SQL params #68316

Closed
ingaleniranjan365 wants to merge 1 commit into apache:main from ingaleniranjan365:auto-perf/chunk-tuple-in-create-dag-runs-20260610
@ingaleniranjan365 ingaleniranjan365 commented Jun 10, 2026

Performance issue

File: airflow-core/src/airflow/jobs/scheduler_job_runner.py:2234

On large Airflow deployments (hundreds of DAGs), _create_dag_runs() emits a WHERE (dag_id, logical_date) IN (('dag1', ts1), ..., ('dagN', tsN)) clause with O(N) bind parameters on every scheduler heartbeat (default: every 5 seconds). PostgreSQL's query planner re-plans this statement from scratch each time, which makes the scheduler appear to hang. This is the root cause reported in #61453 (26 👍).

The same pattern was already fixed for a neighbouring call site in PR #62114 (merged 2026-03-13). This PR fixes the remaining location.

Fix

Replace the single unbounded tuple_.in_() with a chunked loop of ≤1000-row batches so the SQL parameter count is bounded regardless of fleet size. Semantics are identical — the results are combined into the same existing_dagruns dict before use.
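
For reviewers who want to see the shape of the change without opening the diff, here is a minimal sketch of the chunking pattern. It is not the code in scheduler_job_runner.py: it uses the standard-library sqlite3 module instead of SQLAlchemy's tuple_.in_(), and the table layout, the fetch_existing_runs and chunked helpers, and the text-typed logical_date are all assumptions made for illustration.

```python
import sqlite3
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

# Hypothetical key type: (dag_id, logical_date). The date is kept as ISO text
# here purely to keep the sketch small.
Key = Tuple[str, str]


def chunked(items: Sequence[Key], size: int) -> Iterator[Sequence[Key]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_existing_runs(
    conn: sqlite3.Connection,
    keys: Iterable[Key],
    chunk_size: int = 1000,  # mirrors the batch size named in this PR
) -> Dict[Key, int]:
    """Look up existing runs in bounded batches and merge into one dict."""
    existing: Dict[Key, int] = {}
    key_list: List[Key] = list(keys)
    for chunk in chunked(key_list, chunk_size):
        # One "(?, ?)" pair per key, so each query carries 2 * len(chunk) params.
        placeholders = ", ".join(["(?, ?)"] * len(chunk))
        sql = (
            "SELECT dag_id, logical_date, id FROM dag_run "
            f"WHERE (dag_id, logical_date) IN (VALUES {placeholders})"
        )
        params = [value for key in chunk for value in key]
        for dag_id, logical_date, run_id in conn.execute(sql, params):
            existing[(dag_id, logical_date)] = run_id
    return existing
```

Because every batch writes into the same dict, callers see the same mapping they would get from a single unbounded query; only the number of round trips and the per-query parameter count change.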

Evidence

Before: 1 query with O(N) bind parameters (N = number of scheduled DAGs) — re-planned by Postgres on every 5s heartbeat.
After: ⌈N/1000⌉ queries each with ≤2000 bind parameters — stable query plan, no planner thrashing.
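
The before/after numbers can be checked with a few lines of arithmetic. The helper below is a hypothetical illustration, not part of the patch; it assumes two bind parameters per key (dag_id and logical_date) and the 1000-row batch size described above.

```python
from typing import Tuple


def query_shape(
    n_keys: int, chunk_size: int = 1000, columns_per_key: int = 2
) -> Tuple[int, int]:
    """Return (number of queries, largest bind-parameter count in any query)."""
    if n_keys < 0 or chunk_size < 1 or columns_per_key < 1:
        raise ValueError("n_keys must be >= 0; chunk_size and columns_per_key >= 1")
    # Integer ceiling division: ceil(n_keys / chunk_size) without floats.
    n_queries = (n_keys + chunk_size - 1) // chunk_size
    # The fullest batch holds min(n_keys, chunk_size) keys.
    max_params = min(n_keys, chunk_size) * columns_per_key
    return n_queries, max_params


print(query_shape(300))   # (1, 600)
print(query_shape(2500))  # (3, 2000)
```

Note that the final batch is usually smaller than the others (500 keys in the 2500-key case), so its parameter count differs from the full batches.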

Validation

  • Test harness: pytest airflow-core/tests/unit/jobs/test_scheduler_job.py -k test_create_dag_runs
  • Tests pass after fix: ✅ (7/7 passed, SQLite in-memory)
  • Fix scope: domain-free, independent, 1 file / +12 -10 lines

Fixes part of: #61453

… params

On large Airflow deployments (hundreds of DAGs), _create_dag_runs() emits
WHERE (dag_id, logical_date) IN (('dag1', ts1), ..., ('dagN', tsN)) with
O(N) bind parameters on every scheduler heartbeat (default: 5s interval).
PostgreSQL re-plans this from scratch each time -- causing perceived scheduler
hanging reported in apache#61453 (26 thumbsup).

Replace the single unbounded IN with a chunked loop of at most 1000-row
batches so the parameter count is bounded regardless of fleet size. The same
pattern was fixed at a neighbouring call site in PR apache#62114; this finishes it.

Fixes part of: apache#61453

Co-authored-by: Wibey VSCode Extension <wibey@walmart.com>
boring-cyborg Bot commented Jun 10, 2026

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide.
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our prek-hooks will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide. Consider adding an example Dag that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack
