Update task lifecycle diagram for missing states #46056

Merged
eladkal merged 8 commits into apache:main from pykenny:20240126_bugfix_task_instance_lifecycle_diagram
Mar 20, 2025

pykenny commented Jan 26, 2025

Contributor

Purpose

Adds 2 missing states ("skipped", "deferred") to the task lifecycle diagram, with more details on state transition conditions.

Details

  • Adds missing states to the diagram ("skipped" and "deferred")
  • Adds missing condition nodes to state transitions
  • Adds the "triggerer" component (which handles deferred tasks) to the diagram
  • Converts multi-way conditional branches in the graph into a sequence (multi-staged) of binary conditional branches
  • Splits states into three categories: "shared states", "states only for sensors", and "states only for deferrable tasks"
  • Uses different icons to differentiate task states and Airflow components
  • Uses the diagrams library for automated plotting and pre-commit integration; renames "task_lifecycle_diagram" to "diagram_task_lifecycle" to comply with the pre-commit condition
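
To make the "sequence of binary conditional branches" idea concrete, here is a minimal standard-library sketch that emits Graphviz DOT text. It is not the script added in this PR and does not use the diagrams library. The state subset, the condition names ("starts on triggerer?", "defers?", "succeeded?"), and both helper functions are illustrative assumptions, and the transitions are heavily simplified.

```python
# Illustrative sketch only: a simplified subset of task states and
# hypothetical condition nodes, NOT the diagram generated by this PR.
STATES = ["none", "scheduled", "queued", "running",
          "deferred", "success", "failed"]

# Hypothetical condition nodes; each must branch exactly into "yes" and "no".
CONDITIONS = ["starts on triggerer?", "defers?", "succeeded?"]

# (source, target, edge label); an empty label means an unconditional edge.
EDGES = [
    ("none", "scheduled", ""),
    ("scheduled", "starts on triggerer?", ""),
    ("starts on triggerer?", "deferred", "yes"),
    ("starts on triggerer?", "queued", "no"),
    ("queued", "running", ""),
    ("running", "defers?", ""),
    ("defers?", "deferred", "yes"),
    ("defers?", "succeeded?", "no"),   # second stage of the binary chain
    ("succeeded?", "success", "yes"),
    ("succeeded?", "failed", "no"),
    ("deferred", "scheduled", ""),
]


def every_condition_is_binary(conditions, edges):
    """Return True if each condition node has exactly one 'yes' and one 'no' edge."""
    for cond in conditions:
        labels = sorted(label for src, _, label in edges if src == cond)
        if labels != ["no", "yes"]:
            return False
    return True


def to_dot(states, conditions, edges):
    """Render states as boxes and conditions as diamonds in Graphviz DOT syntax."""
    lines = ["digraph task_lifecycle {"]
    for state in states:
        lines.append(f'  "{state}" [shape=box];')
    for cond in conditions:
        lines.append(f'  "{cond}" [shape=diamond];')
    for src, dst, label in edges:
        attrs = f' [label="{label}"]' if label else ""
        lines.append(f'  "{src}" -> "{dst}"{attrs};')
    lines.append("}")
    return "\n".join(lines)


DOT = to_dot(STATES, CONDITIONS, EDGES)

if __name__ == "__main__":
    print(DOT)
```

Keeping every decision node binary means a three-way outcome (defer, succeed, fail) is drawn as two chained yes/no questions, which is what the "multi-staged" wording above refers to.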

Related Issues

closes: #40185


boring-cyborg Bot commented Jan 26, 2025

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

eladkal commented Jan 26, 2025

Contributor

Nice.
Note that a task can start execution directly from the triggerer without going to the worker first:
#38674

I do have a problem with just updating the image like that, though: it doesn't make future changes easy.
Ideally we should have a script that generates the image (diagram as code), similar to how we generate the ERD diagram for the database.
We can use https://github.com/mingrammer/diagrams

pykenny commented Jan 27, 2025

Contributor Author

The original diagram:
image

Diagram after adding triggerer tasks:
image

Question: Does a triggerer task enter the "queued" state during its lifecycle, or is it passed directly to the triggerer once it reaches the "scheduled" state?

potiuk commented Jan 27, 2025

Member

Is there a way we can easily update that picture when things change in the future?

I am a big fan of generating such images from text / code rather than drawing them by hand. We even already have a number of such images generated from the code via pre-commit - using the cool diagrams library:

Python file: https://github.com/apache/airflow/blob/main/docs/apache-airflow/img/diagram_basic_airflow_architecture.py
Resulting diagram: https://github.com/apache/airflow/blob/main/docs/apache-airflow/img/diagram_basic_airflow_architecture.png

And it's enough to re-run the python file to get the new diagram generated based on it.
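
As a rough illustration of how a hook could decide when such a re-run is needed, here is a minimal sketch that compares a stored hash of the diagram's Python source against its current hash. This is an assumption about one possible approach, not a description of Airflow's actual pre-commit hook; the function names and the idea of a sidecar hash file are hypothetical.

```python
import hashlib
from pathlib import Path


def source_hash(source: Path) -> str:
    """Return the SHA-256 hex digest of the diagram's source file."""
    return hashlib.sha256(source.read_bytes()).hexdigest()


def needs_regeneration(source: Path, hash_file: Path) -> bool:
    """True if no hash was recorded yet or the source changed since the last run."""
    if not hash_file.exists():
        return True
    return hash_file.read_text().strip() != source_hash(source)


def record_hash(source: Path, hash_file: Path) -> None:
    """Store the current source hash after the image has been regenerated."""
    hash_file.write_text(source_hash(source))
```

A hook built this way would call `needs_regeneration`, run the diagram script only when it returns True, and then call `record_hash`, so the committed image cannot silently drift from its source.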

I don't see a viable path for easily modifying such a state diagram in the future if all we have is the .png file, or if we need some external tool (which one?) to modify the diagram.

Yes, it might look nicer when drawn by hand, but for me long-term maintainability is far more important. Otherwise we will end up in exactly the situation we have already experienced, where outdated pictures were basically "lying" to our users about something they were supposed to reflect accurately. This is pretty much inevitable if we have no easy way to regenerate the images with updated changes and drawing them by hand is not really an option.

WDYT @pykenny? Maybe you should try to replicate the diagram using a similar approach to our architecture diagrams?

pykenny commented Jan 27, 2025

Contributor Author

Got your point, I see how the project has changed the way it manages graphs since last year. That will just take longer to complete in my out-of-date dev environment, however (I just realized PyCharm Community doesn't have uv integration yet, at least not in its stable versions).

(Update) PyCharm supports uv environments starting with the 2024.3.2 release.

potiuk commented Jan 27, 2025

Member

> Got your point, I see how the project has changed the way it manages graphs since last year. That will just take longer to complete in my out-of-date dev environment, however (I just realized PyCharm Community doesn't have uv integration yet, at least not in its stable versions).

Running uv sync creates a .venv that you can add as an existing virtual environment. That's all you need.

pykenny commented Jan 28, 2025

Contributor Author

I know this would be better as a discussion... but diagrams is not included in the installed dependencies (assuming uv sync --all-extras --all-groups installs every possible dependency, and of course I can't just run uv pip install diagrams as a makeshift patch now), and pre-commit installs extra dependencies in its own cache instead of the virtual environment uv is using. Is that by design?

(Update) Oh. So is that the cause?

(pyproject.toml)
The doc extras are not available in the released packages. They are only available when you install Airflow from sources in editable installation - i.e. one that you are usually using to contribute to Airflow.

(Update) From the uv lockfile, it seems that installing diagrams is blocked on macOS, perhaps because of the Graphviz installation requirement:

    { name = "diagrams", marker = "sys_platform != 'darwin'" },
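
For readers unfamiliar with environment markers, the following sketch mirrors what the marker in that lockfile line expresses: the `sys_platform` marker variable corresponds to Python's `sys.platform`, which is "darwin" on macOS. The function name is hypothetical and this is not how uv itself evaluates markers.

```python
import sys


def marker_allows_install(sys_platform: str) -> bool:
    """Evaluate the marker `sys_platform != 'darwin'` for a given platform string."""
    return sys_platform != "darwin"


if __name__ == "__main__":
    # On macOS sys.platform is "darwin", so the dependency would be skipped there.
    print(marker_allows_install(sys.platform))
```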

(Update) As a result, I fell back to the pip install approach, then ran pip install diagrams to add the package to the environment (and of course, Graphviz is already installed on the machine).

potiuk commented Jan 29, 2025

Member

> (Update) From the uv lockfile, it seems that installing diagrams is blocked on macOS, perhaps because of the Graphviz installation requirement:

Yep, that's likely.

@pykenny pykenny force-pushed the 20240126_bugfix_task_instance_lifecycle_diagram branch from 8eef60a to 42c2d3d Compare January 30, 2025 09:20
pykenny commented Jan 30, 2025

Contributor Author

Okay, here's the new one, generated with the diagrams library:

image

Also let me know if there's anything wrong with the workflow described in the graph. For instance:

@potiuk potiuk force-pushed the 20240126_bugfix_task_instance_lifecycle_diagram branch from 9bf9442 to 14344c9 Compare February 17, 2025 22:31
@eladkal eladkal force-pushed the 20240126_bugfix_task_instance_lifecycle_diagram branch from 14344c9 to dbb0224 Compare March 11, 2025 12:57

eladkal left a comment

Contributor

LGTM

eladkal commented Mar 11, 2025

Contributor

@potiuk any further comments?

pykenny added 8 commits March 20, 2025 08:30
 - Updates task lifecycle diagram for missing states
 - Adds transitions for triggerer tasks
 - Make cycles look better
 - [WIP] Initial attempt for generating diagram
 - Adjusts branch labels; Adds label to condition branches
 - Remove original image; Update documentation
 - Adds legend cluster
 - File renaming
@eladkal eladkal force-pushed the 20240126_bugfix_task_instance_lifecycle_diagram branch from dbb0224 to cb96c94 Compare March 20, 2025 06:30
eladkal commented Mar 20, 2025

Contributor

Rebased to make sure CI is green. I'll merge afterwards.

@eladkal eladkal added this to the Airflow 2.10.6 milestone Mar 20, 2025
@eladkal eladkal added the type:doc-only Changelog: Doc Only label Mar 20, 2025
@eladkal eladkal merged commit 5eef85e into apache:main Mar 20, 2025
boring-cyborg Bot commented Mar 20, 2025

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

agupta01 pushed a commit to agupta01/airflow that referenced this pull request Mar 21, 2025
* Updates task lifecycle diagram for missing states

Details:
 - Adds missing states to the diagram ("skipped" and
   "deferred")
 - Adds missing condition nodes to state transition
 - Adds "trigger" component to the diagram
 - Converts multi-way conditional branches in the graph into a
   sequence (multi-staged) of binary conditional branches
 - Splits states into three categories: "shared states",
   "states only for sensors", and "states only for deferrable
   tasks"
 - Uses different icons to differentiate task states and Airflow
   components

* Adds transitions for triggerer tasks

Details:
 - Adds conditions and transitions for tasks that only require
   triggerer to execute

* Make cycles look better

* [WIP] Initial attempt for generating diagram

Details:
 - Adds icon images to docs/diagrams
 - Adds 'diagram_task_lifecycle_diagram' script to generate
   lifecycle diagram with `diagrams`
 - Initial attempt to build up the graph, without label for
   condition branch edges, and some redundant edges

* Adjusts branch labels; Adds label to condition branches

Details:
 - Adjusts branch label location by padding newline and
   whitespace
 - Uses `Edge` constructor to add label for condition branch edges
 - Removes redundant edges and corrects incorrect edge directions

* Remove original image; Update documentation

* Adds legend cluster

* File renaming

Details:
 - "diagram_task_lifecycle_diagram" > "diagram_task_lifecycle"
shubham-pyc pushed a commit to shubham-pyc/airflow that referenced this pull request Mar 22, 2025
nailo2c pushed a commit to nailo2c/airflow that referenced this pull request Apr 4, 2025
@kaxil kaxil modified the milestones: Airflow 2.10.6, Airflow 2.11.0 Apr 29, 2025
Development

Successfully merging this pull request may close these issues.

Missing states in task state lifecycle diagram

4 participants