Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 0 additions & 3 deletions .github/boring-cyborg.yml
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,6 @@
# specific language governing permissions and limitations
# under the License.
---

# Details: https://github.com/kaxil/boring-cyborg

labelPRBasedOnFilePath:
Expand Down Expand Up @@ -154,8 +153,6 @@ labelPRBasedOnFilePath:
- airflow/example_dags/example_kubernetes_executor.py
- airflow/providers/cncf/kubernetes/**/*
- airflow/providers/celery/executors/celery_kubernetes_executor.py
- docs/apache-airflow/core-concepts/executor/kubernetes.rst
- docs/apache-airflow/core-concepts/executor/celery_kubernetes.rst
- docs/apache-airflow-providers-cncf-kubernetes/**/*
- kubernetes_tests/**/*
- tests/providers/cncf/kubernetes/**/*
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -15,9 +15,6 @@
specific language governing permissions and limitations
under the License.


.. _executor:CeleryExecutor:

Celery Executor
===============

Expand All @@ -36,7 +33,7 @@ change your ``airflow.cfg`` to point the executor parameter to
For more information about setting up a Celery broker, refer to the
exhaustive `Celery documentation on the topic <https://docs.celeryq.dev/en/latest/getting-started/>`_.

The configuration parameters of the Celery Executor can be found in :doc:`apache-airflow-providers-celery:configurations-ref`.
The configuration parameters of the Celery Executor can be found in :doc:`configurations-ref`.

Here are a few imperative requirements for your workers:

Expand Down Expand Up @@ -97,7 +94,7 @@ Some caveats:
- Tasks can consume resources. Make sure your worker has enough resources to run ``worker_concurrency`` tasks
- Queue names are limited to 256 characters, but each broker backend might have its own restrictions

See :doc:`/administration-and-deployment/modules_management` for details on how Python and Airflow manage modules.
See :doc:`apache-airflow:administration-and-deployment/modules_management` for details on how Python and Airflow manage modules.

Architecture
------------
Expand Down Expand Up @@ -173,7 +170,7 @@ The components communicate with each other in many places
Task execution process
----------------------

.. figure:: ../../img/run_task_on_celery_executor.png
.. figure:: img/run_task_on_celery_executor.png
:scale: 50 %

Sequence diagram - task execution process
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -16,8 +16,6 @@
under the License.


.. _executor:CeleryKubernetesExecutor:

CeleryKubernetes Executor
=========================

Expand All @@ -36,7 +34,7 @@ An executor is chosen to run a task based on the task's queue.
``CeleryKubernetesExecutor`` inherits the scalability of the ``CeleryExecutor`` to
handle the high load at the peak time and runtime isolation of the ``KubernetesExecutor``.

The configuration parameters of the Celery Executor can be found in :doc:`apache-airflow-providers-celery:configurations-ref`.
The configuration parameters of the Celery Executor can be found in :doc:`configurations-ref`.


When to use CeleryKubernetesExecutor
Expand Down
8 changes: 8 additions & 0 deletions docs/apache-airflow-providers-celery/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,14 @@
Changelog <changelog>
Security <security>

.. toctree::
:hidden:
:maxdepth: 1
:caption: Executors

CeleryExecutor details <celery_executor>
CeleryKubernetesExecutor details <celery_kubernetes_executor>

.. toctree::
:hidden:
:maxdepth: 1
Expand Down
8 changes: 8 additions & 0 deletions docs/apache-airflow-providers-cncf-kubernetes/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,14 @@
Changelog <changelog>
Security <security>

.. toctree::
:hidden:
:maxdepth: 1
:caption: Executors

KubernetesExecutor details <kubernetes_executor>
LocalKubernetesExecutor details <local_kubernetes_executor>

.. toctree::
:hidden:
:maxdepth: 1
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@
under the License.


.. _executor:KubernetesExecutor:
.. _KubernetesExecutor:

Kubernetes Executor
===================
Expand All @@ -38,12 +38,12 @@ KubernetesExecutor requires a non-sqlite database in the backend.

When a DAG submits a task, the KubernetesExecutor requests a worker pod from the Kubernetes API. The worker pod then runs the task, reports the result, and terminates.

.. image:: ../../img/arch-diag-kubernetes.png
.. image:: img/arch-diag-kubernetes.png


One example of an Airflow deployment running on a distributed set of five nodes in a Kubernetes cluster is shown below.

.. image:: ../../img/arch-diag-kubernetes2.png
.. image:: img/arch-diag-kubernetes2.png

Consistent with the regular Airflow architecture, the Workers need access to the DAG files to execute the tasks within those DAGs and interact with the Metadata repository. Also, configuration information specific to the Kubernetes Executor, such as the worker namespace and image information, needs to be specified in the Airflow Configuration file.

Expand All @@ -56,7 +56,7 @@ Additionally, the Kubernetes Executor enables specification of additional featur
.. Airflow_Worker -> Kubernetes: Pod completes with state "Succeeded" and k8s records in ETCD
.. Kubernetes -> Airflow_Scheduler: Airflow scheduler reads "Succeeded" from k8s watcher thread
.. @enduml
.. image:: ../../img/k8s-happy-path.png
.. image:: img/k8s-happy-path.png

Configuration
-------------
Expand Down Expand Up @@ -272,7 +272,7 @@ In the case where a worker dies before it can report its status to the backend D
..
.. @enduml

.. image:: ../../img/k8s-failed-pod.png
.. image:: img/k8s-failed-pod.png


A Kubernetes watcher is a thread that can subscribe to every change that occurs in Kubernetes' database. It is alerted when pods start, run, end, and fail.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@
under the License.


.. _executor:LocalKubernetesExecutor:
.. _LocalKubernetesExecutor:

LocalKubernetes Executor
=========================
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@ you to create and run Pods on a Kubernetes cluster.
- :ref:`EksPodOperator <howto/operator:EksPodOperator>` operator for `AWS Elastic Kubernetes Engine <https://aws.amazon.com/eks/>`__.

.. note::
The :doc:`Kubernetes executor <apache-airflow:core-concepts/executor/kubernetes>` is **not** required to use this operator.
The :doc:`Kubernetes executor <kubernetes_executor>` is **not** required to use this operator.

How does this operator work?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Expand Down
53 changes: 53 additions & 0 deletions docs/apache-airflow-providers-openai/dask_executor.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
.. Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

.. http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.

Dask Executor
=============

:class:`airflow.providers.daskexecutor.executors.dask_executor.DaskExecutor` allows you to run Airflow tasks in a Dask Distributed cluster.

Dask clusters can be run on a single machine or on remote networks. For complete
details, consult the `Distributed documentation <https://distributed.readthedocs.io/>`_.

To create a cluster, first start a Scheduler:

.. code-block:: bash

# default settings for a local cluster
DASK_HOST=127.0.0.1
DASK_PORT=8786

dask-scheduler --host $DASK_HOST --port $DASK_PORT

Next start at least one Worker on any machine that can connect to the host:

.. code-block:: bash

dask-worker $DASK_HOST:$DASK_PORT

Edit your ``airflow.cfg`` to set your executor to :class:`airflow.providers.daskexecutor.executors.dask_executor.DaskExecutor` and provide
the Dask Scheduler address in the ``[dask]`` section. For more information on setting the configuration,
see :doc:`apache-airflow:howto/set-config`.

Please note:

- Each Dask worker must be able to import Airflow and any dependencies you
require.
- The DaskExecutor implements queues using
`Dask Worker Resources <https://distributed.dask.org/en/latest/resources.html>`_ functionality. To enable the use of
queues, start your Dask workers with resources of the same name as the desired queues and a limit of ``inf``.
E.g. ``dask-worker <scheduler_address> --resources="QUEUE1=inf,QUEUE2=inf"``.
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@ We maintain an :doc:`official Helm chart <helm-chart:index>` for Airflow that he
Kubernetes Executor
^^^^^^^^^^^^^^^^^^^

The :doc:`Kubernetes Executor </core-concepts/executor/kubernetes>` allows you to run all the Airflow tasks on
The :doc:`Kubernetes Executor <apache-airflow-providers-cncf-kubernetes:kubernetes_executor>` allows you to run all the Airflow tasks on
Kubernetes as separate Pods.

KubernetesPodOperator
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -135,7 +135,7 @@ HTTP monitoring for Celery Cluster

You can optionally use Flower to monitor the health of the Celery cluster. It also provides an HTTP API that you can use to build a health check for your environment.

For details about installation, see: :ref:`executor:CeleryExecutor`. For details about usage, see: `The Flower project documentation <https://flower.readthedocs.io/>`__.
For details about installation, see: :doc:`CeleryExecutor <apache-airflow-providers-celery:celery_executor>`. For details about usage, see: `The Flower project documentation <https://flower.readthedocs.io/>`__.

CLI Check for Celery Workers
----------------------------
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -60,8 +60,8 @@ Airflow uses :class:`~airflow.executors.sequential_executor.SequentialExecutor`
nature, the user is limited to executing at most one task at a time. ``Sequential Executor`` also pauses
the scheduler when it runs a task, hence it is not recommended in a production setup. You should use the
:class:`~airflow.executors.local_executor.LocalExecutor` for a single machine.
For a multi-node setup, you should use the :doc:`Kubernetes executor <../core-concepts/executor/kubernetes>` or
the :doc:`Celery executor <../core-concepts/executor/celery>`.
For a multi-node setup, you should use the :doc:`Kubernetes executor <apache-airflow-providers-cncf-kubernetes:kubernetes_executor>` or
the :doc:`Celery executor <apache-airflow-providers-celery:celery_executor>`.


Once you have configured the executor, it is necessary to make sure that every node in the cluster contains
Expand Down
2 changes: 1 addition & 1 deletion docs/apache-airflow/best-practices.rst
Original file line number Diff line number Diff line change
Expand Up @@ -82,7 +82,7 @@ it difficult to check the logs of that Task from the Webserver. If that is not d
Communication
--------------

Airflow executes tasks of a DAG on different servers in case you are using :doc:`Kubernetes executor </core-concepts/executor/kubernetes>` or :doc:`Celery executor </core-concepts/executor/celery>`.
Airflow executes tasks of a DAG on different servers in case you are using :doc:`KubernetesExecutor <apache-airflow-providers-cncf-kubernetes:kubernetes_executor>` or :doc:`Celery executor <apache-airflow-providers-celery:celery_executor>`.
Therefore, you should not store any file or config in the local filesystem as the next task is likely to run on a different server without access to it — for example, a task that downloads the data file that the next task processes.
In the case of :class:`Local executor <airflow.executors.local_executor.LocalExecutor>`,
storing a file on disk can make retries harder e.g., your task requires a config file that is deleted by another task in DAG.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -15,44 +15,10 @@
specific language governing permissions and limitations
under the License.

.. _executor:DebugExecutor:

Debug Executor (deprecated)
===========================
.. _concepts:debugging:

The :class:`~airflow.executors.debug_executor.DebugExecutor` is meant as
a debug tool and can be used from IDE. It is a single process executor that
queues :class:`~airflow.models.taskinstance.TaskInstance` and executes them by running
``_run_raw_task`` method.

Due to its nature the executor can be used with SQLite database. When used
with sensors the executor will change sensor mode to ``reschedule`` to avoid
blocking the execution of DAG.

Additionally ``DebugExecutor`` can be used in a fail-fast mode that will make
all other running or scheduled tasks fail immediately. To enable this option set
``AIRFLOW__DEBUG__FAIL_FAST=True`` or adjust ``fail_fast`` option in your ``airflow.cfg``.
For more information on setting the configuration, see :doc:`../../howto/set-config`.

**IDE setup steps:**

1. Add ``main`` block at the end of your DAG file to make it runnable.

It will run a backfill job:

.. code-block:: python

if __name__ == "__main__":
from airflow.utils.state import State

dag.clear()
dag.run()


2. Setup ``AIRFLOW__CORE__EXECUTOR=DebugExecutor`` in run configuration of your IDE. In
this step you should also setup all environment variables required by your DAG.

3. Run / debug the DAG file.
Debugging Airflow DAGs
======================

Testing DAGs with dag.test()
*****************************
Expand All @@ -74,7 +40,7 @@ and that's it! You can add argument such as ``execution_date`` if you want to te
you can run or debug DAGs as needed.

Comparison with DebugExecutor
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-----------------------------

The ``dag.test`` command has the following benefits over the :class:`~airflow.executors.debug_executor.DebugExecutor`
class, which is now deprecated:
Expand Down Expand Up @@ -102,3 +68,42 @@ Run ``python -m pdb <path to dag file>.py`` for an interactive debugging experie
-> bash_command='echo 1',
(Pdb) run_this_last
<Task(EmptyOperator): run_this_last>

.. _executor:DebugExecutor:

Debug Executor (deprecated)
***************************

The :class:`~airflow.executors.debug_executor.DebugExecutor` is meant as
a debug tool and can be used from IDE. It is a single process executor that
queues :class:`~airflow.models.taskinstance.TaskInstance` and executes them by running
``_run_raw_task`` method.

Due to its nature the executor can be used with SQLite database. When used
with sensors the executor will change sensor mode to ``reschedule`` to avoid
blocking the execution of DAG.

Additionally ``DebugExecutor`` can be used in a fail-fast mode that will make
all other running or scheduled tasks fail immediately. To enable this option set
``AIRFLOW__DEBUG__FAIL_FAST=True`` or adjust ``fail_fast`` option in your ``airflow.cfg``.
For more information on setting the configuration, see :doc:`../../howto/set-config`.

**IDE setup steps:**

1. Add ``main`` block at the end of your DAG file to make it runnable.

It will run a backfill job:

.. code-block:: python

if __name__ == "__main__":
from airflow.utils.state import State

dag.clear()
dag.run()


2. Setup ``AIRFLOW__CORE__EXECUTOR=DebugExecutor`` in run configuration of your IDE. In
this step you should also setup all environment variables required by your DAG.

3. Run / debug the DAG file.
16 changes: 8 additions & 8 deletions docs/apache-airflow/core-concepts/executor/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -52,19 +52,19 @@ There are two types of executors - those that run tasks *locally* (inside the ``
.. toctree::
:maxdepth: 1

debug
local
sequential

**Remote Executors**
The DebugExecutor also exists but has been deprecated in favor of using the ``dag.test()`` command for :doc:`debugging <../debug>`.

.. toctree::
:maxdepth: 1

celery
celery_kubernetes
kubernetes
local_kubernetes
**Remote Executors**

* :doc:`CeleryExecutor <apache-airflow-providers-celery:celery_executor>`
* :doc:`CeleryKubernetesExecutor <apache-airflow-providers-celery:celery_kubernetes_executor>`
* :doc:`DaskExecutor <apache-airflow-providers-daskexecutor:dask_executor>`
* :doc:`KubernetesExecutor <apache-airflow-providers-cncf-kubernetes:kubernetes_executor>`
* :doc:`KubernetesLocalExecutor <apache-airflow-providers-cncf-kubernetes:local_kubernetes_executor>`


.. note::
Expand Down
7 changes: 7 additions & 0 deletions docs/apache-airflow/core-concepts/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -51,3 +51,10 @@ Here you can find detailed documentation about each one of the core concepts of
xcoms
variables
params

**Debugging**

.. toctree::
:maxdepth: 1

debug
Loading