From fac34a79102c8b46eb67ec3dd8a625252bba8034 Mon Sep 17 00:00:00 2001
From: Siew Kam Onn
Date: Thu, 4 Jun 2026 17:07:25 +0800
Subject: [PATCH 1/2] docs(databricks): document retry_args serialization constraint for deferrable operators

Add "Retry args in deferrable mode" subsection under
DatabricksSubmitRunDeferrableOperator and DatabricksRunNowDeferrableOperator
explaining:
- Serialization requirement: only plain Python primitives allowed across the trigger boundary
- Supported shapes (int/float primitives, nested plain-dict)
- Unsupported shapes (Tenacity objects, callables) with note that a ValueError is raised at task submission
- Recommended workaround: use non-deferrable mode for custom retry strategies

Also update changelog for 7.16.0.
---
 providers/databricks/docs/changelog.rst       |  1 +
 .../databricks/docs/operators/run_now.rst     | 35 +++++++++++++++++++
 .../databricks/docs/operators/submit_run.rst  | 35 +++++++++++++++++++
 3 files changed, 71 insertions(+)

diff --git a/providers/databricks/docs/changelog.rst b/providers/databricks/docs/changelog.rst
index 73e3098a22fa4..98ed6d94c8d50 100644
--- a/providers/databricks/docs/changelog.rst
+++ b/providers/databricks/docs/changelog.rst
@@ -33,6 +33,7 @@ Features
 ~~~~~~~~
 
 * ``Fail fast for non-serializable retry_args in deferrable operators and triggers (#64960)``
+* ``Document supported retry_args shapes for deferrable Databricks operators``
 * ``Forward Airflow Dag params to Databricks job parameters in CreateJobs/SubmitRun/RunNow (#66613)``
 * ``Add session-level query tags to Databricks SQL operators (#66895)``
 
diff --git a/providers/databricks/docs/operators/run_now.rst b/providers/databricks/docs/operators/run_now.rst
index f39d872a0c1f7..cad605bde940d 100644
--- a/providers/databricks/docs/operators/run_now.rst
+++ b/providers/databricks/docs/operators/run_now.rst
@@ -80,3 +80,38 @@ DatabricksRunNowDeferrableOperator
 Deferrable version of the :class:`~airflow.providers.databricks.operators.DatabricksRunNowOperator` operator.
 
 It allows to utilize Airflow workers more effectively using `new functionality introduced in Airflow 2.2.0 `_
+
+.. _howto/operator:DatabricksRunNowDeferrableOperator:retry-args:
+
+Retry args in deferrable mode
+-----------------------------
+
+When ``deferrable=True``, the ``databricks_retry_args`` dictionary is serialized across the
+trigger boundary and must contain only Airflow-serializable values (plain Python primitives
+such as ``int``, ``float``, ``str``, ``bool``, ``None``, ``dict``, and ``list``).
+
+**Supported** (serialization-safe):
+
+.. code-block:: python
+
+    # Integer / float primitives
+    databricks_retry_args = {"stop_after_attempt": 3, "wait_fixed": 30}
+
+    # Nested plain-dict form
+    databricks_retry_args = {"stop": {"type": "stop_after_attempt", "value": 3}}
+
+**Not supported** in deferrable mode (will raise ``ValueError`` at task submission):
+
+.. code-block:: python
+
+    from tenacity import stop_after_attempt, wait_incrementing
+
+    # Tenacity strategy objects — NOT serializable
+    databricks_retry_args = {"stop": stop_after_attempt(3)}
+    databricks_retry_args = {"wait": wait_incrementing(start=30, increment=30)}
+
+    # Arbitrary callables — NOT serializable
+    databricks_retry_args = {"retry": my_custom_retry_callable}
+
+If you need a custom callable retry strategy, use the non-deferrable
+:class:`~airflow.providers.databricks.operators.DatabricksRunNowOperator` (``deferrable=False``).
diff --git a/providers/databricks/docs/operators/submit_run.rst b/providers/databricks/docs/operators/submit_run.rst
index f4f78d2fa3396..5975b1ae55089 100644
--- a/providers/databricks/docs/operators/submit_run.rst
+++ b/providers/databricks/docs/operators/submit_run.rst
@@ -166,3 +166,38 @@ DatabricksSubmitRunDeferrableOperator
 Deferrable version of the :class:`~airflow.providers.databricks.operators.DatabricksSubmitRunOperator` operator.
 
 It allows to utilize Airflow workers more effectively using `new functionality introduced in Airflow 2.2.0 `_
+
+.. _howto/operator:DatabricksSubmitRunDeferrableOperator:retry-args:
+
+Retry args in deferrable mode
+-----------------------------
+
+When ``deferrable=True``, the ``databricks_retry_args`` dictionary is serialized across the
+trigger boundary and must contain only Airflow-serializable values (plain Python primitives
+such as ``int``, ``float``, ``str``, ``bool``, ``None``, ``dict``, and ``list``).
+
+**Supported** (serialization-safe):
+
+.. code-block:: python
+
+    # Integer / float primitives
+    databricks_retry_args = {"stop_after_attempt": 3, "wait_fixed": 30}
+
+    # Nested plain-dict form
+    databricks_retry_args = {"stop": {"type": "stop_after_attempt", "value": 3}}
+
+**Not supported** in deferrable mode (will raise ``ValueError`` at task submission):
+
+.. code-block:: python
+
+    from tenacity import stop_after_attempt, wait_incrementing
+
+    # Tenacity strategy objects — NOT serializable
+    databricks_retry_args = {"stop": stop_after_attempt(3)}
+    databricks_retry_args = {"wait": wait_incrementing(start=30, increment=30)}
+
+    # Arbitrary callables — NOT serializable
+    databricks_retry_args = {"retry": my_custom_retry_callable}
+
+If you need a custom callable retry strategy, use the non-deferrable
+:class:`~airflow.providers.databricks.operators.DatabricksSubmitRunOperator` (``deferrable=False``).
From 283df3e388fbeae96e78bbdfec820e7ec78fb1b8 Mon Sep 17 00:00:00 2001
From: Siew Kam Onn
Date: Thu, 4 Jun 2026 17:40:43 +0800
Subject: [PATCH 2/2] Fix Databricks deferrable retry docs and validation

---
 generated/provider_dependencies.json.sha256sum  |  2 +-
 providers/databricks/docs/operators/run_now.rst | 15 +++++++++------
 .../databricks/docs/operators/submit_run.rst    | 13 ++++++++-----
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/generated/provider_dependencies.json.sha256sum b/generated/provider_dependencies.json.sha256sum
index 78f8116ea53f4..580078b1623bc 100644
--- a/generated/provider_dependencies.json.sha256sum
+++ b/generated/provider_dependencies.json.sha256sum
@@ -1 +1 @@
-2d6f34bb40832f84cb6c121237b1c5b0a05181dccface9fd171558f4df1747dc
+2ccde55d75b93c7fc2c5723fc7f74bf8995244606190c98acf005ea1f39f04ca
diff --git a/providers/databricks/docs/operators/run_now.rst b/providers/databricks/docs/operators/run_now.rst
index cad605bde940d..d7a887b6a2039 100644
--- a/providers/databricks/docs/operators/run_now.rst
+++ b/providers/databricks/docs/operators/run_now.rst
@@ -84,21 +84,24 @@ It allows to utilize Airflow workers more effectively using `new functionality i
 .. _howto/operator:DatabricksRunNowDeferrableOperator:retry-args:
 
 Retry args in deferrable mode
------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 When ``deferrable=True``, the ``databricks_retry_args`` dictionary is serialized across the
 trigger boundary and must contain only Airflow-serializable values (plain Python primitives
 such as ``int``, ``float``, ``str``, ``bool``, ``None``, ``dict``, and ``list``).
 
-**Supported** (serialization-safe):
+**Supported** (serialization-safe and runtime-valid):
 
 .. code-block:: python
 
-    # Integer / float primitives
-    databricks_retry_args = {"stop_after_attempt": 3, "wait_fixed": 30}
+    # Only plain-primitive Retrying kwarg: reraise
+    databricks_retry_args = {"reraise": True}
 
-    # Nested plain-dict form
-    databricks_retry_args = {"stop": {"type": "stop_after_attempt", "value": 3}}
+For controlling attempt count and delay, prefer the dedicated operator
+parameters ``retry_limit`` and ``retry_delay`` rather than
+``databricks_retry_args``. Custom tenacity strategy objects (``stop``,
+``wait``, ``retry``, ``before``, ``after``, etc.) require tenacity
+callable objects, which are not serialization-safe in deferrable mode.
 
 **Not supported** in deferrable mode (will raise ``ValueError`` at task submission):
 
diff --git a/providers/databricks/docs/operators/submit_run.rst b/providers/databricks/docs/operators/submit_run.rst
index 5975b1ae55089..4ce00b0e95723 100644
--- a/providers/databricks/docs/operators/submit_run.rst
+++ b/providers/databricks/docs/operators/submit_run.rst
@@ -176,15 +176,18 @@ When ``deferrable=True``, the ``databricks_retry_args`` dictionary is serialized
 trigger boundary and must contain only Airflow-serializable values (plain Python primitives
 such as ``int``, ``float``, ``str``, ``bool``, ``None``, ``dict``, and ``list``).
 
-**Supported** (serialization-safe):
+**Supported** (serialization-safe and runtime-valid):
 
 .. code-block:: python
 
-    # Integer / float primitives
-    databricks_retry_args = {"stop_after_attempt": 3, "wait_fixed": 30}
+    # Only plain-primitive Retrying kwarg: reraise
+    databricks_retry_args = {"reraise": True}
 
-    # Nested plain-dict form
-    databricks_retry_args = {"stop": {"type": "stop_after_attempt", "value": 3}}
+For controlling attempt count and delay, prefer the dedicated operator
+parameters ``retry_limit`` and ``retry_delay`` rather than
+``databricks_retry_args``. Custom tenacity strategy objects (``stop``,
+``wait``, ``retry``, ``before``, ``after``, etc.) require tenacity
+callable objects, which are not serialization-safe in deferrable mode.
 
 **Not supported** in deferrable mode (will raise ``ValueError`` at task submission):