Introducing object store backend for task and asset store #68283
Merged
amoghrajesh
merged 19 commits into
apache:main
from
astronomer:aip-103-object-storage-backend
Jun 30, 2026
4093978
Pass task/asset scopes to serialize methods instead of ti_id/asset_ref
amoghrajesh 58d45d4
Pass task/asset scopes to serialize methods instead of ti_id/asset_ref
amoghrajesh e6a4d01
fixing tests
amoghrajesh 24caae6
Introducing object store backend for task and asset store
amoghrajesh 5d21dfe
comments from wei
amoghrajesh 6af9c5a
Merge branch 'aip-103-serialize-signature-change' into aip-103-object…
amoghrajesh f2f748a
comments from wei
amoghrajesh 5a52c8b
Merge branch 'aip-103-serialize-signature-change' into aip-103-object…
amoghrajesh 1f68b0a
Merge branch 'main' into aip-103-object-storage-backend
amoghrajesh 4605696
fixing tests
amoghrajesh 70717ed
that mega rename
amoghrajesh 9107ac6
mega rename complete
amoghrajesh 5a01c9d
fixing CI
amoghrajesh 1159c98
comments from ian and kaxil
amoghrajesh d24bcdd
comments from tp
amoghrajesh 0461d81
comments from tp
amoghrajesh 534974c
comments from wei
amoghrajesh 7d9bdfd
comments from wei
amoghrajesh 8519f0f
comments from kaxil
amoghrajesh
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,81 @@ | ||
| .. Licensed to the Apache Software Foundation (ASF) under one | ||
| or more contributor license agreements. See the NOTICE file | ||
| distributed with this work for additional information | ||
| regarding copyright ownership. The ASF licenses this file | ||
| to you under the Apache License, Version 2.0 (the | ||
| "License"); you may not use this file except in compliance | ||
| with the License. You may obtain a copy of the License at | ||
|
|
||
| .. http://www.apache.org/licenses/LICENSE-2.0 | ||
|
|
||
| .. Unless required by applicable law or agreed to in writing, | ||
| software distributed under the License is distributed on an | ||
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| KIND, either express or implied. See the License for the | ||
| specific language governing permissions and limitations | ||
| under the License. | ||
|
|
||
| Object Storage State Store Backend | ||
| =================================== | ||
|
|
||
| The default state store backend is :class:`~airflow.state.metastore.MetastoreStateBackend`, which persists | ||
| task and asset state in the Airflow metadata database via the API Server's Execution API. For larger values, | ||
| you may want to store state on object storage directly from the task instead. | ||
|
|
||
| To store task and asset state on object storage, set ``state_store_backend`` in the ``[workers]`` | ||
| section to ``airflow.providers.common.io.state_store.backend.StateStoreObjectStorageBackend``, and set | ||
| ``state_store_objectstorage_path`` to the desired base location. The connection id is obtained from the | ||
| user part of the URL, e.g. ``state_store_objectstorage_path = s3://conn_id@mybucket/task-state/``. | ||
|
|
||
| Task state is stored under ``<dag_id>/<run_id>/<task_id>/<map_index>/<key>`` and asset state under | ||
| ``assets/<asset_identifier>/<key>`` beneath the configured base path. | ||
|
|
||
| By default (``state_store_objectstorage_threshold = 0``) all serialized values are offloaded to object storage. | ||
| Set ``state_store_objectstorage_threshold`` to a positive number of bytes to only offload values whose | ||
| serialized size meets or exceeds the threshold; anything smaller is stored in the Airflow metadata database. | ||
|
|
||
| Optionally set ``state_store_objectstorage_compression`` to an fsspec-supported compression algorithm such as | ||
| ``gzip`` or ``snappy`` to compress values before writing. | ||
|
|
||
| The following example stores all task and asset state in S3, compressed with gzip:: | ||
|
|
||
| [workers] | ||
| state_store_backend = airflow.providers.common.io.state_store.backend.StateStoreObjectStorageBackend | ||
|
|
||
| [common.io] | ||
| state_store_objectstorage_path = s3://conn_id@mybucket/task-state/ | ||
| state_store_objectstorage_compression = gzip | ||
|
|
||
| To offload only values of at least 1 MiB (1048576 bytes):: | ||
|
|
||
| [workers] | ||
| state_store_backend = airflow.providers.common.io.state_store.backend.StateStoreObjectStorageBackend | ||
|
|
||
| [common.io] | ||
| state_store_objectstorage_path = s3://conn_id@mybucket/task-state/ | ||
| state_store_objectstorage_threshold = 1048576 | ||
|
|
||
| Using the local filesystem (useful for development):: | ||
|
|
||
| [workers] | ||
| state_store_backend = airflow.providers.common.io.state_store.backend.StateStoreObjectStorageBackend | ||
|
|
||
| [common.io] | ||
| state_store_objectstorage_path = file:///var/airflow/task-state/ | ||
|
|
||
| .. note:: | ||
|
|
||
| Compression requires the relevant library to be installed in your Python environment. | ||
| For example, ``snappy`` requires ``python-snappy``. Gzip and bz2 work out of the box. | ||
|
|
||
| .. note:: | ||
|
|
||
| ``expires_at`` is not enforced by this backend. Values written to object storage persist | ||
| indefinitely until explicitly deleted. Use your object storage provider's lifecycle policies | ||
| (e.g. S3 lifecycle rules, GCS object lifecycle) to automatically expire old state. | ||
|
|
||
| .. note:: | ||
|
|
||
| Task state paths are keyed on ``(dag_id, run_id, task_id, map_index)`` and are stable across | ||
| task retries. This makes this backend suitable for operators that use | ||
| :class:`~airflow.sdk.ResumableJobMixin` to reconnect to external jobs after a retry. |
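
The path and threshold rules in the document above can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not the provider's implementation: the helper names (``split_base_path``, ``task_state_key``, ``asset_state_key``, ``should_offload``) are hypothetical, and the form of ``asset_identifier`` is assumed to be a plain string.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def split_base_path(base_path: str) -> tuple[str | None, str]:
    """Split a base path into (conn_id, URL without the user part).

    Hypothetical helper: the docs only say the connection id is taken
    from the user part of the URL.
    """
    parts = urlsplit(base_path)
    conn_id = parts.username  # None when there is no user part (e.g. file:///...)
    # Keep everything after the last "@" so the bucket name keeps its case.
    location = parts.netloc.rpartition("@")[2]
    return conn_id, f"{parts.scheme}://{location}{parts.path}"


def task_state_key(dag_id: str, run_id: str, task_id: str, map_index: int, key: str) -> str:
    """Relative path for task state: <dag_id>/<run_id>/<task_id>/<map_index>/<key>."""
    return f"{dag_id}/{run_id}/{task_id}/{map_index}/{key}"


def asset_state_key(asset_identifier: str, key: str) -> str:
    """Relative path for asset state: assets/<asset_identifier>/<key>."""
    return f"assets/{asset_identifier}/{key}"


def should_offload(serialized: bytes, threshold: int) -> bool:
    """True when the serialized size meets or exceeds the threshold.

    With the default threshold of 0 every value qualifies, since the
    length of a value is never negative.
    """
    return len(serialized) >= threshold


conn_id, base = split_base_path("s3://conn_id@mybucket/task-state/")
print(conn_id, base + task_state_key("my_dag", "run_1", "my_task", -1, "job_id"))
```

Because the task path does not include a try number, a retried task instance resolves to the same location, which is the property the last note in the document relies on.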
16 changes: 16 additions & 0 deletions
providers/common/io/src/airflow/providers/common/io/state_store/__init__.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,16 @@ | ||
| # Licensed to the Apache Software Foundation (ASF) under one | ||
| # or more contributor license agreements. See the NOTICE file | ||
| # distributed with this work for additional information | ||
| # regarding copyright ownership. The ASF licenses this file | ||
| # to you under the Apache License, Version 2.0 (the | ||
| # "License"); you may not use this file except in compliance | ||
| # with the License. You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, | ||
| # software distributed under the License is distributed on an | ||
| # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| # KIND, either express or implied. See the License for the | ||
| # specific language governing permissions and limitations | ||
| # under the License. |