diff --git a/providers/common/ai/docs/hooks/index.rst b/providers/common/ai/docs/hooks/index.rst
index 3d05ba8edae13..2786cd1b0ced5 100644
--- a/providers/common/ai/docs/hooks/index.rst
+++ b/providers/common/ai/docs/hooks/index.rst
@@ -40,6 +40,13 @@ Choosing a hook
- Direct LangChain access for tasks that compose ``Runnable``\\s, use the
LangChain agent surface, or need LangChain-native chat / embedding model
objects. Independent of the pydantic-ai-backed operators.
+ * - :class:`~airflow.providers.common.ai.hooks.llamaindex.LlamaIndexHook`
+ - Backs the LlamaIndex ``LlamaIndexEmbeddingOperator`` and
+ ``LlamaIndexRetrievalOperator``.
+ Returns LlamaIndex-native ``BaseEmbedding`` / ``LLM`` objects (OpenAI
+ by default). For non-OpenAI vendors, pass a pre-built
+ ``BaseEmbedding`` / ``LLM`` instance straight to the operator and
+ bypass the hook.
Hook guides
-----------
diff --git a/providers/common/ai/docs/hooks/llamaindex.rst b/providers/common/ai/docs/hooks/llamaindex.rst
new file mode 100644
index 0000000000000..2bbd779ed56d1
--- /dev/null
+++ b/providers/common/ai/docs/hooks/llamaindex.rst
@@ -0,0 +1,115 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ .. http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+.. _howto/hook:llamaindex:
+
+``LlamaIndexHook``
+==================
+
+Use :class:`~airflow.providers.common.ai.hooks.llamaindex.LlamaIndexHook` to
+bridge an Airflow connection to `LlamaIndex <https://docs.llamaindex.ai/>`__
+chat and embedding models. The hook reads credentials (API key, optional
+base URL) from a connection of type ``llamaindex`` and returns native
+LlamaIndex objects ready to pass to ``VectorStoreIndex(..., embed_model=...)``,
+``load_index_from_storage(..., embed_model=...)``, or
+``index.as_retriever(..., llm=...)``.
+
+The hook deliberately does **not** mutate LlamaIndex's global ``Settings``
+singleton. Operators pass the resolved model directly to LlamaIndex
+constructors, so concurrent tasks in the same worker don't race on shared
+state.
+
+OpenAI by default, BYO for other vendors
+----------------------------------------
+
+LlamaIndex does not ship a universal ``init_chat_model`` /
+``init_embedding_model`` equivalent (each vendor is a separate package
+under ``llama-index-llms-*`` / ``llama-index-embeddings-*`` with its own
+constructor kwargs). The hook therefore covers the OpenAI-compatible
+surface that matches LlamaIndex's own ``resolve_embed_model("default")``
+behaviour:
+
+- ``hook.get_embedding_model()`` returns an ``OpenAIEmbedding`` configured
+ from the connection.
+- ``hook.get_llm()`` returns an ``OpenAI`` LLM configured from the
+ connection.
+
+For other vendors (Cohere, Bedrock, Vertex AI, HuggingFace, ...),
+instantiate the LlamaIndex class directly in a ``@task`` and pass it to
+the operator's ``embed_model=`` / ``llm=`` parameter -- both
+:class:`~airflow.providers.common.ai.operators.llamaindex_embedding.LlamaIndexEmbeddingOperator`
+and
+:class:`~airflow.providers.common.ai.operators.llamaindex_retrieval.LlamaIndexRetrievalOperator`
+accept a pre-built ``BaseEmbedding`` / ``LLM`` instance and bypass the
+hook:
+
+.. exampleinclude:: /../../ai/src/airflow/providers/common/ai/example_dags/example_llamaindex_hook.py
+ :language: python
+ :start-after: [START howto_hook_llamaindex_byo_embed_model]
+ :end-before: [END howto_hook_llamaindex_byo_embed_model]
+
+Install the per-vendor LlamaIndex integration package separately:
+``pip install llama-index-embeddings-cohere``, ``...-bedrock``,
+``...-huggingface``, ``llama-index-llms-anthropic``, etc.
+
+Connection Configuration
+------------------------
+
+The hook reads credentials from the Airflow connection of type ``llamaindex``:
+
+- **password** -- API key (passed as ``api_key`` to ``OpenAIEmbedding`` /
+ ``OpenAI``).
+- **host** -- Optional base URL (passed as ``api_base``; useful for custom
+ OpenAI-compatible endpoints, Ollama, vLLM).
+- **extra** JSON --
+ ``{"embed_model": "text-embedding-3-small", "llm_model": "gpt-4o"}`` --
+ default model identifiers stored on the connection.
+
+Parameters
+----------
+
+.. list-table::
+ :header-rows: 1
+ :widths: 25 25 50
+
+ * - Parameter
+ - Default
+ - Description
+ * - ``llm_conn_id``
+ - ``llamaindex_default``
+ - Airflow connection ID for the LLM/embedding provider.
+ * - ``embed_conn_id``
+ - ``None`` (falls back to ``llm_conn_id``)
+ - Optional separate Airflow connection ID for the embedding provider.
+ * - ``embed_model``
+ - ``None`` (falls back to ``extra["embed_model"]``)
+ - Embedding model name, e.g. ``text-embedding-3-small``.
+ * - ``llm_model``
+ - ``None`` (falls back to ``extra["llm_model"]``)
+ - LLM model name, e.g. ``gpt-4o``. Required when calling ``get_llm()``.
+
+Dependencies
+------------
+
+Install the ``llamaindex`` extra::
+
+    pip install apache-airflow-providers-common-ai[llamaindex]
+
+That extra installs ``llama-index-core``, ``llama-index-embeddings-openai``,
+and ``llama-index-llms-openai`` -- enough to back the hook's default
+OpenAI return values. For other vendors, install the matching LlamaIndex
+integration package separately.
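A minimal sketch of the fallback order described in the Parameters table of `llamaindex.rst`, written as plain Python so it runs without Airflow or LlamaIndex installed. `resolve_hook_settings` and the `connections` dict are hypothetical stand-ins for illustration, not part of the provider. Reading `embed_model` from the embedding connection's `extra` when `embed_conn_id` is set is an assumption; the docs only state that it falls back to `extra["embed_model"]`.

```python
from __future__ import annotations

import json


def resolve_hook_settings(
    connections: dict[str, dict],
    llm_conn_id: str = "llamaindex_default",
    embed_conn_id: str | None = None,
    embed_model: str | None = None,
    llm_model: str | None = None,
) -> dict:
    """Hypothetical helper mirroring the documented fallback order."""
    # embed_conn_id falls back to llm_conn_id when it is None.
    effective_embed_conn_id = embed_conn_id or llm_conn_id

    # The extra field is stored as a JSON string on the connection.
    llm_extra = json.loads(connections[llm_conn_id].get("extra") or "{}")
    embed_extra = json.loads(connections[effective_embed_conn_id].get("extra") or "{}")

    return {
        "embed_conn_id": effective_embed_conn_id,
        # Explicit arguments win; otherwise use the defaults stored in extra.
        "embed_model": embed_model or embed_extra.get("embed_model"),
        "llm_model": llm_model or llm_extra.get("llm_model"),
    }


# Stand-in for the Airflow connection store (assumption for this sketch).
connections = {
    "llamaindex_default": {
        "password": "sk-placeholder",
        "extra": '{"embed_model": "text-embedding-3-small", "llm_model": "gpt-4o"}',
    }
}

settings = resolve_hook_settings(connections)
overridden = resolve_hook_settings(connections, embed_model="text-embedding-3-large")
```

With no arguments, both model names come from the connection's `extra`; passing `embed_model` explicitly overrides only that value.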
diff --git a/providers/common/ai/docs/operators/document_loader.rst b/providers/common/ai/docs/operators/document_loader.rst
index 8a836c37d120e..2aa32e6594dd2 100644
--- a/providers/common/ai/docs/operators/document_loader.rst
+++ b/providers/common/ai/docs/operators/document_loader.rst
@@ -146,7 +146,7 @@ No chunking
The operator parses files into documents; it does **not** split them into
fixed-size chunks. The right chunking strategy depends on the embedding
model and is intentionally left to a downstream text-splitter or embedding
-operator (LlamaIndex's ``EmbeddingOperator``, LangChain's text splitters,
+operator (LlamaIndex's ``LlamaIndexEmbeddingOperator``, LangChain's text splitters,
...).
Format coverage roadmap
@@ -172,7 +172,7 @@ Composing with downstream embedding operators
---------------------------------------------
The output format (``list[dict(text, metadata)]``) is designed to feed
-directly into embedding operators. With LlamaIndex's ``EmbeddingOperator``:
+directly into embedding operators. With LlamaIndex's ``LlamaIndexEmbeddingOperator``:
.. code-block:: python
@@ -181,7 +181,7 @@ directly into embedding operators. With LlamaIndex's ``EmbeddingOperator``:
source_path="/data/docs/*.pdf",
)
- embed = EmbeddingOperator(
+ embed = LlamaIndexEmbeddingOperator(
task_id="embed",
documents="{{ ti.xcom_pull(task_ids='load') }}",
llm_conn_id="openai_default",
diff --git a/providers/common/ai/docs/operators/index.rst b/providers/common/ai/docs/operators/index.rst
index dec108990eee2..7eaf414fe3a63 100644
--- a/providers/common/ai/docs/operators/index.rst
+++ b/providers/common/ai/docs/operators/index.rst
@@ -49,6 +49,12 @@ to pick the one that fits your use case:
* - Parse files (PDF, DOCX, CSV, etc.) into document dicts for embedding
- :class:`~airflow.providers.common.ai.operators.document_loader.DocumentLoaderOperator`
- *(no decorator)*
+ * - Chunk documents and produce embedding vectors
+ - :class:`~airflow.providers.common.ai.operators.llamaindex_embedding.LlamaIndexEmbeddingOperator`
+ - *(no decorator)*
+ * - Retrieve relevant chunks from a vector index
+ - :class:`~airflow.providers.common.ai.operators.llamaindex_retrieval.LlamaIndexRetrievalOperator`
+ - *(no decorator)*
**LLMOperator / @task.llm** — stateless, single-turn calls. Use this for classification,
summarization, extraction, or any prompt that produces one response. Supports structured output
diff --git a/providers/common/ai/docs/operators/llamaindex_embedding.rst b/providers/common/ai/docs/operators/llamaindex_embedding.rst
new file mode 100644
index 0000000000000..99125ac74bdde
--- /dev/null
+++ b/providers/common/ai/docs/operators/llamaindex_embedding.rst
@@ -0,0 +1,119 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ .. http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+.. _howto/operator:llamaindex_embedding:
+
+LlamaIndex ``LlamaIndexEmbeddingOperator``
+==========================================
+
+Chunk a ``list[dict]`` of documents and produce embedding vectors using
+LlamaIndex. Designed to feed the output of
+:class:`~airflow.providers.common.ai.operators.document_loader.DocumentLoaderOperator`
+into vector storage (pgvector, Pinecone, Weaviate, ...).
+
+The operator passes the embedding model **directly** to
+``VectorStoreIndex(..., embed_model=...)`` -- it does not mutate
+LlamaIndex's global ``Settings`` singleton, so concurrent tasks in the same
+worker process don't race on shared model state.
+
+Basic usage
+-----------
+
+.. exampleinclude:: /../../ai/src/airflow/providers/common/ai/example_dags/example_llamaindex_hook.py
+ :language: python
+ :start-after: [START howto_hook_llamaindex_embed]
+ :end-before: [END howto_hook_llamaindex_embed]
+
+``documents`` is templated, so ``loader.output`` (XCom direct) is resolved
+to a native ``list[dict]`` before ``execute`` runs.
+
+Bring-your-own embedding model
+------------------------------
+
+LlamaIndex doesn't ship a universal embedding-model initializer, so the
+operator's ``embed_model`` parameter accepts either:
+
+* a string model name (e.g. ``"text-embedding-3-small"``) -- the operator
+ constructs an ``OpenAIEmbedding`` via
+ :class:`~airflow.providers.common.ai.hooks.llamaindex.LlamaIndexHook`
+ using ``llm_conn_id`` / ``embed_conn_id``, or
+* a pre-built ``BaseEmbedding`` instance -- bypass the hook entirely. Use
+ this for Cohere, Bedrock, Vertex, HuggingFace, etc.:
+
+.. exampleinclude:: /../../ai/src/airflow/providers/common/ai/example_dags/example_llamaindex_hook.py
+ :language: python
+ :start-after: [START howto_hook_llamaindex_byo_embed_model]
+ :end-before: [END howto_hook_llamaindex_byo_embed_model]
+
+Persisting to cloud storage
+---------------------------
+
+``persist_dir`` accepts local paths and storage URIs (``s3://``, ``gs://``,
+``azure://``, ``file://``) resolved via
+:class:`~airflow.sdk.ObjectStoragePath`. Pass ``persist_conn_id`` to
+point at the Airflow connection that holds the cloud credentials:
+
+.. exampleinclude:: /../../ai/src/airflow/providers/common/ai/example_dags/example_llamaindex_hook.py
+ :language: python
+ :start-after: [START howto_hook_llamaindex_cloud_persist]
+ :end-before: [END howto_hook_llamaindex_cloud_persist]
+
+Parameters
+----------
+
+.. list-table::
+ :header-rows: 1
+ :widths: 25 75
+
+ * - Parameter
+ - Description
+ * - ``documents``
+ - ``list[dict]`` with ``text`` / ``metadata`` keys. Templated, so
+ binding ``loader.output`` resolves to the native list before
+ execute.
+ * - ``embed_model``
+ - String model name OR pre-built ``BaseEmbedding`` instance.
+ * - ``llm_conn_id``
+ - Airflow connection ID used when ``embed_model`` is a string. Falls
+ back to ``LlamaIndexHook.default_conn_name`` (``llamaindex_default``)
+ when ``None``.
+ * - ``embed_conn_id``
+ - Optional separate connection ID for the embedding provider. Falls
+ back to ``llm_conn_id`` when ``None``.
+ * - ``chunk_size``
+ - Sentence-splitter chunk size (default 512).
+ * - ``chunk_overlap``
+ - Overlap between chunks (default 50).
+ * - ``persist_dir``
+ - Local path or storage URI to persist the LlamaIndex index.
+ * - ``persist_conn_id``
+ - Cloud credentials connection ID for ``persist_dir`` URIs.
+
+Output
+------
+
+Returns a dict with::
+
+    {
+        "document_count": int,
+        "chunk_count": int,
+        "persist_dir": str | None,
+        "chunks": [
+            {"text": str, "metadata": dict, "vector": list[float]},
+            ...
+        ],
+    }
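A hedged sketch of how a downstream task might consume the documented Output dict of `LlamaIndexEmbeddingOperator`, for example to prepare rows for a vector-store upsert. `chunks_to_rows`, the `id_prefix` naming scheme, and the sample values are hypothetical; only the dict keys come from the docs above.

```python
from __future__ import annotations


def chunks_to_rows(result: dict, id_prefix: str = "chunk") -> list[tuple]:
    """Flatten the operator's output dict into (id, text, vector, metadata) rows.

    The id scheme is an assumption for illustration; a real pipeline would
    likely derive ids from document metadata instead.
    """
    return [
        (f"{id_prefix}-{i}", chunk["text"], chunk["vector"], chunk["metadata"])
        for i, chunk in enumerate(result["chunks"])
    ]


# Sample shaped like the documented output (values are made up).
sample_result = {
    "document_count": 1,
    "chunk_count": 2,
    "persist_dir": None,
    "chunks": [
        {"text": "Airflow schedules workflows.", "metadata": {"source": "a.md"}, "vector": [0.1, 0.2]},
        {"text": "Tasks form a DAG.", "metadata": {"source": "a.md"}, "vector": [0.3, 0.4]},
    ],
}

rows = chunks_to_rows(sample_result)
```

Each row can then be handed to whichever vector store client the pipeline uses.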
diff --git a/providers/common/ai/docs/operators/llamaindex_retrieval.rst b/providers/common/ai/docs/operators/llamaindex_retrieval.rst
new file mode 100644
index 0000000000000..6e0793604abab
--- /dev/null
+++ b/providers/common/ai/docs/operators/llamaindex_retrieval.rst
@@ -0,0 +1,109 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ .. http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+.. _howto/operator:llamaindex_retrieval:
+
+LlamaIndex ``LlamaIndexRetrievalOperator``
+==========================================
+
+Load a persisted LlamaIndex index and run similarity search. Designed to
+sit between
+:class:`~airflow.providers.common.ai.operators.llamaindex_embedding.LlamaIndexEmbeddingOperator`
+(which builds the index) and
+:class:`~airflow.providers.common.ai.operators.llm.LLMOperator` (which
+synthesises an answer from the retrieved chunks).
+
+Passes the embedding model **directly** to
+``load_index_from_storage(..., embed_model=...)`` -- no LlamaIndex
+``Settings`` mutation. The embedding model must match the one used when
+the index was originally built.
+
+Basic usage
+-----------
+
+.. exampleinclude:: /../../ai/src/airflow/providers/common/ai/example_dags/example_llamaindex_hook.py
+ :language: python
+ :start-after: [START howto_hook_llamaindex_retrieve]
+ :end-before: [END howto_hook_llamaindex_retrieve]
+
+``query`` is templated, so DAG-run params, XCom, and Variables all flow
+through cleanly.
+
+Cloud-persisted indexes
+-----------------------
+
+``index_persist_dir`` accepts the same local-path-or-URI shape as
+``LlamaIndexEmbeddingOperator.persist_dir``. Pass ``persist_conn_id`` to point at
+the Airflow connection that holds cloud credentials. The operator raises
+``FileNotFoundError`` with a clear "did you run LlamaIndexEmbeddingOperator first?"
+message when the path is missing.
+
+Bring-your-own embedding model
+------------------------------
+
+Same shape as ``LlamaIndexEmbeddingOperator``: ``embed_model`` accepts either a
+string model name (OpenAI via the hook) or a pre-built ``BaseEmbedding``
+instance for non-OpenAI vendors. See the BYO example in
+:doc:`llamaindex_embedding`.
+
+Parameters
+----------
+
+.. list-table::
+ :header-rows: 1
+ :widths: 25 75
+
+ * - Parameter
+ - Description
+ * - ``query``
+ - The query string. Templated.
+ * - ``index_persist_dir``
+ - Local path or storage URI pointing at the persisted index.
+ Templated.
+ * - ``persist_conn_id``
+ - Cloud credentials connection ID for ``index_persist_dir`` URIs.
+ Templated.
+ * - ``embed_model``
+ - String model name OR pre-built ``BaseEmbedding`` instance. Must
+ match the model used when the index was built. Templated.
+ * - ``llm_conn_id``
+ - Airflow connection ID used when ``embed_model`` is a string. Falls
+ back to ``LlamaIndexHook.default_conn_name`` (``llamaindex_default``)
+ when ``None``.
+ * - ``embed_conn_id``
+ - Optional separate connection ID for the embedding provider. Falls
+ back to ``llm_conn_id`` when ``None``.
+ * - ``top_k``
+ - Number of top similarity results to return (default 5).
+
+Output
+------
+
+Returns a dict with::
+
+    {
+        "query": str,
+        "chunks": [
+            {
+                "text": str,
+                "score": float,
+                "metadata": dict,
+                "node_id": str,
+            },
+            ...
+        ],
+    }
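A small sketch of post-processing the documented Output dict of `LlamaIndexRetrievalOperator` before passing it to a synthesis step. `build_context` and the `min_score` cutoff are hypothetical, and the sketch assumes that a higher `score` means a closer match; the sample values are made up.

```python
from __future__ import annotations


def build_context(result: dict, min_score: float = 0.0) -> str:
    """Keep chunks at or above min_score, best first, as numbered excerpts."""
    kept = [chunk for chunk in result["chunks"] if chunk["score"] >= min_score]
    # Assumption: higher score means more similar.
    kept.sort(key=lambda chunk: chunk["score"], reverse=True)
    return "\n\n".join(f"[{i}] {chunk['text']}" for i, chunk in enumerate(kept, start=1))


# Sample shaped like the documented output.
sample_result = {
    "query": "What is a DAG?",
    "chunks": [
        {"text": "A", "score": 0.9, "metadata": {}, "node_id": "n1"},
        {"text": "B", "score": 0.4, "metadata": {}, "node_id": "n2"},
        {"text": "C", "score": 0.7, "metadata": {}, "node_id": "n3"},
    ],
}

context = build_context(sample_result, min_score=0.5)
```

The numbered form matches the excerpt-reference style used by the query DAG's `format_context` task.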
diff --git a/providers/common/ai/provider.yaml b/providers/common/ai/provider.yaml
index 92d826ffb0a89..d95ff608857dd 100644
--- a/providers/common/ai/provider.yaml
+++ b/providers/common/ai/provider.yaml
@@ -53,6 +53,12 @@ integrations:
- integration-name: LangChain
external-doc-url: https://python.langchain.com/
tags: [ai]
+ - integration-name: LlamaIndex
+ external-doc-url: https://docs.llamaindex.ai/
+ how-to-guide:
+ - /docs/apache-airflow-providers-common-ai/operators/llamaindex_embedding.rst
+ - /docs/apache-airflow-providers-common-ai/operators/llamaindex_retrieval.rst
+ tags: [ai]
hooks:
- integration-name: Pydantic AI
@@ -64,6 +70,9 @@ hooks:
- integration-name: LangChain
python-modules:
- airflow.providers.common.ai.hooks.langchain
+ - integration-name: LlamaIndex
+ python-modules:
+ - airflow.providers.common.ai.hooks.llamaindex
plugins:
- name: hitl_review
@@ -354,6 +363,40 @@ connection-types:
type:
- string
- 'null'
+ - hook-class-name: airflow.providers.common.ai.hooks.llamaindex.LlamaIndexHook
+ hook-name: "LlamaIndex"
+ connection-type: llamaindex
+ ui-field-behaviour:
+ hidden-fields:
+ - schema
+ - port
+ - login
+ relabeling:
+ password: API Key
+ placeholders:
+ host: "https://api.openai.com/v1 (optional, for custom endpoints / Ollama)"
+ extra: '{"embed_model": "text-embedding-3-small", "llm_model": "gpt-4o"}'
+ conn-fields:
+ embed_model:
+ label: Embedding Model
+ description: >
+ Default LlamaIndex embedding model name (e.g.
+ text-embedding-3-small). The OpenAI default; for other vendors
+ pass a pre-built BaseEmbedding instance to the operator.
+ schema:
+ type:
+ - string
+ - 'null'
+ llm_model:
+ label: LLM Model
+ description: >
+ Default LlamaIndex LLM model name (e.g. gpt-4o). The OpenAI
+ default; for other vendors pass a pre-built LLM instance to
+ the operator.
+ schema:
+ type:
+ - string
+ - 'null'
operators:
- integration-name: Common AI
@@ -365,6 +408,8 @@ operators:
- airflow.providers.common.ai.operators.llm_sql
- airflow.providers.common.ai.operators.llm_schema_compare
- airflow.providers.common.ai.operators.document_loader
+ - airflow.providers.common.ai.operators.llamaindex_embedding
+ - airflow.providers.common.ai.operators.llamaindex_retrieval
task-decorators:
- class-name: airflow.providers.common.ai.decorators.agent.agent_task
diff --git a/providers/common/ai/pyproject.toml b/providers/common/ai/pyproject.toml
index 18833cd64bc1a..c67f16b319bb7 100644
--- a/providers/common/ai/pyproject.toml
+++ b/providers/common/ai/pyproject.toml
@@ -98,6 +98,11 @@ dependencies = [
"langchain" = [
"langchain>=1.0.0",
]
+"llamaindex" = [
+ "llama-index-core>=0.13.0",
+ "llama-index-embeddings-openai>=0.6.0",
+ "llama-index-llms-openai>=0.6.0",
+]
"pdf" = ["pypdf>=4.0.0"]
"docx" = ["python-docx>=1.0.0"]
@@ -114,6 +119,9 @@ dev = [
"pydantic-ai-slim[mcp]",
"apache-airflow-providers-common-sql[datafusion]",
"langchain>=1.0.0",
+ "llama-index-core>=0.13.0",
+ "llama-index-embeddings-openai>=0.6.0",
+ "llama-index-llms-openai>=0.6.0",
]
# To build docs:
diff --git a/providers/common/ai/src/airflow/providers/common/ai/example_dags/example_llamaindex_hook.py b/providers/common/ai/src/airflow/providers/common/ai/example_dags/example_llamaindex_hook.py
new file mode 100644
index 0000000000000..f089ee02b4d08
--- /dev/null
+++ b/providers/common/ai/src/airflow/providers/common/ai/example_dags/example_llamaindex_hook.py
@@ -0,0 +1,147 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Example DAGs demonstrating LlamaIndexHook + LlamaIndex operator usage.
+
+Each DAG covers a single pattern. The docs reference these via
+``.. exampleinclude::`` so the runnable snippets stay in sync.
+"""
+
+from __future__ import annotations
+
+from airflow.providers.common.ai.operators.document_loader import DocumentLoaderOperator
+from airflow.providers.common.ai.operators.llamaindex_embedding import LlamaIndexEmbeddingOperator
+from airflow.providers.common.ai.operators.llamaindex_retrieval import LlamaIndexRetrievalOperator
+from airflow.providers.common.compat.sdk import dag, task
+
+
+# [START howto_hook_llamaindex_embed]
+@dag(schedule=None)
+def example_llamaindex_embed():
+    """Chunk + embed a directory of documents and persist the index locally."""
+
+    load = DocumentLoaderOperator(
+        task_id="load",
+        source_path="/opt/airflow/data/library/**/*",
+        file_extensions=[".pdf", ".md", ".txt"],
+    )
+
+    embed = LlamaIndexEmbeddingOperator(
+        task_id="embed",
+        documents=load.output,  # XCom direct -- never via Jinja (list[dict])
+        embed_model="text-embedding-3-small",
+        llm_conn_id="llamaindex_default",
+        chunk_size=512,
+        chunk_overlap=50,
+        persist_dir="/opt/airflow/data/library_index",
+    )
+
+    load >> embed
+
+
+# [END howto_hook_llamaindex_embed]
+
+example_llamaindex_embed()
+
+
+# [START howto_hook_llamaindex_retrieve]
+@dag(schedule=None)
+def example_llamaindex_retrieve():
+    """Load a persisted index and run similarity search."""
+
+    retrieve = LlamaIndexRetrievalOperator(
+        task_id="retrieve",
+        query="{{ params.query }}",
+        index_persist_dir="/opt/airflow/data/library_index",
+        embed_model="text-embedding-3-small",
+        llm_conn_id="llamaindex_default",
+        top_k=5,
+    )
+
+    retrieve
+
+
+# [END howto_hook_llamaindex_retrieve]
+
+example_llamaindex_retrieve()
+
+
+# [START howto_hook_llamaindex_cloud_persist]
+@dag(schedule=None)
+def example_llamaindex_cloud_persist():
+    """Persist the index directly to S3 -- no separate upload step."""
+
+    load = DocumentLoaderOperator(
+        task_id="load",
+        source_path="s3://my-bucket/library/",
+        source_conn_id="aws_default",
+        file_extensions=[".pdf"],
+    )
+
+    embed = LlamaIndexEmbeddingOperator(
+        task_id="embed",
+        documents=load.output,
+        embed_model="text-embedding-3-small",
+        llm_conn_id="llamaindex_default",
+        persist_dir="s3://my-bucket/indexes/library/",
+        persist_conn_id="aws_default",
+    )
+
+    load >> embed
+
+
+# [END howto_hook_llamaindex_cloud_persist]
+
+example_llamaindex_cloud_persist()
+
+
+# [START howto_hook_llamaindex_byo_embed_model]
+@dag(schedule=None)
+def example_llamaindex_byo_embed_model():
+    """Use a non-OpenAI embedding by instantiating the LlamaIndex class directly.
+
+    LlamaIndex doesn't ship a universal init helper, so the operator accepts
+    a pre-built ``BaseEmbedding`` instance and bypasses the hook entirely.
+    Install the matching integration package:
+    ``pip install llama-index-embeddings-cohere``.
+    """
+
+    @task
+    def build_cohere_embedder():
+        from llama_index.embeddings.cohere import CohereEmbedding
+
+        from airflow.providers.common.compat.sdk import BaseHook
+
+        conn = BaseHook.get_connection("cohere_default")
+        return CohereEmbedding(model_name="embed-english-v3.0", cohere_api_key=conn.password)
+
+    @task
+    def demo_doc_list() -> list[dict]:
+        return [{"text": "Cohere demo content", "metadata": {}}]
+
+    embed = LlamaIndexEmbeddingOperator(
+        task_id="embed",
+        documents=demo_doc_list(),
+        embed_model=build_cohere_embedder(),
+        persist_dir="/opt/airflow/data/cohere_index",
+    )
+
+    embed
+
+
+# [END howto_hook_llamaindex_byo_embed_model]
+
+example_llamaindex_byo_embed_model()
diff --git a/providers/common/ai/src/airflow/providers/common/ai/example_dags/example_llamaindex_rag.py b/providers/common/ai/src/airflow/providers/common/ai/example_dags/example_llamaindex_rag.py
new file mode 100644
index 0000000000000..6c044b965f4d7
--- /dev/null
+++ b/providers/common/ai/src/airflow/providers/common/ai/example_dags/example_llamaindex_rag.py
@@ -0,0 +1,236 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Example DAGs demonstrating RAG pipelines with LlamaIndex operators.
+
+Three patterns:
+
+1. Full RAG pipeline -- load -> embed -> retrieve -> answer in one DAG.
+2. Separate index/query DAGs -- production-shaped split (scheduled
+ indexing job + on-demand query DAG).
+3. Multi-source RAG -- combine multiple loaders with source metadata.
+
+The ``LLMOperator`` synthesis step uses a ``pydanticai_default`` connection
+because :class:`~airflow.providers.common.ai.operators.llm.LLMOperator` is
+pydantic-ai-backed; the LlamaIndex operators use ``llamaindex_default``.
+The two connection types are intentional -- they back different frameworks.
+"""
+
+from __future__ import annotations
+
+from airflow.providers.common.ai.operators.document_loader import DocumentLoaderOperator
+from airflow.providers.common.ai.operators.llamaindex_embedding import LlamaIndexEmbeddingOperator
+from airflow.providers.common.ai.operators.llamaindex_retrieval import LlamaIndexRetrievalOperator
+from airflow.providers.common.ai.operators.llm import LLMOperator
+from airflow.providers.common.compat.sdk import dag, task
+
+# ---------------------------------------------------------------------------
+# 1. Full RAG pipeline: load -> embed -> retrieve -> answer
+# ---------------------------------------------------------------------------
+
+
+# [START howto_llamaindex_rag_pipeline]
+@dag(schedule=None)
+def example_llamaindex_rag_pipeline():
+    """End-to-end RAG pipeline in a single DAG.
+
+    1. Parse local text files into document dicts.
+    2. Chunk and embed the documents, persisting the index to disk.
+    3. Retrieve relevant chunks for a user question.
+    4. Synthesize an answer using the retrieved context.
+    """
+    load = DocumentLoaderOperator(
+        task_id="load_docs",
+        source_path="/opt/airflow/data/knowledge_base/",
+        file_extensions=[".txt", ".md", ".pdf"],
+    )
+
+    embed = LlamaIndexEmbeddingOperator(
+        task_id="embed_docs",
+        documents=load.output,
+        embed_model="text-embedding-3-small",
+        llm_conn_id="llamaindex_default",
+        chunk_size=512,
+        chunk_overlap=50,
+        persist_dir="/opt/airflow/data/indexes/kb_index",
+    )
+
+    retrieve = LlamaIndexRetrievalOperator(
+        task_id="retrieve",
+        query="What are the main components of Apache Airflow?",
+        index_persist_dir="/opt/airflow/data/indexes/kb_index",
+        embed_model="text-embedding-3-small",
+        llm_conn_id="llamaindex_default",
+        top_k=5,
+    )
+
+    @task
+    def format_context(retrieval_result: dict) -> str:
+        chunks = retrieval_result["chunks"]
+        return "\n\n---\n\n".join(chunk["text"] for chunk in chunks)
+
+    context = format_context(retrieve.output)
+
+    answer = LLMOperator(
+        task_id="answer",
+        prompt=(
+            "Using the context below, answer the question: "
+            "What are the main components of Apache Airflow?\n\n"
+            "Context:\n{{ ti.xcom_pull(task_ids='format_context') }}"
+        ),
+        llm_conn_id="pydanticai_default",
+        system_prompt="Answer based only on the provided context. Cite sources when possible.",
+    )
+
+    embed >> retrieve >> context >> answer
+
+
+# [END howto_llamaindex_rag_pipeline]
+
+example_llamaindex_rag_pipeline()
+
+
+# ---------------------------------------------------------------------------
+# 2. Production-shaped split: scheduled indexing + on-demand query
+# ---------------------------------------------------------------------------
+
+
+# [START howto_llamaindex_index_dag]
+@dag(schedule="@weekly")
+def example_llamaindex_index_pdf():
+    """Weekly indexing DAG -- keep the vector index fresh as PDFs arrive.
+
+    The companion query DAG (below) reads the persisted index on demand.
+    """
+    load = DocumentLoaderOperator(
+        task_id="load_pdfs",
+        source_path="/opt/airflow/data/reports/*.pdf",
+    )
+
+    build_index = LlamaIndexEmbeddingOperator(
+        task_id="build_index",
+        documents=load.output,
+        embed_model="text-embedding-3-small",
+        llm_conn_id="llamaindex_default",
+        chunk_size=1024,
+        chunk_overlap=100,
+        persist_dir="/opt/airflow/data/indexes/reports_index",
+    )
+
+    load >> build_index
+
+
+# [END howto_llamaindex_index_dag]
+
+example_llamaindex_index_pdf()
+
+
+# [START howto_llamaindex_query_dag]
+@dag(
+    schedule=None,
+    params={"question": "Summarize the key findings from the latest quarterly report."},
+)
+def example_llamaindex_query():
+    """On-demand query DAG -- retrieve from a pre-built index and synthesize.
+
+    Trigger manually or via API with a ``question`` parameter.
+    """
+    retrieve = LlamaIndexRetrievalOperator(
+        task_id="retrieve",
+        query="{{ params.question }}",
+        index_persist_dir="/opt/airflow/data/indexes/reports_index",
+        embed_model="text-embedding-3-small",
+        llm_conn_id="llamaindex_default",
+        top_k=5,
+    )
+
+    @task
+    def format_context(retrieval_result: dict) -> str:
+        chunks = retrieval_result["chunks"]
+        numbered = [f"[{i + 1}] {chunk['text']}" for i, chunk in enumerate(chunks)]
+        return "\n\n".join(numbered)
+
+    context = format_context(retrieve.output)
+
+    synthesize = LLMOperator(
+        task_id="synthesize",
+        prompt=(
+            "Question: {{ params.question }}\n\n"
+            "Relevant excerpts:\n{{ ti.xcom_pull(task_ids='format_context') }}\n\n"
+            "Provide a detailed answer with references to the excerpt numbers."
+        ),
+        llm_conn_id="pydanticai_default",
+        system_prompt=(
+            "You are a research assistant. Answer the question using only the "
+            "provided excerpts. Reference excerpt numbers in square brackets."
+        ),
+    )
+
+    context >> synthesize
+
+
+# [END howto_llamaindex_query_dag]
+
+example_llamaindex_query()
+
+
+# ---------------------------------------------------------------------------
+# 3. Multi-source RAG: combine CSV product data with text documentation
+# ---------------------------------------------------------------------------
+
+
+# [START howto_llamaindex_multi_source]
+@dag(schedule=None)
+def example_llamaindex_multi_source():
+    """Combine multiple loaders with source-tagging metadata.
+
+    Shows how ``DocumentLoaderOperator`` handles different file formats and
+    how ``metadata_fields`` tags documents by source for filtered retrieval
+    downstream.
+    """
+    load_products = DocumentLoaderOperator(
+        task_id="load_products",
+        source_path="/opt/airflow/data/products.csv",
+        metadata_fields={"source": "product_catalog", "department": "engineering"},
+    )
+
+    load_docs = DocumentLoaderOperator(
+        task_id="load_docs",
+        source_path="/opt/airflow/data/documentation/",
+        file_extensions=[".md", ".txt"],
+        metadata_fields={"source": "documentation"},
+    )
+
+    @task
+    def merge_documents(products: list[dict], docs: list[dict]) -> list[dict]:
+        return products + docs
+
+    merged = merge_documents(load_products.output, load_docs.output)
+
+    embed_all = LlamaIndexEmbeddingOperator(
+        task_id="embed_all",
+        documents=merged,
+        embed_model="text-embedding-3-small",
+        llm_conn_id="llamaindex_default",
+        persist_dir="/opt/airflow/data/indexes/multi_source_index",
+    )
+
+    embed_all
+
+
+# [END howto_llamaindex_multi_source]
+
+example_llamaindex_multi_source()
diff --git a/providers/common/ai/src/airflow/providers/common/ai/get_provider_info.py b/providers/common/ai/src/airflow/providers/common/ai/get_provider_info.py
index d87733bb5ffa0..a3642e4895b79 100644
--- a/providers/common/ai/src/airflow/providers/common/ai/get_provider_info.py
+++ b/providers/common/ai/src/airflow/providers/common/ai/get_provider_info.py
@@ -56,6 +56,15 @@ def get_provider_info():
"external-doc-url": "https://python.langchain.com/",
"tags": ["ai"],
},
+ {
+ "integration-name": "LlamaIndex",
+ "external-doc-url": "https://docs.llamaindex.ai/",
+ "how-to-guide": [
+ "/docs/apache-airflow-providers-common-ai/operators/llamaindex_embedding.rst",
+ "/docs/apache-airflow-providers-common-ai/operators/llamaindex_retrieval.rst",
+ ],
+ "tags": ["ai"],
+ },
],
"hooks": [
{
@@ -67,6 +76,10 @@ def get_provider_info():
"integration-name": "LangChain",
"python-modules": ["airflow.providers.common.ai.hooks.langchain"],
},
+ {
+ "integration-name": "LlamaIndex",
+ "python-modules": ["airflow.providers.common.ai.hooks.llamaindex"],
+ },
],
"plugins": [
{
@@ -288,6 +301,31 @@ def get_provider_info():
},
},
},
+ {
+ "hook-class-name": "airflow.providers.common.ai.hooks.llamaindex.LlamaIndexHook",
+ "hook-name": "LlamaIndex",
+ "connection-type": "llamaindex",
+ "ui-field-behaviour": {
+ "hidden-fields": ["schema", "port", "login"],
+ "relabeling": {"password": "API Key"},
+ "placeholders": {
+ "host": "https://api.openai.com/v1 (optional, for custom endpoints / Ollama)",
+ "extra": '{"embed_model": "text-embedding-3-small", "llm_model": "gpt-4o"}',
+ },
+ },
+ "conn-fields": {
+ "embed_model": {
+ "label": "Embedding Model",
+ "description": "Default LlamaIndex embedding model name (e.g. text-embedding-3-small). The OpenAI default; for other vendors pass a pre-built BaseEmbedding instance to the operator.\n",
+ "schema": {"type": ["string", "null"]},
+ },
+ "llm_model": {
+ "label": "LLM Model",
+ "description": "Default LlamaIndex LLM model name (e.g. gpt-4o). The OpenAI default; for other vendors pass a pre-built LLM instance to the operator.\n",
+ "schema": {"type": ["string", "null"]},
+ },
+ },
+ },
],
"operators": [
{
@@ -300,6 +338,8 @@ def get_provider_info():
"airflow.providers.common.ai.operators.llm_sql",
"airflow.providers.common.ai.operators.llm_schema_compare",
"airflow.providers.common.ai.operators.document_loader",
+ "airflow.providers.common.ai.operators.llamaindex_embedding",
+ "airflow.providers.common.ai.operators.llamaindex_retrieval",
],
}
],
diff --git a/providers/common/ai/src/airflow/providers/common/ai/hooks/llamaindex.py b/providers/common/ai/src/airflow/providers/common/ai/hooks/llamaindex.py
new file mode 100644
index 0000000000000..05e002d86425e
--- /dev/null
+++ b/providers/common/ai/src/airflow/providers/common/ai/hooks/llamaindex.py
@@ -0,0 +1,189 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Hook for LlamaIndex integration with Airflow connections."""
+
+from __future__ import annotations
+
+from typing import TYPE_CHECKING, Any
+
+from airflow.providers.common.compat.sdk import (
+    AirflowOptionalProviderFeatureException,
+    BaseHook,
+)
+
+if TYPE_CHECKING:
+    from llama_index.core.base.embeddings.base import BaseEmbedding
+    from llama_index.core.llms.llm import LLM
+
+
+class LlamaIndexHook(BaseHook):
+    """
+    Bridge an Airflow connection to LlamaIndex chat and embedding models.
+
+    The hook resolves credentials (API key, optional API base URL) from the
+    Airflow connection and returns native LlamaIndex objects ready to pass
+    to ``VectorStoreIndex(..., embed_model=...)``,
+    ``load_index_from_storage(..., embed_model=...)``, or
+    ``index.as_retriever(..., llm=...)``.
+
+    LlamaIndex does not ship a universal ``init_chat_model`` /
+    ``init_embedding_model`` equivalent (each vendor is a separate package
+    under ``llama-index-llms-*`` / ``llama-index-embeddings-*`` with its own
+    constructor kwargs). The hook therefore covers the OpenAI-compatible
+    surface that matches LlamaIndex's own ``resolve_embed_model("default")``
+    behaviour. For other vendors (Cohere, Bedrock, Vertex, HuggingFace, ...)
+    instantiate the LlamaIndex class directly in your ``@task`` and pass it
+    to the operator's ``embed_model=`` / ``llm=`` parameter -- both
+    ``LlamaIndexEmbeddingOperator`` and ``LlamaIndexRetrievalOperator`` accept a pre-built
+    ``BaseEmbedding`` / ``LLM`` instance and bypass the hook in that case.
+
+    .. note::
+
+        The hook deliberately does **not** mutate LlamaIndex's global
+        ``Settings`` singleton. Operators pass the resolved model directly
+        to LlamaIndex constructors so concurrent tasks in the same worker
+        don't race on shared state.
+
+    Connection fields:
+
+    * **password**: API key passed as ``api_key=``.
+    * **host**: Optional base URL passed as ``api_base=`` (custom endpoints,
+      Ollama, vLLM).
+    * **extra** JSON: ``{"embed_model": "text-embedding-3-small",
+      "llm_model": "gpt-4o"}`` -- default model identifiers stored on the
+      connection.
+
+    :param llm_conn_id: Airflow connection ID for the LLM provider. Falls
+        back to :attr:`default_conn_name` (``"llamaindex_default"``) when
+        not provided.
+    :param embed_conn_id: Optional separate Airflow connection ID for the
+        embedding provider. Falls back to ``llm_conn_id`` when not set.
+    :param embed_model: Embedding model name (e.g.
+        ``"text-embedding-3-small"``). Overrides ``extra["embed_model"]``
+        on the connection.
+    :param llm_model: LLM model name (e.g. ``"gpt-4o"``). Overrides
+        ``extra["llm_model"]`` on the connection. Required when calling
+        :meth:`get_llm`.
+    """
+
+    conn_name_attr = "llm_conn_id"
+    default_conn_name = "llamaindex_default"
+    conn_type = "llamaindex"
+    hook_name = "LlamaIndex"
+
+    def __init__(
+        self,
+        llm_conn_id: str | None = None,
+        embed_conn_id: str | None = None,
+        embed_model: str | None = None,
+        llm_model: str | None = None,
+        **kwargs: Any,
+    ) -> None:
+        super().__init__(**kwargs)
+        # Resolve at runtime so a future per-vendor subclass with its own
+        # ``default_conn_name`` is honoured.
+        self.llm_conn_id = llm_conn_id if llm_conn_id is not None else self.default_conn_name
+        self.embed_conn_id = embed_conn_id if embed_conn_id is not None else self.llm_conn_id
+        self.embed_model = embed_model
+        self.llm_model = llm_model
+
+    @staticmethod
+    def get_ui_field_behaviour() -> dict[str, Any]:
+        """Return custom field behaviour for the Airflow connection form."""
+        return {
+            "hidden_fields": ["schema", "port", "login"],
+            "relabeling": {"password": "API Key"},
+            "placeholders": {
+                "host": "https://api.openai.com/v1 (optional, for custom endpoints / Ollama)",
+                "extra": '{"embed_model": "text-embedding-3-small", "llm_model": "gpt-4o"}',
+            },
+        }
+
+    @staticmethod
+    def _resolve_model(
+        conn_extra: dict[str, Any],
+        *,
+        constructor_value: str | None,
+        extra_key: str,
+        kind: str,
+    ) -> str:
+        """Resolve a model identifier from the constructor arg or connection extra."""
+        model_id = constructor_value or conn_extra.get(extra_key)
+        if not model_id:
+            raise ValueError(
+                f"No {kind} model identifier set. Pass {extra_key}= to the hook "
+                f'constructor or set extra={{"{extra_key}": "model-name"}} on '
+                "the connection."
+            )
+        return model_id
+
+    @staticmethod
+    def _connection_kwargs(conn: Any) -> dict[str, Any]:
+        """Return shared OpenAI-style kwargs (api_key, api_base) from the connection."""
+        kwargs: dict[str, Any] = {}
+        if conn.password:
+            kwargs["api_key"] = conn.password
+        if conn.host:
+            kwargs["api_base"] = conn.host
+        return kwargs
+
+    def get_embedding_model(self) -> BaseEmbedding:
+        """
+        Return a LlamaIndex embedding model configured from the Airflow connection.
+
+        Uses ``embed_conn_id`` (falls back to ``llm_conn_id``) for credentials.
+        Returns an ``OpenAIEmbedding`` instance; for other vendors,
+        instantiate the LlamaIndex class directly and pass it to the
+        operator's ``embed_model=`` parameter.
+        """
+        # Lazy: llama-index is an optional extra; importing at module level
+        # would break common.ai for users who haven't installed ``[llamaindex]``.
+        try:
+            from llama_index.embeddings.openai import OpenAIEmbedding
+        except ImportError as e:
+            raise AirflowOptionalProviderFeatureException(e)
+
+        conn = self.get_connection(self.embed_conn_id)
+        model_id = self._resolve_model(
+            conn.extra_dejson,
+            constructor_value=self.embed_model,
+            extra_key="embed_model",
+            kind="embedding",
+        )
+        return OpenAIEmbedding(model=model_id, **self._connection_kwargs(conn))
+
+    def get_llm(self) -> LLM:
+        """
+        Return a LlamaIndex LLM configured from the Airflow connection.
+
+        Returns an ``OpenAI`` LLM instance; for other vendors, instantiate
+        the LlamaIndex class directly and pass it to the operator's ``llm=``
+        parameter.
+        """
+        try:
+            from llama_index.llms.openai import OpenAI
+        except ImportError as e:
+            raise AirflowOptionalProviderFeatureException(e)
+
+        conn = self.get_connection(self.llm_conn_id)
+        model_id = self._resolve_model(
+            conn.extra_dejson,
+            constructor_value=self.llm_model,
+            extra_key="llm_model",
+            kind="llm",
+        )
+        return OpenAI(model=model_id, **self._connection_kwargs(conn))
diff --git a/providers/common/ai/src/airflow/providers/common/ai/operators/llamaindex_embedding.py b/providers/common/ai/src/airflow/providers/common/ai/operators/llamaindex_embedding.py
new file mode 100644
index 0000000000000..d85e692100202
--- /dev/null
+++ b/providers/common/ai/src/airflow/providers/common/ai/operators/llamaindex_embedding.py
@@ -0,0 +1,210 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Operator for document chunking and embedding via LlamaIndex."""
+
+from __future__ import annotations
+
+import os
+from collections.abc import Sequence
+from typing import TYPE_CHECKING, Any, cast
+
+from airflow.providers.common.compat.sdk import (
+    AirflowOptionalProviderFeatureException,
+    BaseOperator,
+)
+
+if TYPE_CHECKING:
+    from llama_index.core.base.embeddings.base import BaseEmbedding
+    from llama_index.core.schema import TextNode
+
+    from airflow.sdk import Context
+
+
+class LlamaIndexEmbeddingOperator(BaseOperator):
+    """
+    Chunk documents and produce embedding vectors using LlamaIndex.
+
+    Bridges document loading (e.g.
+    :class:`~airflow.providers.common.ai.operators.document_loader.DocumentLoaderOperator`
+    output) and vector storage (pgvector, Pinecone, Weaviate, ...). Input is
+    ``list[dict]`` with ``text`` and ``metadata`` keys; output includes the
+    embedding vectors ready for downstream storage ingest.
+
+    The operator passes the embedding model **directly** to
+    ``VectorStoreIndex(..., embed_model=...)`` -- it does not mutate
+    LlamaIndex's global ``Settings`` singleton, so concurrent tasks in the
+    same worker don't race on shared state.
+
+    :param documents: List of dicts with ``text`` and ``metadata`` keys,
+        typically from ``DocumentLoaderOperator`` or a ``@task``. Templated,
+        so binding via ``my_loader.output`` (XCom direct) resolves to the
+        native ``list[dict]`` before ``execute`` runs.
+    :param embed_model: Either:
+
+        * a string model name (e.g. ``"text-embedding-3-small"``) -- the
+          operator constructs an :class:`~.LlamaIndexHook`-backed
+          ``OpenAIEmbedding`` from ``llm_conn_id`` / ``embed_conn_id``, or
+        * a pre-built ``BaseEmbedding`` instance -- bypass the hook
+          entirely for non-OpenAI vendors (e.g.
+          ``CohereEmbedding(...)``, ``BedrockEmbedding(...)``).
+
+        Templated, so it works with both literal strings and ``@task``
+        output that builds a custom embedder.
+
+    :param llm_conn_id: Airflow connection ID for the embedding API. Falls
+        back to :attr:`LlamaIndexHook.default_conn_name` when ``None``.
+    :param embed_conn_id: Optional separate Airflow connection ID for the
+        embedding provider. Falls back to ``llm_conn_id`` when ``None``.
+    :param chunk_size: Chunk size for the sentence splitter.
+    :param chunk_overlap: Overlap between chunks.
+    :param persist_dir: Optional path to persist the index. Accepts local
+        paths and storage URIs (``s3://``, ``gs://``, ...) resolved via
+        :class:`~airflow.sdk.ObjectStoragePath`.
+    :param persist_conn_id: Airflow connection ID for cloud-storage
+        credentials when ``persist_dir`` is a URI.
+    """
+
+    template_fields: Sequence[str] = (
+        "documents",
+        "embed_model",
+        "llm_conn_id",
+        "embed_conn_id",
+        "persist_dir",
+        "persist_conn_id",
+    )
+
+    def __init__(
+        self,
+        *,
+        documents: list[dict[str, Any]],
+        embed_model: str | BaseEmbedding | None = None,
+        llm_conn_id: str | None = None,
+        embed_conn_id: str | None = None,
+        chunk_size: int = 512,
+        chunk_overlap: int = 50,
+        persist_dir: str | None = None,
+        persist_conn_id: str | None = None,
+        **kwargs: Any,
+    ) -> None:
+        super().__init__(**kwargs)
+        self.documents = documents
+        self.embed_model = embed_model
+        self.llm_conn_id = llm_conn_id
+        self.embed_conn_id = embed_conn_id
+        self.chunk_size = chunk_size
+        self.chunk_overlap = chunk_overlap
+        self.persist_dir = persist_dir
+        self.persist_conn_id = persist_conn_id
+
+    def execute(self, context: Context) -> dict[str, Any]:
+        try:
+            from llama_index.core import Document, VectorStoreIndex
+            from llama_index.core.node_parser import SentenceSplitter
+        except ImportError as e:
+            raise AirflowOptionalProviderFeatureException(e)
+
+        embed_model = self._resolve_embed_model()
+
+        llama_docs = [Document(text=doc["text"], metadata=doc.get("metadata", {})) for doc in self.documents]
+
+        splitter = SentenceSplitter(chunk_size=self.chunk_size, chunk_overlap=self.chunk_overlap)
+        nodes = splitter.get_nodes_from_documents(llama_docs)
+        self.log.info("Split %d documents into %d chunks", len(llama_docs), len(nodes))
+
+        # ``VectorStoreIndex(...)`` populates each node's ``.embedding`` as a
+        # side effect of building the index; capture the index so the
+        # variable isn't discarded.
+        index = VectorStoreIndex(nodes, embed_model=embed_model, show_progress=False)
+
+        if self.persist_dir:
+            self._persist(index, self.persist_dir)
+
+        # ``SentenceSplitter`` always returns ``TextNode`` instances, but the
+        # base ``get_nodes_from_documents`` signature is typed as
+        # ``list[BaseNode]`` (which has no ``.text``). Cast so mypy doesn't
+        # flag the ``.text`` access; ``node.embedding`` is populated by
+        # ``VectorStoreIndex`` for every node above.
+        text_nodes = cast("list[TextNode]", nodes)
+        chunks = [
+            {
+                "text": node.text,
+                "metadata": node.metadata,
+                "vector": node.embedding,
+            }
+            for node in text_nodes
+        ]
+
+        return {
+            "document_count": len(llama_docs),
+            "chunk_count": len(nodes),
+            "persist_dir": self.persist_dir,
+            "chunks": chunks,
+        }
+
+    def _resolve_embed_model(self) -> BaseEmbedding:
+        """
+        Return a ready-to-use ``BaseEmbedding``.
+
+        Three cases:
+
+        * ``None`` or ``str`` -- build an ``OpenAIEmbedding`` via
+          ``LlamaIndexHook`` (the framework's documented ``default``
+          behaviour).
+        * Has ``get_text_embedding`` / ``_get_query_embedding`` -- treat as
+          a pre-built ``BaseEmbedding`` (duck-typed to avoid forcing a
+          ``llama_index`` import here).
+        * Anything else -- ``TypeError`` with a clear pointer.
+        """
+        if self.embed_model is None or isinstance(self.embed_model, str):
+            from airflow.providers.common.ai.hooks.llamaindex import LlamaIndexHook
+
+            return LlamaIndexHook(
+                llm_conn_id=self.llm_conn_id,
+                embed_conn_id=self.embed_conn_id,
+                embed_model=self.embed_model,
+            ).get_embedding_model()
+
+        # ``BaseEmbedding`` always exposes these two methods (see
+        # ``llama_index.core.base.embeddings.base``). Duck-typing avoids
+        # importing ``llama_index`` here and also catches the case where an
+        # unresolved ``XComArg`` slips through.
+        if hasattr(self.embed_model, "get_text_embedding") and hasattr(
+            self.embed_model, "_get_query_embedding"
+        ):
+            return self.embed_model
+
+        raise TypeError(
+            "embed_model must be a string model name, a LlamaIndex "
+            f"``BaseEmbedding`` instance, or None. Got {type(self.embed_model).__name__!r}."
+        )
+
+    def _persist(self, index: Any, persist_dir: str) -> None:
+        """Persist the index to ``persist_dir``; cloud URIs go through ObjectStoragePath."""
+        if "://" in persist_dir:
+            from airflow.sdk import ObjectStoragePath
+
+            target = ObjectStoragePath(persist_dir, conn_id=self.persist_conn_id)
+            target.mkdir(parents=True, exist_ok=True)
+            # ``str(target)`` returns ``s3://<conn_id>@<bucket>/...`` when
+            # ``conn_id`` is set (see ``task-sdk/.../io/path.py``), which
+            # fsspec misinterprets. Pass the raw user URI as the path string
+            # and the authenticated filesystem separately.
+            index.storage_context.persist(persist_dir=persist_dir, fs=target.fs)
+        else:
+            os.makedirs(persist_dir, exist_ok=True)
+            index.storage_context.persist(persist_dir=persist_dir)
+        self.log.info("Index persisted to %s", persist_dir)
diff --git a/providers/common/ai/src/airflow/providers/common/ai/operators/llamaindex_retrieval.py b/providers/common/ai/src/airflow/providers/common/ai/operators/llamaindex_retrieval.py
new file mode 100644
index 0000000000000..9725cdace48d1
--- /dev/null
+++ b/providers/common/ai/src/airflow/providers/common/ai/operators/llamaindex_retrieval.py
@@ -0,0 +1,199 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Operator for semantic retrieval via a persisted LlamaIndex index."""
+
+from __future__ import annotations
+
+from collections.abc import Sequence
+from pathlib import Path
+from typing import TYPE_CHECKING, Any
+
+from airflow.providers.common.compat.sdk import (
+    AirflowOptionalProviderFeatureException,
+    BaseOperator,
+)
+
+if TYPE_CHECKING:
+    from llama_index.core.base.embeddings.base import BaseEmbedding
+
+    from airflow.sdk import Context
+
+
+class LlamaIndexRetrievalOperator(BaseOperator):
+    """
+    Retrieve relevant document chunks from a persisted LlamaIndex index.
+
+    Loads a previously persisted vector store index (from
+    ``LlamaIndexEmbeddingOperator(persist_dir=...)``) and performs similarity search
+    against the provided query. Output is a list of chunks with text,
+    score, metadata, and node id, ready for downstream synthesis via
+    :class:`~airflow.providers.common.ai.operators.llm.LLMOperator`.
+
+    Passes the embedding model **directly** to
+    ``load_index_from_storage(..., embed_model=...)`` -- no LlamaIndex
+    ``Settings`` mutation, so concurrent tasks in the same worker don't
+    race on shared state.
+
+    :param query: The query string. Supports Jinja templating.
+    :param index_persist_dir: Local path or storage URI (``s3://``,
+        ``gs://``, ...) pointing at the persisted LlamaIndex index.
+        Resolved via :class:`~airflow.sdk.ObjectStoragePath` when a URI
+        scheme is present.
+    :param persist_conn_id: Airflow connection ID for cloud-storage
+        credentials when ``index_persist_dir`` is a URI.
+    :param embed_model: Either:
+
+        * a string model name (e.g. ``"text-embedding-3-small"``) -- the
+          operator constructs an :class:`~.LlamaIndexHook`-backed
+          ``OpenAIEmbedding`` from ``llm_conn_id`` / ``embed_conn_id``, or
+        * a pre-built ``BaseEmbedding`` instance -- bypass the hook for
+          non-OpenAI vendors. Must match the embedding model used when
+          the index was originally built.
+
+        Templated, so it works with both literal strings and ``@task``
+        output that builds a custom embedder.
+
+    :param llm_conn_id: Airflow connection ID for the embedding API. Falls
+        back to :attr:`LlamaIndexHook.default_conn_name` when ``None``.
+        Used only when ``embed_model`` is a string (or omitted entirely).
+    :param embed_conn_id: Optional separate Airflow connection ID for the
+        embedding provider. Falls back to ``llm_conn_id`` when ``None``.
+    :param top_k: Number of top results to retrieve.
+    """
+
+    template_fields: Sequence[str] = (
+        "query",
+        "index_persist_dir",
+        "persist_conn_id",
+        "embed_model",
+        "llm_conn_id",
+        "embed_conn_id",
+    )
+
+    def __init__(
+        self,
+        *,
+        query: str,
+        index_persist_dir: str,
+        persist_conn_id: str | None = None,
+        embed_model: str | BaseEmbedding | None = None,
+        llm_conn_id: str | None = None,
+        embed_conn_id: str | None = None,
+        top_k: int = 5,
+        **kwargs: Any,
+    ) -> None:
+        super().__init__(**kwargs)
+        self.query = query
+        self.index_persist_dir = index_persist_dir
+        self.persist_conn_id = persist_conn_id
+        self.embed_model = embed_model
+        self.llm_conn_id = llm_conn_id
+        self.embed_conn_id = embed_conn_id
+        self.top_k = top_k
+
+    def execute(self, context: Context) -> dict[str, Any]:
+        try:
+            from llama_index.core import StorageContext, load_index_from_storage
+        except ImportError as e:
+            raise AirflowOptionalProviderFeatureException(e)
+
+        embed_model = self._resolve_embed_model()
+        storage_context = self._open_storage_context(StorageContext)
+        index = load_index_from_storage(storage_context, embed_model=embed_model)
+
+        retriever = index.as_retriever(similarity_top_k=self.top_k)
+        results = retriever.retrieve(self.query)
+        self.log.info("Retrieved %d chunks for query: %s", len(results), self.query[:100])
+
+        chunks = [
+            {
+                "text": node_with_score.node.get_content(),
+                "score": node_with_score.score,
+                "metadata": node_with_score.node.metadata,
+                "node_id": node_with_score.node.node_id,
+            }
+            for node_with_score in results
+        ]
+
+        return {
+            "query": self.query,
+            "chunks": chunks,
+        }
+
+    def _resolve_embed_model(self) -> BaseEmbedding:
+        """
+        Return a ready-to-use ``BaseEmbedding``.
+
+        Three cases:
+
+        * ``None`` or ``str`` -- build an ``OpenAIEmbedding`` via
+          ``LlamaIndexHook`` (the framework's documented ``default``
+          behaviour).
+        * Has ``get_text_embedding`` / ``_get_query_embedding`` -- treat as
+          a pre-built ``BaseEmbedding`` (duck-typed to avoid forcing a
+          ``llama_index`` import here).
+        * Anything else -- ``TypeError`` with a clear pointer.
+        """
+        if self.embed_model is None or isinstance(self.embed_model, str):
+            from airflow.providers.common.ai.hooks.llamaindex import LlamaIndexHook
+
+            return LlamaIndexHook(
+                llm_conn_id=self.llm_conn_id,
+                embed_conn_id=self.embed_conn_id,
+                embed_model=self.embed_model,
+            ).get_embedding_model()
+
+        # ``BaseEmbedding`` always exposes these two methods (see
+        # ``llama_index.core.base.embeddings.base``). Duck-typing avoids
+        # importing ``llama_index`` here and also catches the case where an
+        # unresolved ``XComArg`` slips through.
+        if hasattr(self.embed_model, "get_text_embedding") and hasattr(
+            self.embed_model, "_get_query_embedding"
+        ):
+            return self.embed_model
+
+        raise TypeError(
+            "embed_model must be a string model name, a LlamaIndex "
+            f"``BaseEmbedding`` instance, or None. Got {type(self.embed_model).__name__!r}."
+        )
+
+    def _open_storage_context(self, storage_context_cls: Any) -> Any:
+        """Open a ``StorageContext`` from a local path or storage URI."""
+        if "://" in self.index_persist_dir:
+            from airflow.sdk import ObjectStoragePath
+
+            source = ObjectStoragePath(self.index_persist_dir, conn_id=self.persist_conn_id)
+            if not source.is_dir():
+                raise FileNotFoundError(
+                    f"Persisted LlamaIndex index not found at '{self.index_persist_dir}'. "
+                    "Did you run LlamaIndexEmbeddingOperator with the same persist_dir first?"
+                )
+            # ``str(source)`` returns ``s3://<conn_id>@<bucket>/...`` when
+            # ``conn_id`` is set (see ``task-sdk/.../io/path.py``), which
+            # fsspec misinterprets. Pass the raw user URI as the path string
+            # and the authenticated filesystem separately.
+            return storage_context_cls.from_defaults(
+                persist_dir=self.index_persist_dir,
+                fs=source.fs,
+            )
+
+        if not Path(self.index_persist_dir).is_dir():
+            raise FileNotFoundError(
+                f"Persisted LlamaIndex index not found at '{self.index_persist_dir}'. "
+                "Did you run LlamaIndexEmbeddingOperator with the same persist_dir first?"
+            )
+        return storage_context_cls.from_defaults(persist_dir=self.index_persist_dir)
diff --git a/providers/common/ai/tests/unit/common/ai/hooks/test_llamaindex.py b/providers/common/ai/tests/unit/common/ai/hooks/test_llamaindex.py
new file mode 100644
index 0000000000000..9d6e71790b3e0
--- /dev/null
+++ b/providers/common/ai/tests/unit/common/ai/hooks/test_llamaindex.py
@@ -0,0 +1,170 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from __future__ import annotations
+
+from unittest.mock import MagicMock, patch
+
+import pytest
+
+from airflow.providers.common.ai.hooks.llamaindex import LlamaIndexHook
+
+
+def _conn(password: str = "", host: str = "", extra: dict | None = None) -> MagicMock:
+    mock_conn = MagicMock()
+    mock_conn.password = password
+    mock_conn.host = host
+    mock_conn.extra_dejson = extra or {}
+    return mock_conn
+
+
+class TestLlamaIndexHookInit:
+    def test_default_params(self):
+        hook = LlamaIndexHook()
+        assert hook.llm_conn_id == "llamaindex_default"
+        assert hook.embed_conn_id == "llamaindex_default"
+        assert hook.embed_model is None
+        assert hook.llm_model is None
+
+    def test_embed_conn_falls_back_to_llm_conn(self):
+        hook = LlamaIndexHook(llm_conn_id="my_conn")
+        assert hook.embed_conn_id == "my_conn"
+
+    def test_explicit_separate_conns_and_models(self):
+        hook = LlamaIndexHook(
+            llm_conn_id="chat_conn",
+            embed_conn_id="embed_conn",
+            embed_model="text-embedding-3-large",
+            llm_model="gpt-4o",
+        )
+        assert hook.llm_conn_id == "chat_conn"
+        assert hook.embed_conn_id == "embed_conn"
+        assert hook.embed_model == "text-embedding-3-large"
+        assert hook.llm_model == "gpt-4o"
+
+    def test_conn_type_is_llamaindex(self):
+        assert LlamaIndexHook.conn_type == "llamaindex"
+        assert LlamaIndexHook.default_conn_name == "llamaindex_default"
+        assert LlamaIndexHook.conn_name_attr == "llm_conn_id"
+        assert LlamaIndexHook.hook_name == "LlamaIndex"
+
+
+class TestGetUiFieldBehaviour:
+    def test_shape(self):
+        behaviour = LlamaIndexHook.get_ui_field_behaviour()
+        assert behaviour["hidden_fields"] == ["schema", "port", "login"]
+        assert behaviour["relabeling"] == {"password": "API Key"}
+        assert "host" in behaviour["placeholders"]
+        assert "embed_model" in behaviour["placeholders"]["extra"]
+        assert "llm_model" in behaviour["placeholders"]["extra"]
+
+
+class TestResolveModel:
+    def test_constructor_wins_over_extra(self):
+        result = LlamaIndexHook._resolve_model(
+            {"embed_model": "old"},
+            constructor_value="new",
+            extra_key="embed_model",
+            kind="embedding",
+        )
+        assert result == "new"
+
+    def test_falls_back_to_extra(self):
+        result = LlamaIndexHook._resolve_model(
+            {"embed_model": "from-extra"},
+            constructor_value=None,
+            extra_key="embed_model",
+            kind="embedding",
+        )
+        assert result == "from-extra"
+
+    def test_raises_when_neither_set(self):
+        with pytest.raises(ValueError, match="No embedding model identifier set"):
+            LlamaIndexHook._resolve_model(
+                {},
+                constructor_value=None,
+                extra_key="embed_model",
+                kind="embedding",
+            )
+
+
+class TestGetEmbeddingModel:
+    @patch("llama_index.embeddings.openai.OpenAIEmbedding")
+    @patch.object(LlamaIndexHook, "get_connection")
+    def test_dispatches_with_api_key(self, mock_get_conn, mock_cls):
+        mock_get_conn.return_value = _conn(password="sk-test")
+        hook = LlamaIndexHook(embed_model="text-embedding-3-small")
+
+        result = hook.get_embedding_model()
+
+        mock_get_conn.assert_called_once_with("llamaindex_default")
+        mock_cls.assert_called_once_with(model="text-embedding-3-small", api_key="sk-test")
+        assert result is mock_cls.return_value
+
+    @patch("llama_index.embeddings.openai.OpenAIEmbedding")
+    @patch.object(LlamaIndexHook, "get_connection")
+    def test_dispatches_with_api_base(self, mock_get_conn, mock_cls):
+        mock_get_conn.return_value = _conn(password="sk-test", host="http://localhost:11434/v1")
+        hook = LlamaIndexHook(embed_model="text-embedding-3-small")
+
+        hook.get_embedding_model()
+
+        mock_cls.assert_called_once_with(
+            model="text-embedding-3-small",
+            api_key="sk-test",
+            api_base="http://localhost:11434/v1",
+        )
+
+    @patch("llama_index.embeddings.openai.OpenAIEmbedding")
+    @patch.object(LlamaIndexHook, "get_connection")
+    def test_resolves_model_from_extra(self, mock_get_conn, mock_cls):
+        mock_get_conn.return_value = _conn(
+            password="sk-test", extra={"embed_model": "text-embedding-3-large"}
+        )
+        hook = LlamaIndexHook()
+
+        hook.get_embedding_model()
+
+        mock_cls.assert_called_once_with(model="text-embedding-3-large", api_key="sk-test")
+
+    @patch.object(LlamaIndexHook, "get_connection")
+    def test_raises_when_no_model_anywhere(self, mock_get_conn):
+        mock_get_conn.return_value = _conn(password="sk-test")
+        hook = LlamaIndexHook()
+
+        with pytest.raises(ValueError, match="No embedding model identifier set"):
+            hook.get_embedding_model()
+
+
+class TestGetLlm:
+    @patch("llama_index.llms.openai.OpenAI")
+    @patch.object(LlamaIndexHook, "get_connection")
+    def test_dispatches_with_api_key(self, mock_get_conn, mock_cls):
+        mock_get_conn.return_value = _conn(password="sk-test")
+        hook = LlamaIndexHook(llm_model="gpt-4o")
+
+        result = hook.get_llm()
+
+        mock_cls.assert_called_once_with(model="gpt-4o", api_key="sk-test")
+        assert result is mock_cls.return_value
+
+    @patch.object(LlamaIndexHook, "get_connection")
+    def test_raises_when_no_llm_model(self, mock_get_conn):
+        mock_get_conn.return_value = _conn(password="sk-test")
+        hook = LlamaIndexHook()
+
+        with pytest.raises(ValueError, match="No llm model identifier set"):
+            hook.get_llm()
diff --git a/providers/common/ai/tests/unit/common/ai/operators/test_llamaindex_embedding.py b/providers/common/ai/tests/unit/common/ai/operators/test_llamaindex_embedding.py
new file mode 100644
index 0000000000000..43b44f87c9ff4
--- /dev/null
+++ b/providers/common/ai/tests/unit/common/ai/operators/test_llamaindex_embedding.py
@@ -0,0 +1,211 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from __future__ import annotations
+
+from unittest.mock import MagicMock, patch
+
+import pytest
+
+from airflow.providers.common.ai.operators.llamaindex_embedding import LlamaIndexEmbeddingOperator
+
+
+@pytest.fixture
+def _li(monkeypatch):
+ """Patch the two LlamaIndex constructors the operator uses inside execute().
+
+ ``llama_index`` (core + openai embeddings) is a real test dependency
+ declared in ``providers/common/ai/pyproject.toml``'s dev group, so
+ ``monkeypatch.setattr("llama_index.core.X", ...)`` resolves against the real module.
+ """
+ VectorStoreIndex = MagicMock(name="VectorStoreIndex")
+ SentenceSplitter = MagicMock(name="SentenceSplitter")
+ monkeypatch.setattr("llama_index.core.VectorStoreIndex", VectorStoreIndex)
+ monkeypatch.setattr("llama_index.core.node_parser.SentenceSplitter", SentenceSplitter)
+ return {"VectorStoreIndex": VectorStoreIndex, "SentenceSplitter": SentenceSplitter}
+
+
+def _node(text: str = "chunk text", metadata: dict | None = None, vector=None):
+ node = MagicMock()
+ node.text = text
+ node.metadata = metadata or {}
+ node.embedding = vector
+ return node
+
+
+def _byo_embedding():
+ """Return a duck-typed ``BaseEmbedding`` stand-in (has the two methods the operator checks)."""
+ return MagicMock(name="MyBaseEmbedding", spec=["get_text_embedding", "_get_query_embedding"])
+
+
+class TestEmbeddingOperatorInit:
+ def test_template_fields(self):
+ # ``documents`` must be templated so ``loader.output`` (XComArg) is
+ # resolved before execute. The earlier rationale that "list[dict]
+ # doesn't survive Jinja stringification" was wrong -- Templater
+ # unwraps resolvables before Jinja runs.
+ assert set(LlamaIndexEmbeddingOperator.template_fields) == {
+ "documents",
+ "embed_model",
+ "llm_conn_id",
+ "embed_conn_id",
+ "persist_dir",
+ "persist_conn_id",
+ }
+
+
+class TestEmbeddingOperatorExecute:
+ @patch("airflow.providers.common.ai.hooks.llamaindex.LlamaIndexHook.get_embedding_model")
+ def test_string_embed_model_goes_through_hook(self, mock_get_embed, _li):
+ # `embed_model` as a string -> hook builds OpenAIEmbedding.
+ _li["SentenceSplitter"].return_value.get_nodes_from_documents.return_value = [
+ _node(text="chunk a", vector=[0.1, 0.2]),
+ ]
+
+ op = LlamaIndexEmbeddingOperator(
+ task_id="test",
+ documents=[{"text": "doc", "metadata": {"src": "x"}}],
+ embed_model="text-embedding-3-small",
+ llm_conn_id="my_conn",
+ )
+ result = op.execute(context=MagicMock())
+
+ mock_get_embed.assert_called_once()
+ assert result["document_count"] == 1
+ assert result["chunk_count"] == 1
+ assert result["chunks"][0]["text"] == "chunk a"
+ assert result["chunks"][0]["vector"] == [0.1, 0.2]
+
+ @patch("airflow.providers.common.ai.hooks.llamaindex.LlamaIndexHook")
+ def test_string_embed_model_forwards_embed_conn_id(self, mock_hook_cls, _li):
+ # ``embed_conn_id`` overrides ``llm_conn_id`` for the embedding API.
+ _li["SentenceSplitter"].return_value.get_nodes_from_documents.return_value = [_node()]
+
+ op = LlamaIndexEmbeddingOperator(
+ task_id="test",
+ documents=[{"text": "doc"}],
+ embed_model="text-embedding-3-small",
+ llm_conn_id="my_llm_conn",
+ embed_conn_id="my_embed_conn",
+ )
+ op.execute(context=MagicMock())
+
+ mock_hook_cls.assert_called_once_with(
+ llm_conn_id="my_llm_conn",
+ embed_conn_id="my_embed_conn",
+ embed_model="text-embedding-3-small",
+ )
+
+ def test_byo_embed_model_bypasses_hook(self, _li):
+ # `embed_model` is a non-string instance -> hook is bypassed.
+ byo = _byo_embedding()
+ _li["SentenceSplitter"].return_value.get_nodes_from_documents.return_value = [_node()]
+
+ op = LlamaIndexEmbeddingOperator(
+ task_id="test",
+ documents=[{"text": "doc"}],
+ embed_model=byo,
+ )
+ op.execute(context=MagicMock())
+
+ # VectorStoreIndex called with the user's instance, not anything else.
+ _li["VectorStoreIndex"].assert_called_once()
+ kwargs = _li["VectorStoreIndex"].call_args.kwargs
+ assert kwargs["embed_model"] is byo
+
+ def test_invalid_embed_model_raises_typeerror(self, _li):
+ # An object that's neither None/str nor duck-types as BaseEmbedding
+ # (e.g. an unresolved XComArg or random user input) raises TypeError
+ # with a clear pointer rather than a cryptic downstream error.
+ _li["SentenceSplitter"].return_value.get_nodes_from_documents.return_value = [_node()]
+
+ op = LlamaIndexEmbeddingOperator(
+ task_id="test",
+ documents=[{"text": "doc"}],
+ embed_model=12345, # type: ignore[arg-type]
+ )
+ with pytest.raises(TypeError, match="embed_model must be"):
+ op.execute(context=MagicMock())
+
+ @patch("airflow.providers.common.ai.hooks.llamaindex.LlamaIndexHook.get_embedding_model")
+ def test_chunks_carry_text_metadata_vector(self, mock_get_embed, _li):
+ _li["SentenceSplitter"].return_value.get_nodes_from_documents.return_value = [
+ _node(text="x", metadata={"k": "v"}, vector=[1.0, 2.0]),
+ _node(text="y", metadata={"k": "v2"}, vector=[3.0, 4.0]),
+ ]
+
+ op = LlamaIndexEmbeddingOperator(
+ task_id="test",
+ documents=[{"text": "doc"}],
+ embed_model="text-embedding-3-small",
+ )
+ result = op.execute(context=MagicMock())
+
+ assert result["chunks"] == [
+ {"text": "x", "metadata": {"k": "v"}, "vector": [1.0, 2.0]},
+ {"text": "y", "metadata": {"k": "v2"}, "vector": [3.0, 4.0]},
+ ]
+
+
+class TestEmbeddingOperatorPersist:
+ @patch("os.makedirs")
+ @patch("airflow.providers.common.ai.hooks.llamaindex.LlamaIndexHook.get_embedding_model")
+ def test_local_persist_dir_calls_makedirs_and_storage_persist(
+ self, mock_get_embed, mock_makedirs, _li, tmp_path
+ ):
+ node = _node()
+ _li["SentenceSplitter"].return_value.get_nodes_from_documents.return_value = [node]
+ index = _li["VectorStoreIndex"].return_value
+
+ op = LlamaIndexEmbeddingOperator(
+ task_id="test",
+ documents=[{"text": "doc"}],
+ embed_model="text-embedding-3-small",
+ persist_dir=str(tmp_path / "idx"),
+ )
+ op.execute(context=MagicMock())
+
+ mock_makedirs.assert_called_once_with(str(tmp_path / "idx"), exist_ok=True)
+ index.storage_context.persist.assert_called_once_with(persist_dir=str(tmp_path / "idx"))
+
+ @patch("airflow.sdk.ObjectStoragePath")
+ @patch("airflow.providers.common.ai.hooks.llamaindex.LlamaIndexHook.get_embedding_model")
+ def test_cloud_uri_persist_dir_uses_object_storage_path(self, mock_get_embed, mock_osp_cls, _li):
+ # ``ObjectStoragePath.__str__`` returns ``protocol://conn_id@bucket/...``
+ # when ``conn_id`` is set, which fsspec misinterprets. The operator must
+ # pass the **raw** user URI to ``persist_dir=`` and supply
+ # ``fs=target.fs`` for credentials. Asserting against the raw URI here
+ # catches a regression where ``str(target)`` is used instead.
+ target = MagicMock()
+ target.fs = MagicMock(name="s3fs")
+ mock_osp_cls.return_value = target
+
+ node = _node()
+ _li["SentenceSplitter"].return_value.get_nodes_from_documents.return_value = [node]
+ index = _li["VectorStoreIndex"].return_value
+
+ op = LlamaIndexEmbeddingOperator(
+ task_id="test",
+ documents=[{"text": "doc"}],
+ embed_model="text-embedding-3-small",
+ persist_dir="s3://bucket/idx/",
+ persist_conn_id="aws_default",
+ )
+ op.execute(context=MagicMock())
+
+ mock_osp_cls.assert_called_once_with("s3://bucket/idx/", conn_id="aws_default")
+ target.mkdir.assert_called_once_with(parents=True, exist_ok=True)
+ index.storage_context.persist.assert_called_once_with(persist_dir="s3://bucket/idx/", fs=target.fs)
diff --git a/providers/common/ai/tests/unit/common/ai/operators/test_llamaindex_retrieval.py b/providers/common/ai/tests/unit/common/ai/operators/test_llamaindex_retrieval.py
new file mode 100644
index 0000000000000..58e2bb751c734
--- /dev/null
+++ b/providers/common/ai/tests/unit/common/ai/operators/test_llamaindex_retrieval.py
@@ -0,0 +1,238 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from __future__ import annotations
+
+from unittest.mock import MagicMock, patch
+
+import pytest
+
+from airflow.providers.common.ai.operators.llamaindex_retrieval import LlamaIndexRetrievalOperator
+
+
+@pytest.fixture
+def _li(monkeypatch):
+ """Patch the two LlamaIndex symbols the retrieval operator uses inside execute().
+
+ ``llama_index`` (core + openai embeddings) is a real test dependency
+ declared in ``providers/common/ai/pyproject.toml``'s dev group, so
+ ``monkeypatch.setattr("llama_index.core.X", ...)`` resolves against the
+ real module.
+ """
+ StorageContext = MagicMock(name="StorageContext")
+ load_index_from_storage = MagicMock(name="load_index_from_storage")
+ monkeypatch.setattr("llama_index.core.StorageContext", StorageContext)
+ monkeypatch.setattr("llama_index.core.load_index_from_storage", load_index_from_storage)
+ return {
+ "StorageContext": StorageContext,
+ "load_index_from_storage": load_index_from_storage,
+ }
+
+
+def _scored_node(text: str, score: float, metadata: dict | None = None, node_id: str = "n"):
+ node = MagicMock()
+ node.get_content.return_value = text
+ node.metadata = metadata or {}
+ node.node_id = node_id
+ wrapped = MagicMock()
+ wrapped.node = node
+ wrapped.score = score
+ return wrapped
+
+
+def _byo_embedding():
+ """Return a duck-typed ``BaseEmbedding`` stand-in."""
+ return MagicMock(name="MyBaseEmbedding", spec=["get_text_embedding", "_get_query_embedding"])
+
+
+class TestRetrievalOperatorInit:
+ def test_template_fields(self):
+ assert set(LlamaIndexRetrievalOperator.template_fields) == {
+ "query",
+ "index_persist_dir",
+ "persist_conn_id",
+ "embed_model",
+ "llm_conn_id",
+ "embed_conn_id",
+ }
+
+
+class TestRetrievalOperatorOutput:
+ @patch("airflow.providers.common.ai.hooks.llamaindex.LlamaIndexHook.get_embedding_model")
+ def test_chunk_shape(self, mock_get_embed, _li, tmp_path):
+ # Make the persist_dir existence check pass.
+ (tmp_path / "idx").mkdir()
+
+ index = _li["load_index_from_storage"].return_value
+ retriever = index.as_retriever.return_value
+ retriever.retrieve.return_value = [
+ _scored_node("chunk a", 0.91, {"src": "x"}, "node-a"),
+ _scored_node("chunk b", 0.85, {"src": "y"}, "node-b"),
+ ]
+
+ op = LlamaIndexRetrievalOperator(
+ task_id="test",
+ query="what is airflow",
+ index_persist_dir=str(tmp_path / "idx"),
+ embed_model="text-embedding-3-small",
+ )
+ result = op.execute(context=MagicMock())
+
+ assert result == {
+ "query": "what is airflow",
+ "chunks": [
+ {"text": "chunk a", "score": 0.91, "metadata": {"src": "x"}, "node_id": "node-a"},
+ {"text": "chunk b", "score": 0.85, "metadata": {"src": "y"}, "node_id": "node-b"},
+ ],
+ }
+ # The retrieval-time embedding model is passed directly (no Settings mutation).
+ _li["load_index_from_storage"].assert_called_once()
+ kwargs = _li["load_index_from_storage"].call_args.kwargs
+ assert "embed_model" in kwargs
+ index.as_retriever.assert_called_once_with(similarity_top_k=5)
+
+ @patch("airflow.providers.common.ai.hooks.llamaindex.LlamaIndexHook.get_embedding_model")
+ def test_top_k_forwarded(self, mock_get_embed, _li, tmp_path):
+ (tmp_path / "idx").mkdir()
+ index = _li["load_index_from_storage"].return_value
+ index.as_retriever.return_value.retrieve.return_value = []
+
+ op = LlamaIndexRetrievalOperator(
+ task_id="test",
+ query="q",
+ index_persist_dir=str(tmp_path / "idx"),
+ embed_model="text-embedding-3-small",
+ top_k=12,
+ )
+ op.execute(context=MagicMock())
+
+ index.as_retriever.assert_called_once_with(similarity_top_k=12)
+
+ @patch("airflow.providers.common.ai.hooks.llamaindex.LlamaIndexHook")
+ def test_string_embed_model_forwards_embed_conn_id(self, mock_hook_cls, _li, tmp_path):
+ # ``embed_conn_id`` overrides ``llm_conn_id`` for the embedding API.
+ (tmp_path / "idx").mkdir()
+ index = _li["load_index_from_storage"].return_value
+ index.as_retriever.return_value.retrieve.return_value = []
+
+ op = LlamaIndexRetrievalOperator(
+ task_id="test",
+ query="q",
+ index_persist_dir=str(tmp_path / "idx"),
+ embed_model="text-embedding-3-small",
+ llm_conn_id="my_llm_conn",
+ embed_conn_id="my_embed_conn",
+ )
+ op.execute(context=MagicMock())
+
+ mock_hook_cls.assert_called_once_with(
+ llm_conn_id="my_llm_conn",
+ embed_conn_id="my_embed_conn",
+ embed_model="text-embedding-3-small",
+ )
+
+ def test_byo_embed_model_bypasses_hook(self, _li, tmp_path):
+ (tmp_path / "idx").mkdir()
+ byo = _byo_embedding()
+ index = _li["load_index_from_storage"].return_value
+ index.as_retriever.return_value.retrieve.return_value = []
+
+ op = LlamaIndexRetrievalOperator(
+ task_id="test",
+ query="q",
+ index_persist_dir=str(tmp_path / "idx"),
+ embed_model=byo,
+ )
+ op.execute(context=MagicMock())
+
+ kwargs = _li["load_index_from_storage"].call_args.kwargs
+ assert kwargs["embed_model"] is byo
+
+ def test_invalid_embed_model_raises_typeerror(self, _li, tmp_path):
+ # An object that's neither None/str nor duck-types as BaseEmbedding
+ # raises TypeError with a clear pointer.
+ (tmp_path / "idx").mkdir()
+
+ op = LlamaIndexRetrievalOperator(
+ task_id="test",
+ query="q",
+ index_persist_dir=str(tmp_path / "idx"),
+ embed_model=12345, # type: ignore[arg-type]
+ )
+ with pytest.raises(TypeError, match="embed_model must be"):
+ op.execute(context=MagicMock())
+
+
+class TestRetrievalOperatorMissingIndex:
+ @patch("airflow.providers.common.ai.hooks.llamaindex.LlamaIndexHook.get_embedding_model")
+ def test_local_missing_dir_raises_with_hint(self, mock_get_embed, _li, tmp_path):
+ op = LlamaIndexRetrievalOperator(
+ task_id="test",
+ query="q",
+ index_persist_dir=str(tmp_path / "no_such_dir"),
+ embed_model="text-embedding-3-small",
+ )
+ with pytest.raises(FileNotFoundError, match="LlamaIndexEmbeddingOperator"):
+ op.execute(context=MagicMock())
+
+ @patch("airflow.sdk.ObjectStoragePath")
+ @patch("airflow.providers.common.ai.hooks.llamaindex.LlamaIndexHook.get_embedding_model")
+ def test_cloud_missing_uri_raises_with_hint(self, mock_get_embed, mock_osp_cls, _li):
+ missing = MagicMock()
+ missing.is_dir.return_value = False
+ mock_osp_cls.return_value = missing
+
+ op = LlamaIndexRetrievalOperator(
+ task_id="test",
+ query="q",
+ index_persist_dir="s3://bucket/missing/",
+ embed_model="text-embedding-3-small",
+ )
+ with pytest.raises(FileNotFoundError, match="LlamaIndexEmbeddingOperator"):
+ op.execute(context=MagicMock())
+
+
+class TestRetrievalOperatorCloudURI:
+ @patch("airflow.sdk.ObjectStoragePath")
+ @patch("airflow.providers.common.ai.hooks.llamaindex.LlamaIndexHook.get_embedding_model")
+ def test_cloud_uri_opens_storage_with_fs(self, mock_get_embed, mock_osp_cls, _li):
+ # ``ObjectStoragePath.__str__`` returns ``protocol://conn_id@bucket/...``
+ # when ``conn_id`` is set, which fsspec misinterprets. The operator must
+ # pass the **raw** user URI to ``persist_dir=`` and supply
+ # ``fs=target.fs`` for credentials. Asserting against the raw URI here
+ # catches a regression where ``str(target)`` is used instead.
+ target = MagicMock()
+ target.is_dir.return_value = True
+ target.fs = MagicMock(name="s3fs")
+ mock_osp_cls.return_value = target
+
+ index = _li["load_index_from_storage"].return_value
+ index.as_retriever.return_value.retrieve.return_value = []
+
+ op = LlamaIndexRetrievalOperator(
+ task_id="test",
+ query="q",
+ index_persist_dir="s3://bucket/idx/",
+ persist_conn_id="aws_default",
+ embed_model="text-embedding-3-small",
+ )
+ op.execute(context=MagicMock())
+
+ mock_osp_cls.assert_called_once_with("s3://bucket/idx/", conn_id="aws_default")
+ _li["StorageContext"].from_defaults.assert_called_once_with(
+ persist_dir="s3://bucket/idx/",
+ fs=target.fs,
+ )
diff --git a/uv.lock b/uv.lock
index ab9ae7beb1078..7766a2d3cf466 100644
--- a/uv.lock
+++ b/uv.lock
@@ -4258,6 +4258,11 @@ google = [
langchain = [
{ name = "langchain" },
]
+llamaindex = [
+ { name = "llama-index-core" },
+ { name = "llama-index-embeddings-openai" },
+ { name = "llama-index-llms-openai" },
+]
mcp = [
{ name = "pydantic-ai-slim", extra = ["mcp"] },
]
@@ -4284,6 +4289,9 @@ dev = [
{ name = "apache-airflow-providers-standard" },
{ name = "apache-airflow-task-sdk" },
{ name = "langchain" },
+ { name = "llama-index-core" },
+ { name = "llama-index-embeddings-openai" },
+ { name = "llama-index-llms-openai" },
{ name = "pydantic-ai-slim", extra = ["mcp"] },
{ name = "sqlglot" },
]
@@ -4301,6 +4309,9 @@ requires-dist = [
{ name = "fastavro", marker = "python_full_version >= '3.14' and extra == 'avro'", specifier = ">=1.12.1" },
{ name = "fastavro", marker = "python_full_version < '3.14' and extra == 'avro'", specifier = ">=1.10.0" },
{ name = "langchain", marker = "extra == 'langchain'", specifier = ">=1.0.0" },
+ { name = "llama-index-core", marker = "extra == 'llamaindex'", specifier = ">=0.13.0" },
+ { name = "llama-index-embeddings-openai", marker = "extra == 'llamaindex'", specifier = ">=0.6.0" },
+ { name = "llama-index-llms-openai", marker = "extra == 'llamaindex'", specifier = ">=0.6.0" },
{ name = "pyarrow", marker = "python_full_version >= '3.14' and extra == 'parquet'", specifier = ">=22.0.0" },
{ name = "pyarrow", marker = "python_full_version < '3.14' and extra == 'parquet'", specifier = ">=18.0.0" },
{ name = "pydantic-ai-slim", specifier = ">=1.34.0" },
@@ -4313,7 +4324,7 @@ requires-dist = [
{ name = "python-docx", marker = "extra == 'docx'", specifier = ">=1.0.0" },
{ name = "sqlglot", marker = "extra == 'sql'", specifier = ">=30.0.0" },
]
-provides-extras = ["anthropic", "bedrock", "google", "openai", "mcp", "avro", "parquet", "sql", "common-sql", "langchain", "pdf", "docx"]
+provides-extras = ["anthropic", "bedrock", "google", "openai", "mcp", "avro", "parquet", "sql", "common-sql", "langchain", "llamaindex", "pdf", "docx"]
[package.metadata.requires-dev]
dev = [
@@ -4325,6 +4336,9 @@ dev = [
{ name = "apache-airflow-providers-standard", editable = "providers/standard" },
{ name = "apache-airflow-task-sdk", editable = "task-sdk" },
{ name = "langchain", specifier = ">=1.0.0" },
+ { name = "llama-index-core", specifier = ">=0.13.0" },
+ { name = "llama-index-embeddings-openai", specifier = ">=0.6.0" },
+ { name = "llama-index-llms-openai", specifier = ">=0.6.0" },
{ name = "pydantic-ai-slim", extras = ["mcp"] },
{ name = "sqlglot", specifier = ">=30.0.0" },
]
@@ -9636,6 +9650,23 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/aa/31/759d077aa680555e17c9d2bb09edf4c3428d895fe5d35a8df67684401b84/backports_zstd-1.5.0-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:6172dcdd664ef243e55a35e6b45f1c866767c61043f0ddcd908abd14df662065", size = 300853, upload-time = "2026-05-11T19:54:23.1Z" },
]
+[[package]]
+name = "banks"
+version = "2.4.2"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "deprecated" },
+ { name = "filetype" },
+ { name = "griffe" },
+ { name = "jinja2" },
+ { name = "platformdirs" },
+ { name = "pydantic" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/bd/51/08fb68d23f4b0f6256fe85dc86e9576941550f890b079352fba719e07b39/banks-2.4.2.tar.gz", hash = "sha256:cda6013bd377ea7b701933578bfb9370fc21ad70bc13cedfc3f5cb2c034ca3dc", size = 188633, upload-time = "2026-04-27T12:15:22.021Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/00/b6/8dc5477681b782e2f99de703e7a99828883364b9e03a60d3e2c47053d56a/banks-2.4.2-py3-none-any.whl", hash = "sha256:5fe407cc48c101f3e13d1cf732b83b8246003337612f13c0705d2e81f6faffb7", size = 35050, upload-time = "2026-04-27T12:15:20.785Z" },
+]
+
[[package]]
name = "bcrypt"
version = "5.0.0"
@@ -11029,6 +11060,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/1e/77/dc8c558f7593132cf8fefec57c4f60c83b16941c574ac5f619abb3ae7933/dill-0.4.1-py3-none-any.whl", hash = "sha256:1e1ce33e978ae97fcfcff5638477032b801c46c7c65cf717f95fbc2248f79a9d", size = 120019, upload-time = "2026-01-19T02:36:55.663Z" },
]
+[[package]]
+name = "dirtyjson"
+version = "1.0.8"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/db/04/d24f6e645ad82ba0ef092fa17d9ef7a21953781663648a01c9371d9e8e98/dirtyjson-1.0.8.tar.gz", hash = "sha256:90ca4a18f3ff30ce849d100dcf4a003953c79d3a2348ef056f1d9c22231a25fd", size = 30782, upload-time = "2022-11-28T23:32:33.319Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/68/69/1bcf70f81de1b4a9f21b3a62ec0c83bdff991c88d6cc2267d02408457e88/dirtyjson-1.0.8-py3-none-any.whl", hash = "sha256:125e27248435a58acace26d5c2c4c11a1c0de0a9c5124c5a94ba78e517d74f53", size = 25197, upload-time = "2022-11-28T23:32:31.219Z" },
+]
+
[[package]]
name = "distlib"
version = "0.4.0"
@@ -11551,6 +11591,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/81/47/dd9a212ef6e343a6857485ffe25bba537304f1913bdbed446a23f7f592e1/filelock-3.29.0-py3-none-any.whl", hash = "sha256:96f5f6344709aa1572bbf631c640e4ebeeb519e08da902c39a001882f30ac258", size = 39812, upload-time = "2026-04-19T15:39:08.752Z" },
]
+[[package]]
+name = "filetype"
+version = "1.2.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/bb/29/745f7d30d47fe0f251d3ad3dc2978a23141917661998763bebb6da007eb1/filetype-1.2.0.tar.gz", hash = "sha256:66b56cd6474bf41d8c54660347d37afcc3f7d1970648de365c102ef77548aadb", size = 998020, upload-time = "2022-11-02T17:34:04.141Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/18/79/1b8fa1bb3568781e84c9200f951c735f3f157429f44be0495da55894d620/filetype-1.2.0-py2.py3-none-any.whl", hash = "sha256:7ce71b6880181241cf7ac8697a2f1eb6a8bd9b429f7ad6d27b8db9ba5f1c2d25", size = 19970, upload-time = "2022-11-02T17:34:01.425Z" },
+]
+
[[package]]
name = "flask"
version = "3.1.3"
@@ -13134,6 +13183,32 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/a4/55/f6adf83dd74563aca7721d456b1d33d7656448e29cc79a6aede3bb6ffa5b/gremlinpython-3.8.1-py3-none-any.whl", hash = "sha256:2e8136f9ea8cd771f9cc6f86f4ce73130595aed414a363534e1a4e18bfa81427", size = 75457, upload-time = "2026-04-07T00:22:18.776Z" },
]
+[[package]]
+name = "griffe"
+version = "2.0.2"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "griffecli" },
+ { name = "griffelib" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/4a/49/eb6d2935e27883af92c930ed40cc4c69bcd32c402be43b8ca4ab20510f67/griffe-2.0.2.tar.gz", hash = "sha256:c5d56326d159f274492e9bf93a9895cec101155d944caa66d0fc4e0c13751b92", size = 293757, upload-time = "2026-03-27T11:34:52.205Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/94/c0/2bb018eecf9a83c68db9cd9fffd9dab25f102ad30ed869451046e46d1187/griffe-2.0.2-py3-none-any.whl", hash = "sha256:2b31816460aee1996af26050a1fc6927a2e5936486856707f55508e4c9b5960b", size = 5141, upload-time = "2026-03-27T11:34:47.721Z" },
+]
+
+[[package]]
+name = "griffecli"
+version = "2.0.2"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "colorama" },
+ { name = "griffelib" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/79/e0/6a7d661d71bb043656a109b91d84a42b5342752542074ec83b16a6eb97f0/griffecli-2.0.2.tar.gz", hash = "sha256:40a1ad4181fc39685d025e119ae2c5b669acdc1f19b705fb9bf971f4e6f6dffb", size = 56281, upload-time = "2026-03-27T11:34:50.087Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/2e/e8/90d93356c88ac34c20cb5edffca68138df55ca9bbd1a06eccfbcec8fdbe5/griffecli-2.0.2-py3-none-any.whl", hash = "sha256:0d44d39e59afa81e288a3e1c3bf352cc4fa537483326ac06b8bb6a51fd8303a0", size = 9500, upload-time = "2026-03-27T11:34:48.81Z" },
+]
+
[[package]]
name = "griffelib"
version = "2.0.2"
@@ -15035,6 +15110,100 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/02/6c/5327667e6dbe9e98cbfbd4261c8e91386a52e38f41419575854248bbab6a/litellm-1.82.6-py3-none-any.whl", hash = "sha256:164a3ef3e19f309e3cabc199bef3d2045212712fefdfa25fc7f75884a5b5b205", size = 15591595, upload-time = "2026-03-22T06:35:56.795Z" },
]
+[[package]]
+name = "llama-index-core"
+version = "0.14.22"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "aiohttp" },
+ { name = "aiosqlite" },
+ { name = "banks" },
+ { name = "dataclasses-json" },
+ { name = "deprecated" },
+ { name = "dirtyjson" },
+ { name = "filetype" },
+ { name = "fsspec" },
+ { name = "httpx" },
+ { name = "llama-index-workflows" },
+ { name = "nest-asyncio" },
+ { name = "networkx", version = "3.4.2", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
+ { name = "networkx", version = "3.6.1", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
+ { name = "nltk" },
+ { name = "numpy", version = "2.2.6", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
+ { name = "numpy", version = "2.4.5", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
+ { name = "pillow" },
+ { name = "platformdirs" },
+ { name = "pydantic" },
+ { name = "pyyaml" },
+ { name = "requests" },
+ { name = "setuptools" },
+ { name = "sqlalchemy", extra = ["asyncio"] },
+ { name = "tenacity" },
+ { name = "tiktoken" },
+ { name = "tinytag" },
+ { name = "tqdm" },
+ { name = "typing-extensions" },
+ { name = "typing-inspect" },
+ { name = "wrapt" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/96/7f/94a4b940ef0d069840df0fd6d361a2aa832a2dd73b4cecdf86e8f8c353c8/llama_index_core-0.14.22.tar.gz", hash = "sha256:1384410f89bdbd32349aab444ef4f5c828c338787bc65bd1ffd8e86dfb44ac41", size = 11584786, upload-time = "2026-05-14T20:21:37.271Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/39/15/e1a26d8d56aa55fa07587a3e9c7e85294d2df5af6c2229193019bc549ef6/llama_index_core-0.14.22-py3-none-any.whl", hash = "sha256:9cfffde46fd5b7937101e1c0c9bb5c21bd7ff8c8a56937810b87ba3542f31225", size = 11920774, upload-time = "2026-05-14T20:21:40.409Z" },
+]
+
+[[package]]
+name = "llama-index-embeddings-openai"
+version = "0.6.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "llama-index-core" },
+ { name = "openai" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/06/52/eb56a4887501651fb17400f7f571c1878109ff698efbe0bbac9165a5603d/llama_index_embeddings_openai-0.6.0.tar.gz", hash = "sha256:eb3e6606be81cb89125073e23c97c0a6119dabb4827adbd14697c2029ad73f29", size = 7629, upload-time = "2026-03-12T20:21:27.234Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/4e/d1/4bb0b80f4057903110060f617ef519197194b3ff5dd6153d850c8f5676fa/llama_index_embeddings_openai-0.6.0-py3-none-any.whl", hash = "sha256:039bb1007ad4267e25ddb89a206dfdab862bfb87d58da4271a3919e4f9df4d61", size = 7666, upload-time = "2026-03-12T20:21:28.079Z" },
+]
+
+[[package]]
+name = "llama-index-instrumentation"
+version = "0.5.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "deprecated" },
+ { name = "pydantic" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/4e/d0/671b23ccff255c9bce132a84ffd5a6f4541ceefdeab9c1786b08c9722f2e/llama_index_instrumentation-0.5.0.tar.gz", hash = "sha256:eeb724648b25d149de882a5ac9e21c5acb1ce780da214bda2b075341af29ad8e", size = 43831, upload-time = "2026-03-12T20:17:06.742Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/c3/45/6dcaccef44e541ffa138e4b45e33e0d40ab2a7d845338483954fcf77bc75/llama_index_instrumentation-0.5.0-py3-none-any.whl", hash = "sha256:aaab83cddd9dd434278891012d8995f47a3bc7ed1736a371db90965348c56a21", size = 16444, upload-time = "2026-03-12T20:17:05.957Z" },
+]
+
+[[package]]
+name = "llama-index-llms-openai"
+version = "0.7.8"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "llama-index-core" },
+ { name = "openai" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/00/d5/2de9c05f1f1d21eb678a6044c59e943063e70099ac39b8b6f835e6e39511/llama_index_llms_openai-0.7.8.tar.gz", hash = "sha256:3352aed617ee5b7aefeb12719609ff84b4b590a1f49aa1e2e9c383d67ea88b0e", size = 27539, upload-time = "2026-05-08T20:02:09.42Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/32/49/4250108a76f4f7622109ecb9c57861829f508aba0ffdc502b27134378505/llama_index_llms_openai-0.7.8-py3-none-any.whl", hash = "sha256:967aac1f4ceff99185b2cc425c2757d4fefaf3fac0a35ace247f87a212a29359", size = 28617, upload-time = "2026-05-08T20:02:10.583Z" },
+]
+
+[[package]]
+name = "llama-index-workflows"
+version = "2.20.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "llama-index-instrumentation" },
+ { name = "pydantic" },
+ { name = "typing-extensions" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/c4/ec/05f3db99a2e6e252e3939e7751cad2fb1322dc6d32f4cf5c795cf7ddcad3/llama_index_workflows-2.20.0.tar.gz", hash = "sha256:df2760fea9e100c97a4e919d255461e344413acac4382d17d8217337806e4772", size = 97410, upload-time = "2026-04-24T14:54:41.524Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/71/5f/385231406d777cb4b608fd8ebe3577dbd90962770717181e6b91b44fb1b8/llama_index_workflows-2.20.0-py3-none-any.whl", hash = "sha256:36f6b6ace77f837d9907078aea7e830251afe96a58daecff5ed090c88c55095d", size = 121238, upload-time = "2026-04-24T14:54:40.455Z" },
+]
+
[[package]]
name = "lockfile"
version = "0.12.2"
@@ -16438,6 +16607,21 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/80/7c/19cd0671d1ba2762fb388fc149697d20d0568ccfeef833b11280a619e526/nh3-0.3.5-cp38-abi3-win_arm64.whl", hash = "sha256:8f85285700a18e9f3fc5bff41fe573fa84f81542ef13b48a89f9fecca0474d3b", size = 611069, upload-time = "2026-04-25T10:44:14.934Z" },
]
+[[package]]
+name = "nltk"
+version = "3.9.4"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "click" },
+ { name = "joblib" },
+ { name = "regex" },
+ { name = "tqdm" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/74/a1/b3b4adf15585a5bc4c357adde150c01ebeeb642173ded4d871e89468767c/nltk-3.9.4.tar.gz", hash = "sha256:ed03bc098a40481310320808b2db712d95d13ca65b27372f8a403949c8b523d0", size = 2946864, upload-time = "2026-03-24T06:13:40.641Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/9d/91/04e965f8e717ba0ab4bdca5c112deeab11c9e750d94c4d4602f050295d39/nltk-3.9.4-py3-none-any.whl", hash = "sha256:f2fa301c3a12718ce4a0e9305c5675299da5ad9e26068218b69d692fda84828f", size = 1552087, upload-time = "2026-03-24T06:13:38.47Z" },
+]
+
[[package]]
name = "nodeenv"
version = "1.10.0"
@@ -17522,6 +17706,104 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/5a/26/6cee8a1ce8c43625ec561aff19df07f9776b7525d9002c86bceb3e0ac970/pgvector-0.4.2-py3-none-any.whl", hash = "sha256:549d45f7a18593783d5eec609ea1684a724ba8405c4cb182a0b2b08aeff04e08", size = 27441, upload-time = "2025-12-05T01:07:16.536Z" },
]
+[[package]]
+name = "pillow"
+version = "12.2.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/8c/21/c2bcdd5906101a30244eaffc1b6e6ce71a31bd0742a01eb89e660ebfac2d/pillow-12.2.0.tar.gz", hash = "sha256:a830b1a40919539d07806aa58e1b114df53ddd43213d9c8b75847eee6c0182b5", size = 46987819, upload-time = "2026-04-01T14:46:17.687Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/3a/aa/d0b28e1c811cd4d5f5c2bfe2e022292bd255ae5744a3b9ac7d6c8f72dd75/pillow-12.2.0-cp310-cp310-macosx_10_10_x86_64.whl", hash = "sha256:a4e8f36e677d3336f35089648c8955c51c6d386a13cf6ee9c189c5f5bd713a9f", size = 5354355, upload-time = "2026-04-01T14:42:15.402Z" },
+ { url = "https://files.pythonhosted.org/packages/27/8e/1d5b39b8ae2bd7650d0c7b6abb9602d16043ead9ebbfef4bc4047454da2a/pillow-12.2.0-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:2e589959f10d9824d39b350472b92f0ce3b443c0a3442ebf41c40cb8361c5b97", size = 4695871, upload-time = "2026-04-01T14:42:18.234Z" },
+ { url = "https://files.pythonhosted.org/packages/f0/c5/dcb7a6ca6b7d3be41a76958e90018d56c8462166b3ef223150360850c8da/pillow-12.2.0-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:a52edc8bfff4429aaabdf4d9ee0daadbbf8562364f940937b941f87a4290f5ff", size = 6269734, upload-time = "2026-04-01T14:42:20.608Z" },
+ { url = "https://files.pythonhosted.org/packages/ea/f1/aa1bb13b2f4eba914e9637893c73f2af8e48d7d4023b9d3750d4c5eb2d0c/pillow-12.2.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:975385f4776fafde056abb318f612ef6285b10a1f12b8570f3647ad0d74b48ec", size = 8076080, upload-time = "2026-04-01T14:42:23.095Z" },
+ { url = "https://files.pythonhosted.org/packages/a1/2a/8c79d6a53169937784604a8ae8d77e45888c41537f7f6f65ed1f407fe66d/pillow-12.2.0-cp310-cp310-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:bd9c0c7a0c681a347b3194c500cb1e6ca9cab053ea4d82a5cf45b6b754560136", size = 6382236, upload-time = "2026-04-01T14:42:25.82Z" },
+ { url = "https://files.pythonhosted.org/packages/b5/42/bbcb6051030e1e421d103ce7a8ecadf837aa2f39b8f82ef1a8d37c3d4ebc/pillow-12.2.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:88d387ff40b3ff7c274947ed3125dedf5262ec6919d83946753b5f3d7c67ea4c", size = 7070220, upload-time = "2026-04-01T14:42:28.68Z" },
+ { url = "https://files.pythonhosted.org/packages/3f/e1/c2a7d6dd8cfa6b231227da096fd2d58754bab3603b9d73bf609d3c18b64f/pillow-12.2.0-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:51c4167c34b0d8ba05b547a3bb23578d0ba17b80a5593f93bd8ecb123dd336a3", size = 6493124, upload-time = "2026-04-01T14:42:31.579Z" },
+ { url = "https://files.pythonhosted.org/packages/5f/41/7c8617da5d32e1d2f026e509484fdb6f3ad7efaef1749a0c1928adbb099e/pillow-12.2.0-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:34c0d99ecccea270c04882cb3b86e7b57296079c9a4aff88cb3b33563d95afaa", size = 7194324, upload-time = "2026-04-01T14:42:34.615Z" },
+ { url = "https://files.pythonhosted.org/packages/2d/de/a777627e19fd6d62f84070ee1521adde5eeda4855b5cf60fe0b149118bca/pillow-12.2.0-cp310-cp310-win32.whl", hash = "sha256:b85f66ae9eb53e860a873b858b789217ba505e5e405a24b85c0464822fe88032", size = 6376363, upload-time = "2026-04-01T14:42:37.19Z" },
+ { url = "https://files.pythonhosted.org/packages/e7/34/fc4cb5204896465842767b96d250c08410f01f2f28afc43b257de842eed5/pillow-12.2.0-cp310-cp310-win_amd64.whl", hash = "sha256:673aa32138f3e7531ccdbca7b3901dba9b70940a19ccecc6a37c77d5fdeb05b5", size = 7083523, upload-time = "2026-04-01T14:42:39.62Z" },
+ { url = "https://files.pythonhosted.org/packages/2d/a0/32852d36bc7709f14dc3f64f929a275e958ad8c19a6deba9610d458e28b3/pillow-12.2.0-cp310-cp310-win_arm64.whl", hash = "sha256:3e080565d8d7c671db5802eedfb438e5565ffa40115216eabb8cd52d0ecce024", size = 2463318, upload-time = "2026-04-01T14:42:42.063Z" },
+ { url = "https://files.pythonhosted.org/packages/68/e1/748f5663efe6edcfc4e74b2b93edfb9b8b99b67f21a854c3ae416500a2d9/pillow-12.2.0-cp311-cp311-macosx_10_10_x86_64.whl", hash = "sha256:8be29e59487a79f173507c30ddf57e733a357f67881430449bb32614075a40ab", size = 5354347, upload-time = "2026-04-01T14:42:44.255Z" },
+ { url = "https://files.pythonhosted.org/packages/47/a1/d5ff69e747374c33a3b53b9f98cca7889fce1fd03d79cdc4e1bccc6c5a87/pillow-12.2.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:71cde9a1e1551df7d34a25462fc60325e8a11a82cc2e2f54578e5e9a1e153d65", size = 4695873, upload-time = "2026-04-01T14:42:46.452Z" },
+ { url = "https://files.pythonhosted.org/packages/df/21/e3fbdf54408a973c7f7f89a23b2cb97a7ef30c61ab4142af31eee6aebc88/pillow-12.2.0-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:f490f9368b6fc026f021db16d7ec2fbf7d89e2edb42e8ec09d2c60505f5729c7", size = 6280168, upload-time = "2026-04-01T14:42:49.228Z" },
+ { url = "https://files.pythonhosted.org/packages/d3/f1/00b7278c7dd52b17ad4329153748f87b6756ec195ff786c2bdf12518337d/pillow-12.2.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:8bd7903a5f2a4545f6fd5935c90058b89d30045568985a71c79f5fd6edf9b91e", size = 8088188, upload-time = "2026-04-01T14:42:51.735Z" },
+ { url = "https://files.pythonhosted.org/packages/ad/cf/220a5994ef1b10e70e85748b75649d77d506499352be135a4989c957b701/pillow-12.2.0-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:3997232e10d2920a68d25191392e3a4487d8183039e1c74c2297f00ed1c50705", size = 6394401, upload-time = "2026-04-01T14:42:54.343Z" },
+ { url = "https://files.pythonhosted.org/packages/e9/bd/e51a61b1054f09437acfbc2ff9106c30d1eb76bc1453d428399946781253/pillow-12.2.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:e74473c875d78b8e9d5da2a70f7099549f9eb37ded4e2f6a463e60125bccd176", size = 7079655, upload-time = "2026-04-01T14:42:56.954Z" },
+ { url = "https://files.pythonhosted.org/packages/6b/3d/45132c57d5fb4b5744567c3817026480ac7fc3ce5d4c47902bc0e7f6f853/pillow-12.2.0-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:56a3f9c60a13133a98ecff6197af34d7824de9b7b38c3654861a725c970c197b", size = 6503105, upload-time = "2026-04-01T14:42:59.847Z" },
+ { url = "https://files.pythonhosted.org/packages/7d/2e/9df2fc1e82097b1df3dce58dc43286aa01068e918c07574711fcc53e6fb4/pillow-12.2.0-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:90e6f81de50ad6b534cab6e5aef77ff6e37722b2f5d908686f4a5c9eba17a909", size = 7203402, upload-time = "2026-04-01T14:43:02.664Z" },
+ { url = "https://files.pythonhosted.org/packages/bd/2e/2941e42858ebb67e50ae741473de81c2984e6eff7b397017623c676e2e8d/pillow-12.2.0-cp311-cp311-win32.whl", hash = "sha256:8c984051042858021a54926eb597d6ee3012393ce9c181814115df4c60b9a808", size = 6378149, upload-time = "2026-04-01T14:43:05.274Z" },
+ { url = "https://files.pythonhosted.org/packages/69/42/836b6f3cd7f3e5fa10a1f1a5420447c17966044c8fbf589cc0452d5502db/pillow-12.2.0-cp311-cp311-win_amd64.whl", hash = "sha256:6e6b2a0c538fc200b38ff9eb6628228b77908c319a005815f2dde585a0664b60", size = 7082626, upload-time = "2026-04-01T14:43:08.557Z" },
+ { url = "https://files.pythonhosted.org/packages/c2/88/549194b5d6f1f494b485e493edc6693c0a16f4ada488e5bd974ed1f42fad/pillow-12.2.0-cp311-cp311-win_arm64.whl", hash = "sha256:9a8a34cc89c67a65ea7437ce257cea81a9dad65b29805f3ecee8c8fe8ff25ffe", size = 2463531, upload-time = "2026-04-01T14:43:10.743Z" },
+ { url = "https://files.pythonhosted.org/packages/58/be/7482c8a5ebebbc6470b3eb791812fff7d5e0216c2be3827b30b8bb6603ed/pillow-12.2.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:2d192a155bbcec180f8564f693e6fd9bccff5a7af9b32e2e4bf8c9c69dbad6b5", size = 5308279, upload-time = "2026-04-01T14:43:13.246Z" },
+ { url = "https://files.pythonhosted.org/packages/d8/95/0a351b9289c2b5cbde0bacd4a83ebc44023e835490a727b2a3bd60ddc0f4/pillow-12.2.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:f3f40b3c5a968281fd507d519e444c35f0ff171237f4fdde090dd60699458421", size = 4695490, upload-time = "2026-04-01T14:43:15.584Z" },
+ { url = "https://files.pythonhosted.org/packages/de/af/4e8e6869cbed569d43c416fad3dc4ecb944cb5d9492defaed89ddd6fe871/pillow-12.2.0-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:03e7e372d5240cc23e9f07deca4d775c0817bffc641b01e9c3af208dbd300987", size = 6284462, upload-time = "2026-04-01T14:43:18.268Z" },
+ { url = "https://files.pythonhosted.org/packages/e9/9e/c05e19657fd57841e476be1ab46c4d501bffbadbafdc31a6d665f8b737b6/pillow-12.2.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:b86024e52a1b269467a802258c25521e6d742349d760728092e1bc2d135b4d76", size = 8094744, upload-time = "2026-04-01T14:43:20.716Z" },
+ { url = "https://files.pythonhosted.org/packages/2b/54/1789c455ed10176066b6e7e6da1b01e50e36f94ba584dc68d9eebfe9156d/pillow-12.2.0-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:7371b48c4fa448d20d2714c9a1f775a81155050d383333e0a6c15b1123dda005", size = 6398371, upload-time = "2026-04-01T14:43:23.443Z" },
+ { url = "https://files.pythonhosted.org/packages/43/e3/fdc657359e919462369869f1c9f0e973f353f9a9ee295a39b1fea8ee1a77/pillow-12.2.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:62f5409336adb0663b7caa0da5c7d9e7bdbaae9ce761d34669420c2a801b2780", size = 7087215, upload-time = "2026-04-01T14:43:26.758Z" },
+ { url = "https://files.pythonhosted.org/packages/8b/f8/2f6825e441d5b1959d2ca5adec984210f1ec086435b0ed5f52c19b3b8a6e/pillow-12.2.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:01afa7cf67f74f09523699b4e88c73fb55c13346d212a59a2db1f86b0a63e8c5", size = 6509783, upload-time = "2026-04-01T14:43:29.56Z" },
+ { url = "https://files.pythonhosted.org/packages/67/f9/029a27095ad20f854f9dba026b3ea6428548316e057e6fc3545409e86651/pillow-12.2.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:fc3d34d4a8fbec3e88a79b92e5465e0f9b842b628675850d860b8bd300b159f5", size = 7212112, upload-time = "2026-04-01T14:43:32.091Z" },
+ { url = "https://files.pythonhosted.org/packages/be/42/025cfe05d1be22dbfdb4f264fe9de1ccda83f66e4fc3aac94748e784af04/pillow-12.2.0-cp312-cp312-win32.whl", hash = "sha256:58f62cc0f00fd29e64b29f4fd923ffdb3859c9f9e6105bfc37ba1d08994e8940", size = 6378489, upload-time = "2026-04-01T14:43:34.601Z" },
+ { url = "https://files.pythonhosted.org/packages/5d/7b/25a221d2c761c6a8ae21bfa3874988ff2583e19cf8a27bf2fee358df7942/pillow-12.2.0-cp312-cp312-win_amd64.whl", hash = "sha256:7f84204dee22a783350679a0333981df803dac21a0190d706a50475e361c93f5", size = 7084129, upload-time = "2026-04-01T14:43:37.213Z" },
+ { url = "https://files.pythonhosted.org/packages/10/e1/542a474affab20fd4a0f1836cb234e8493519da6b76899e30bcc5d990b8b/pillow-12.2.0-cp312-cp312-win_arm64.whl", hash = "sha256:af73337013e0b3b46f175e79492d96845b16126ddf79c438d7ea7ff27783a414", size = 2463612, upload-time = "2026-04-01T14:43:39.421Z" },
+ { url = "https://files.pythonhosted.org/packages/4a/01/53d10cf0dbad820a8db274d259a37ba50b88b24768ddccec07355382d5ad/pillow-12.2.0-cp313-cp313-ios_13_0_arm64_iphoneos.whl", hash = "sha256:8297651f5b5679c19968abefd6bb84d95fe30ef712eb1b2d9b2d31ca61267f4c", size = 4100837, upload-time = "2026-04-01T14:43:41.506Z" },
+ { url = "https://files.pythonhosted.org/packages/0f/98/f3a6657ecb698c937f6c76ee564882945f29b79bad496abcba0e84659ec5/pillow-12.2.0-cp313-cp313-ios_13_0_arm64_iphonesimulator.whl", hash = "sha256:50d8520da2a6ce0af445fa6d648c4273c3eeefbc32d7ce049f22e8b5c3daecc2", size = 4176528, upload-time = "2026-04-01T14:43:43.773Z" },
+ { url = "https://files.pythonhosted.org/packages/69/bc/8986948f05e3ea490b8442ea1c1d4d990b24a7e43d8a51b2c7d8b1dced36/pillow-12.2.0-cp313-cp313-ios_13_0_x86_64_iphonesimulator.whl", hash = "sha256:766cef22385fa1091258ad7e6216792b156dc16d8d3fa607e7545b2b72061f1c", size = 3640401, upload-time = "2026-04-01T14:43:45.87Z" },
+ { url = "https://files.pythonhosted.org/packages/34/46/6c717baadcd62bc8ed51d238d521ab651eaa74838291bda1f86fe1f864c9/pillow-12.2.0-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:5d2fd0fa6b5d9d1de415060363433f28da8b1526c1c129020435e186794b3795", size = 5308094, upload-time = "2026-04-01T14:43:48.438Z" },
+ { url = "https://files.pythonhosted.org/packages/71/43/905a14a8b17fdb1ccb58d282454490662d2cb89a6bfec26af6d3520da5ec/pillow-12.2.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:56b25336f502b6ed02e889f4ece894a72612fe885889a6e8c4c80239ff6e5f5f", size = 4695402, upload-time = "2026-04-01T14:43:51.292Z" },
+ { url = "https://files.pythonhosted.org/packages/73/dd/42107efcb777b16fa0393317eac58f5b5cf30e8392e266e76e51cff28c3d/pillow-12.2.0-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:f1c943e96e85df3d3478f7b691f229887e143f81fedab9b20205349ab04d73ed", size = 6280005, upload-time = "2026-04-01T14:43:54.242Z" },
+ { url = "https://files.pythonhosted.org/packages/a8/68/b93e09e5e8549019e61acf49f65b1a8530765a7f812c77a7461bca7e4494/pillow-12.2.0-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:03f6fab9219220f041c74aeaa2939ff0062bd5c364ba9ce037197f4c6d498cd9", size = 8090669, upload-time = "2026-04-01T14:43:57.335Z" },
+ { url = "https://files.pythonhosted.org/packages/4b/6e/3ccb54ce8ec4ddd1accd2d89004308b7b0b21c4ac3d20fa70af4760a4330/pillow-12.2.0-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:5cdfebd752ec52bf5bb4e35d9c64b40826bc5b40a13df7c3cda20a2c03a0f5ed", size = 6395194, upload-time = "2026-04-01T14:43:59.864Z" },
+ { url = "https://files.pythonhosted.org/packages/67/ee/21d4e8536afd1a328f01b359b4d3997b291ffd35a237c877b331c1c3b71c/pillow-12.2.0-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:eedf4b74eda2b5a4b2b2fb4c006d6295df3bf29e459e198c90ea48e130dc75c3", size = 7082423, upload-time = "2026-04-01T14:44:02.74Z" },
+ { url = "https://files.pythonhosted.org/packages/78/5f/e9f86ab0146464e8c133fe85df987ed9e77e08b29d8d35f9f9f4d6f917ba/pillow-12.2.0-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:00a2865911330191c0b818c59103b58a5e697cae67042366970a6b6f1b20b7f9", size = 6505667, upload-time = "2026-04-01T14:44:05.381Z" },
+ { url = "https://files.pythonhosted.org/packages/ed/1e/409007f56a2fdce61584fd3acbc2bbc259857d555196cedcadc68c015c82/pillow-12.2.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:1e1757442ed87f4912397c6d35a0db6a7b52592156014706f17658ff58bbf795", size = 7208580, upload-time = "2026-04-01T14:44:08.39Z" },
+ { url = "https://files.pythonhosted.org/packages/23/c4/7349421080b12fb35414607b8871e9534546c128a11965fd4a7002ccfbee/pillow-12.2.0-cp313-cp313-win32.whl", hash = "sha256:144748b3af2d1b358d41286056d0003f47cb339b8c43a9ea42f5fea4d8c66b6e", size = 6375896, upload-time = "2026-04-01T14:44:11.197Z" },
+ { url = "https://files.pythonhosted.org/packages/3f/82/8a3739a5e470b3c6cbb1d21d315800d8e16bff503d1f16b03a4ec3212786/pillow-12.2.0-cp313-cp313-win_amd64.whl", hash = "sha256:390ede346628ccc626e5730107cde16c42d3836b89662a115a921f28440e6a3b", size = 7081266, upload-time = "2026-04-01T14:44:13.947Z" },
+ { url = "https://files.pythonhosted.org/packages/c3/25/f968f618a062574294592f668218f8af564830ccebdd1fa6200f598e65c5/pillow-12.2.0-cp313-cp313-win_arm64.whl", hash = "sha256:8023abc91fba39036dbce14a7d6535632f99c0b857807cbbbf21ecc9f4717f06", size = 2463508, upload-time = "2026-04-01T14:44:16.312Z" },
+ { url = "https://files.pythonhosted.org/packages/4d/a4/b342930964e3cb4dce5038ae34b0eab4653334995336cd486c5a8c25a00c/pillow-12.2.0-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:042db20a421b9bafecc4b84a8b6e444686bd9d836c7fd24542db3e7df7baad9b", size = 5309927, upload-time = "2026-04-01T14:44:18.89Z" },
+ { url = "https://files.pythonhosted.org/packages/9f/de/23198e0a65a9cf06123f5435a5d95cea62a635697f8f03d134d3f3a96151/pillow-12.2.0-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:dd025009355c926a84a612fecf58bb315a3f6814b17ead51a8e48d3823d9087f", size = 4698624, upload-time = "2026-04-01T14:44:21.115Z" },
+ { url = "https://files.pythonhosted.org/packages/01/a6/1265e977f17d93ea37aa28aa81bad4fa597933879fac2520d24e021c8da3/pillow-12.2.0-cp313-cp313t-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:88ddbc66737e277852913bd1e07c150cc7bb124539f94c4e2df5344494e0a612", size = 6321252, upload-time = "2026-04-01T14:44:23.663Z" },
+ { url = "https://files.pythonhosted.org/packages/3c/83/5982eb4a285967baa70340320be9f88e57665a387e3a53a7f0db8231a0cd/pillow-12.2.0-cp313-cp313t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:d362d1878f00c142b7e1a16e6e5e780f02be8195123f164edf7eddd911eefe7c", size = 8126550, upload-time = "2026-04-01T14:44:26.772Z" },
+ { url = "https://files.pythonhosted.org/packages/4e/48/6ffc514adce69f6050d0753b1a18fd920fce8cac87620d5a31231b04bfc5/pillow-12.2.0-cp313-cp313t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:2c727a6d53cb0018aadd8018c2b938376af27914a68a492f59dfcaca650d5eea", size = 6433114, upload-time = "2026-04-01T14:44:29.615Z" },
+ { url = "https://files.pythonhosted.org/packages/36/a3/f9a77144231fb8d40ee27107b4463e205fa4677e2ca2548e14da5cf18dce/pillow-12.2.0-cp313-cp313t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:efd8c21c98c5cc60653bcb311bef2ce0401642b7ce9d09e03a7da87c878289d4", size = 7115667, upload-time = "2026-04-01T14:44:32.773Z" },
+ { url = "https://files.pythonhosted.org/packages/c1/fc/ac4ee3041e7d5a565e1c4fd72a113f03b6394cc72ab7089d27608f8aaccb/pillow-12.2.0-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:9f08483a632889536b8139663db60f6724bfcb443c96f1b18855860d7d5c0fd4", size = 6538966, upload-time = "2026-04-01T14:44:35.252Z" },
+ { url = "https://files.pythonhosted.org/packages/c0/a8/27fb307055087f3668f6d0a8ccb636e7431d56ed0750e07a60547b1e083e/pillow-12.2.0-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:dac8d77255a37e81a2efcbd1fc05f1c15ee82200e6c240d7e127e25e365c39ea", size = 7238241, upload-time = "2026-04-01T14:44:37.875Z" },
+ { url = "https://files.pythonhosted.org/packages/ad/4b/926ab182c07fccae9fcb120043464e1ff1564775ec8864f21a0ebce6ac25/pillow-12.2.0-cp313-cp313t-win32.whl", hash = "sha256:ee3120ae9dff32f121610bb08e4313be87e03efeadfc6c0d18f89127e24d0c24", size = 6379592, upload-time = "2026-04-01T14:44:40.336Z" },
+ { url = "https://files.pythonhosted.org/packages/c2/c4/f9e476451a098181b30050cc4c9a3556b64c02cf6497ea421ac047e89e4b/pillow-12.2.0-cp313-cp313t-win_amd64.whl", hash = "sha256:325ca0528c6788d2a6c3d40e3568639398137346c3d6e66bb61db96b96511c98", size = 7085542, upload-time = "2026-04-01T14:44:43.251Z" },
+ { url = "https://files.pythonhosted.org/packages/00/a4/285f12aeacbe2d6dc36c407dfbbe9e96d4a80b0fb710a337f6d2ad978c75/pillow-12.2.0-cp313-cp313t-win_arm64.whl", hash = "sha256:2e5a76d03a6c6dcef67edabda7a52494afa4035021a79c8558e14af25313d453", size = 2465765, upload-time = "2026-04-01T14:44:45.996Z" },
+ { url = "https://files.pythonhosted.org/packages/bf/98/4595daa2365416a86cb0d495248a393dfc84e96d62ad080c8546256cb9c0/pillow-12.2.0-cp314-cp314-ios_13_0_arm64_iphoneos.whl", hash = "sha256:3adc9215e8be0448ed6e814966ecf3d9952f0ea40eb14e89a102b87f450660d8", size = 4100848, upload-time = "2026-04-01T14:44:48.48Z" },
+ { url = "https://files.pythonhosted.org/packages/0b/79/40184d464cf89f6663e18dfcf7ca21aae2491fff1a16127681bf1fa9b8cf/pillow-12.2.0-cp314-cp314-ios_13_0_arm64_iphonesimulator.whl", hash = "sha256:6a9adfc6d24b10f89588096364cc726174118c62130c817c2837c60cf08a392b", size = 4176515, upload-time = "2026-04-01T14:44:51.353Z" },
+ { url = "https://files.pythonhosted.org/packages/b0/63/703f86fd4c422a9cf722833670f4f71418fb116b2853ff7da722ea43f184/pillow-12.2.0-cp314-cp314-ios_13_0_x86_64_iphonesimulator.whl", hash = "sha256:6a6e67ea2e6feda684ed370f9a1c52e7a243631c025ba42149a2cc5934dec295", size = 3640159, upload-time = "2026-04-01T14:44:53.588Z" },
+ { url = "https://files.pythonhosted.org/packages/71/e0/fb22f797187d0be2270f83500aab851536101b254bfa1eae10795709d283/pillow-12.2.0-cp314-cp314-macosx_10_15_x86_64.whl", hash = "sha256:2bb4a8d594eacdfc59d9e5ad972aa8afdd48d584ffd5f13a937a664c3e7db0ed", size = 5312185, upload-time = "2026-04-01T14:44:56.039Z" },
+ { url = "https://files.pythonhosted.org/packages/ba/8c/1a9e46228571de18f8e28f16fabdfc20212a5d019f3e3303452b3f0a580d/pillow-12.2.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:80b2da48193b2f33ed0c32c38140f9d3186583ce7d516526d462645fd98660ae", size = 4695386, upload-time = "2026-04-01T14:44:58.663Z" },
+ { url = "https://files.pythonhosted.org/packages/70/62/98f6b7f0c88b9addd0e87c217ded307b36be024d4ff8869a812b241d1345/pillow-12.2.0-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:22db17c68434de69d8ecfc2fe821569195c0c373b25cccb9cbdacf2c6e53c601", size = 6280384, upload-time = "2026-04-01T14:45:01.5Z" },
+ { url = "https://files.pythonhosted.org/packages/5e/03/688747d2e91cfbe0e64f316cd2e8005698f76ada3130d0194664174fa5de/pillow-12.2.0-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:7b14cc0106cd9aecda615dd6903840a058b4700fcb817687d0ee4fc8b6e389be", size = 8091599, upload-time = "2026-04-01T14:45:04.5Z" },
+ { url = "https://files.pythonhosted.org/packages/f6/35/577e22b936fcdd66537329b33af0b4ccfefaeabd8aec04b266528cddb33c/pillow-12.2.0-cp314-cp314-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:8cbeb542b2ebc6fcdacabf8aca8c1a97c9b3ad3927d46b8723f9d4f033288a0f", size = 6396021, upload-time = "2026-04-01T14:45:07.117Z" },
+ { url = "https://files.pythonhosted.org/packages/11/8d/d2532ad2a603ca2b93ad9f5135732124e57811d0168155852f37fbce2458/pillow-12.2.0-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:4bfd07bc812fbd20395212969e41931001fd59eb55a60658b0e5710872e95286", size = 7083360, upload-time = "2026-04-01T14:45:09.763Z" },
+ { url = "https://files.pythonhosted.org/packages/5e/26/d325f9f56c7e039034897e7380e9cc202b1e368bfd04d4cbe6a441f02885/pillow-12.2.0-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:9aba9a17b623ef750a4d11b742cbafffeb48a869821252b30ee21b5e91392c50", size = 6507628, upload-time = "2026-04-01T14:45:12.378Z" },
+ { url = "https://files.pythonhosted.org/packages/5f/f7/769d5632ffb0988f1c5e7660b3e731e30f7f8ec4318e94d0a5d674eb65a4/pillow-12.2.0-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:deede7c263feb25dba4e82ea23058a235dcc2fe1f6021025dc71f2b618e26104", size = 7209321, upload-time = "2026-04-01T14:45:15.122Z" },
+ { url = "https://files.pythonhosted.org/packages/6a/7a/c253e3c645cd47f1aceea6a8bacdba9991bf45bb7dfe927f7c893e89c93c/pillow-12.2.0-cp314-cp314-win32.whl", hash = "sha256:632ff19b2778e43162304d50da0181ce24ac5bb8180122cbe1bf4673428328c7", size = 6479723, upload-time = "2026-04-01T14:45:17.797Z" },
+ { url = "https://files.pythonhosted.org/packages/cd/8b/601e6566b957ca50e28725cb6c355c59c2c8609751efbecd980db44e0349/pillow-12.2.0-cp314-cp314-win_amd64.whl", hash = "sha256:4e6c62e9d237e9b65fac06857d511e90d8461a32adcc1b9065ea0c0fa3a28150", size = 7217400, upload-time = "2026-04-01T14:45:20.529Z" },
+ { url = "https://files.pythonhosted.org/packages/d6/94/220e46c73065c3e2951bb91c11a1fb636c8c9ad427ac3ce7d7f3359b9b2f/pillow-12.2.0-cp314-cp314-win_arm64.whl", hash = "sha256:b1c1fbd8a5a1af3412a0810d060a78b5136ec0836c8a4ef9aa11807f2a22f4e1", size = 2554835, upload-time = "2026-04-01T14:45:23.162Z" },
+ { url = "https://files.pythonhosted.org/packages/b6/ab/1b426a3974cb0e7da5c29ccff4807871d48110933a57207b5a676cccc155/pillow-12.2.0-cp314-cp314t-macosx_10_15_x86_64.whl", hash = "sha256:57850958fe9c751670e49b2cecf6294acc99e562531f4bd317fa5ddee2068463", size = 5314225, upload-time = "2026-04-01T14:45:25.637Z" },
+ { url = "https://files.pythonhosted.org/packages/19/1e/dce46f371be2438eecfee2a1960ee2a243bbe5e961890146d2dee1ff0f12/pillow-12.2.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:d5d38f1411c0ed9f97bcb49b7bd59b6b7c314e0e27420e34d99d844b9ce3b6f3", size = 4698541, upload-time = "2026-04-01T14:45:28.355Z" },
+ { url = "https://files.pythonhosted.org/packages/55/c3/7fbecf70adb3a0c33b77a300dc52e424dc22ad8cdc06557a2e49523b703d/pillow-12.2.0-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:5c0a9f29ca8e79f09de89293f82fc9b0270bb4af1d58bc98f540cc4aedf03166", size = 6322251, upload-time = "2026-04-01T14:45:30.924Z" },
+ { url = "https://files.pythonhosted.org/packages/1c/3c/7fbc17cfb7e4fe0ef1642e0abc17fc6c94c9f7a16be41498e12e2ba60408/pillow-12.2.0-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:1610dd6c61621ae1cf811bef44d77e149ce3f7b95afe66a4512f8c59f25d9ebe", size = 8127807, upload-time = "2026-04-01T14:45:33.908Z" },
+ { url = "https://files.pythonhosted.org/packages/ff/c3/a8ae14d6defd2e448493ff512fae903b1e9bd40b72efb6ec55ce0048c8ce/pillow-12.2.0-cp314-cp314t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:0a34329707af4f73cf1782a36cd2289c0368880654a2c11f027bcee9052d35dd", size = 6433935, upload-time = "2026-04-01T14:45:36.623Z" },
+ { url = "https://files.pythonhosted.org/packages/6e/32/2880fb3a074847ac159d8f902cb43278a61e85f681661e7419e6596803ed/pillow-12.2.0-cp314-cp314t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:8e9c4f5b3c546fa3458a29ab22646c1c6c787ea8f5ef51300e5a60300736905e", size = 7116720, upload-time = "2026-04-01T14:45:39.258Z" },
+ { url = "https://files.pythonhosted.org/packages/46/87/495cc9c30e0129501643f24d320076f4cc54f718341df18cc70ec94c44e1/pillow-12.2.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:fb043ee2f06b41473269765c2feae53fc2e2fbf96e5e22ca94fb5ad677856f06", size = 6540498, upload-time = "2026-04-01T14:45:41.879Z" },
+ { url = "https://files.pythonhosted.org/packages/18/53/773f5edca692009d883a72211b60fdaf8871cbef075eaa9d577f0a2f989e/pillow-12.2.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:f278f034eb75b4e8a13a54a876cc4a5ab39173d2cdd93a638e1b467fc545ac43", size = 7239413, upload-time = "2026-04-01T14:45:44.705Z" },
+ { url = "https://files.pythonhosted.org/packages/c9/e4/4b64a97d71b2a83158134abbb2f5bd3f8a2ea691361282f010998f339ec7/pillow-12.2.0-cp314-cp314t-win32.whl", hash = "sha256:6bb77b2dcb06b20f9f4b4a8454caa581cd4dd0643a08bacf821216a16d9c8354", size = 6482084, upload-time = "2026-04-01T14:45:47.568Z" },
+ { url = "https://files.pythonhosted.org/packages/ba/13/306d275efd3a3453f72114b7431c877d10b1154014c1ebbedd067770d629/pillow-12.2.0-cp314-cp314t-win_amd64.whl", hash = "sha256:6562ace0d3fb5f20ed7290f1f929cae41b25ae29528f2af1722966a0a02e2aa1", size = 7225152, upload-time = "2026-04-01T14:45:50.032Z" },
+ { url = "https://files.pythonhosted.org/packages/ff/6e/cf826fae916b8658848d7b9f38d88da6396895c676e8086fc0988073aaf8/pillow-12.2.0-cp314-cp314t-win_arm64.whl", hash = "sha256:aa88ccfe4e32d362816319ed727a004423aab09c5cea43c01a4b435643fa34eb", size = 2556579, upload-time = "2026-04-01T14:45:52.529Z" },
+ { url = "https://files.pythonhosted.org/packages/4e/b7/2437044fb910f499610356d1352e3423753c98e34f915252aafecc64889f/pillow-12.2.0-pp311-pypy311_pp73-macosx_10_15_x86_64.whl", hash = "sha256:0538bd5e05efec03ae613fd89c4ce0368ecd2ba239cc25b9f9be7ed426b0af1f", size = 5273969, upload-time = "2026-04-01T14:45:55.538Z" },
+ { url = "https://files.pythonhosted.org/packages/f6/f4/8316e31de11b780f4ac08ef3654a75555e624a98db1056ecb2122d008d5a/pillow-12.2.0-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:394167b21da716608eac917c60aa9b969421b5dcbbe02ae7f013e7b85811c69d", size = 4659674, upload-time = "2026-04-01T14:45:58.093Z" },
+ { url = "https://files.pythonhosted.org/packages/d4/37/664fca7201f8bb2aa1d20e2c3d5564a62e6ae5111741966c8319ca802361/pillow-12.2.0-pp311-pypy311_pp73-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:5d04bfa02cc2d23b497d1e90a0f927070043f6cbf303e738300532379a4b4e0f", size = 5288479, upload-time = "2026-04-01T14:46:01.141Z" },
+ { url = "https://files.pythonhosted.org/packages/49/62/5b0ed78fce87346be7a5cfcfaaad91f6a1f98c26f86bdbafa2066c647ef6/pillow-12.2.0-pp311-pypy311_pp73-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:0c838a5125cee37e68edec915651521191cef1e6aa336b855f495766e77a366e", size = 7032230, upload-time = "2026-04-01T14:46:03.874Z" },
+ { url = "https://files.pythonhosted.org/packages/c3/28/ec0fc38107fc32536908034e990c47914c57cd7c5a3ece4d8d8f7ffd7e27/pillow-12.2.0-pp311-pypy311_pp73-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:4a6c9fa44005fa37a91ebfc95d081e8079757d2e904b27103f4f5fa6f0bf78c0", size = 5355404, upload-time = "2026-04-01T14:46:06.33Z" },
+ { url = "https://files.pythonhosted.org/packages/5e/8b/51b0eddcfa2180d60e41f06bd6d0a62202b20b59c68f5a132e615b75aecf/pillow-12.2.0-pp311-pypy311_pp73-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:25373b66e0dd5905ed63fa3cae13c82fbddf3079f2c8bf15c6fb6a35586324c1", size = 6002215, upload-time = "2026-04-01T14:46:08.83Z" },
+ { url = "https://files.pythonhosted.org/packages/bc/60/5382c03e1970de634027cee8e1b7d39776b778b81812aaf45b694dfe9e28/pillow-12.2.0-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:bfa9c230d2fe991bed5318a5f119bd6780cda2915cca595393649fc118ab895e", size = 7080946, upload-time = "2026-04-01T14:46:11.734Z" },
+]
+
[[package]]
name = "pinecone"
version = "9.0.0"
@@ -20746,8 +21028,8 @@ name = "secretstorage"
version = "3.5.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
- { name = "cryptography", marker = "python_full_version >= '3.14' or platform_machine != 'arm64' or sys_platform != 'darwin'" },
- { name = "jeepney", marker = "python_full_version >= '3.14' or platform_machine != 'arm64' or sys_platform != 'darwin'" },
+ { name = "cryptography", marker = "(python_full_version >= '3.14' and sys_platform == 'darwin') or (python_full_version < '3.15' and sys_platform == 'emscripten') or (python_full_version < '3.15' and sys_platform == 'win32') or (platform_machine != 'arm64' and sys_platform == 'darwin') or (sys_platform != 'darwin' and sys_platform != 'emscripten' and sys_platform != 'win32')" },
+ { name = "jeepney", marker = "(python_full_version >= '3.14' and sys_platform == 'darwin') or (python_full_version < '3.15' and sys_platform == 'emscripten') or (python_full_version < '3.15' and sys_platform == 'win32') or (platform_machine != 'arm64' and sys_platform == 'darwin') or (sys_platform != 'darwin' and sys_platform != 'emscripten' and sys_platform != 'win32')" },
]
sdist = { url = "https://files.pythonhosted.org/packages/1c/03/e834bcd866f2f8a49a85eaff47340affa3bfa391ee9912a952a1faa68c7b/secretstorage-3.5.0.tar.gz", hash = "sha256:f04b8e4689cbce351744d5537bf6b1329c6fc68f91fa666f60a380edddcd11be", size = 19884, upload-time = "2025-11-23T19:02:53.191Z" }
wheels = [
@@ -22082,6 +22364,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/e6/34/ebdc18bae6aa14fbee1a08b63c015c72b64868ff7dae68808ab500c492e2/tinycss2-1.4.0-py3-none-any.whl", hash = "sha256:3a49cf47b7675da0b15d0c6e1df8df4ebd96e9394bb905a5775adb0d884c5289", size = 26610, upload-time = "2024-10-24T14:58:28.029Z" },
]
+[[package]]
+name = "tinytag"
+version = "2.2.1"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/96/59/8a8cb2331e2602b53e4dc06960f57d1387a2b18e7efd24e5f9cb60ea4925/tinytag-2.2.1.tar.gz", hash = "sha256:e6d06610ebe7cd66fd07be2d3b9495914ab32654a5e47657bb8cd44c2484523c", size = 38214, upload-time = "2026-03-15T18:48:01.11Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/ce/34/d50e338631baaf65ec5396e70085e5de0b52b24b28db1ffbc1c6e82190dc/tinytag-2.2.1-py3-none-any.whl", hash = "sha256:ed8b1e6d25367937e3321e054f4974f9abfde1a3e0a538824c87da377130c2b6", size = 32927, upload-time = "2026-03-15T18:47:59.613Z" },
+]
+
[[package]]
name = "tokenizers"
version = "0.23.1"