
Add providers E2E tests framework and OpenLineage tests #69212

Open
kacpermuda wants to merge 1 commit into apache:main from kacpermuda:openlineage-e2e-tests

@kacpermuda

Collaborator

Add a framework for provider E2E tests and add OpenLineage provider E2E tests to Airflow CI

The OpenLineage provider emits lineage events when Dags run, but until now there was no automated way in Airflow CI to verify those events are actually correct against a real, deployed Airflow stack. This PR introduces that capability: a self-contained e2e test harness that spins up Airflow via docker-compose, runs the provider system-test Dags, and asserts the OpenLineage events emitted by the transport match expected payloads. The tests run on demand (via workflow_dispatch) and are triggered automatically by selective checks when the openlineage or common providers, the providers-e2e-tests harness, or related files change.
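As a rough illustration of what "match expected payloads" can mean, here is a minimal sketch of subset matching between an expected payload and an emitted event. This is not the actual OpenLineageTestOperator logic; the function name `matches_expected` and the sample event values are hypothetical, and only the top-level OpenLineage field names (`eventType`, `eventTime`, `job`, `run`) come from the OpenLineage event model.

```python
from typing import Any


def matches_expected(expected: Any, actual: Any) -> bool:
    """Return True if `actual` contains everything described by `expected`.

    Dicts are compared as subsets (extra keys in `actual` are ignored),
    lists must have the same length and match element by element,
    and all other values must be equal.
    """
    if isinstance(expected, dict):
        if not isinstance(actual, dict):
            return False
        return all(
            key in actual and matches_expected(value, actual[key])
            for key, value in expected.items()
        )
    if isinstance(expected, list):
        if not isinstance(actual, list) or len(expected) != len(actual):
            return False
        return all(matches_expected(e, a) for e, a in zip(expected, actual))
    return expected == actual


# Hypothetical emitted event; all values are placeholders for illustration.
emitted_event = {
    "eventType": "COMPLETE",
    "eventTime": "2026-07-02T12:00:00Z",
    "job": {"namespace": "default", "name": "example_dag.example_task"},
    "run": {"runId": "00000000-0000-0000-0000-000000000000"},
}

# The expectation only pins the fields the test cares about.
expected_event = {
    "eventType": "COMPLETE",
    "job": {"name": "example_dag.example_task"},
}

if __name__ == "__main__":
    print(matches_expected(expected_event, emitted_event))
```

Subset matching keeps expectations stable when volatile fields such as timestamps or run IDs change between runs; whether the real operator works this way is an assumption of this sketch.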

The approach follows the same pattern already established by the Task SDK integration tests and the Airflow e2e tests: a standalone directory under providers-e2e-tests/openlineage/ with its own pyproject.toml, docker-compose.yaml, and pytest suite, invoked through a new breeze testing providers-e2e-tests <provider> command.

More providers can be added in the future if needed. These OpenLineage tests do not rely on any external service (no external database, API, or Google/Amazon/Azure service), so they can easily be run in CI.

What's included:

  • providers-e2e-tests/ — new top-level directory; each provider gets its own subdirectory with a
    docker-compose stack, tests, and pyproject.toml.
  • breeze testing providers-e2e-tests <provider> — spins up the stack and runs pytest.
  • Two modes: default PROD image, or --airflow-version <ver> to test older Airflow + current providers
    from main (builds apache/airflow:<ver> + provider wheels). Required providers are declared in each
    provider's pyproject.toml under [tool.e2e-tests] required-providers — nothing hardcoded in breeze.
  • OpenLineage implementation: postgres + API server + scheduler + dag-processor + triggerer + worker.
    DAGs come from the existing providers/openlineage/tests/system/openlineage/ directory; prepare_dags.py
    copies them and strips their pytest-only footers at runtime. Events are captured via VariableTransport
    (stores OL events in Airflow Variables); each DAG's OpenLineageTestOperator validates them — no
    external backend needed. A run that ends success means lineage matched.
  • Selective checks trigger the suite on openlineage / common-providers changes. Core and task-sdk changes are not wired in, so a regression there would not trigger these tests on the PR, only on the next canary run. Adding core/task-sdk to the file group would work technically (the PROD image is already built when those change, via other test paths), but it would widen the trigger surface significantly. Leaving it for discussion.
  • CI: the PROD and compat matrix suites run on every relevant PR and on canary/main. The compat matrix (3.0.6 / 3.1.8 / 3.2.2) is reused directly from PROVIDERS_COMPATIBILITY_TESTS_MATRIX in breeze's global_constants.py, the same source as the provider unit compat tests, so adding a new Airflow release there automatically adds a new e2e compat run. The 2.x entry is skipped via a startsWith('3.') condition because the e2e stack requires the Airflow 3 API server (the minimum Airflow version for providers will be bumped soon anyway, so I did not spend time making it work for Airflow 2).

I added a new breeze command for this, but it is my first time contributing in this area, so it may not be ideal. I tried to follow the existing code as an example.


Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

Generated-by: Claude Sonnet 4.6 following the guidelines


  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

@boring-cyborg boring-cyborg Bot added area:dev-tools area:production-image Production image improvements and fixes backport-to-v3-3-test Backport to v3-3-test labels Jul 1, 2026
@kacpermuda kacpermuda force-pushed the openlineage-e2e-tests branch 2 times, most recently from 2091603 to a85933b Compare July 2, 2026 12:10
@kacpermuda kacpermuda requested a review from mobuchowski as a code owner July 2, 2026 12:10
@kacpermuda kacpermuda force-pushed the openlineage-e2e-tests branch from a85933b to 878b7c3 Compare July 2, 2026 13:27
@github-actions

github-actions Bot commented Jul 2, 2026

Contributor

uv.lock on main just moved via #69198 ("[main] Upgrade important CI environment"), commit 38b2869, and this PR currently conflicts with it.

Quickest fix:

git fetch upstream main && git rebase upstream/main
rm uv.lock && uv lock
git add uv.lock && git rebase --continue
git push --force-with-lease

Automated nudge — ignore if you're not ready to rebase. This comment is updated in place on future uv.lock bumps.

