Switch the default async Postgres driver from asyncpg to psycopg3 (core) #68496
Dev-iL wants to merge 1 commit into
Caveat: psycopg3 uses server-side prepared statements (via protocol-level The psycopg3 docs explicitly warn that poolers are not compatible with prepared statements unless:
Our Helm chart ships PgBouncer 1.23.1 and pins psycopg >= 3.2.9, so the first and third conditions are met for chart users. But CI tests Postgres 14–18 (default 14), so the libpq requirement (>= 17) is not always satisfied. Users running their own PgBouncer < 1.22 would also hit this. Should we consider:
Quickest fix:
git fetch upstream main && git rebase upstream/main
rm uv.lock && uv lock
git add uv.lock && git rebase --continue
git push --force-with-lease
Automated nudge: ignore if you're not ready to rebase. This comment is updated in place on future
@ashb @uranusjr what do you say?
Overall LGTM, one comment to address from my perspective.
Also, could you please rebase onto latest main? (there seems to be a drift in some packages within uv.lock)
Edit: when the order of PR merges matters, it's important also to state it at the top of the description to avoid mistakes by committers :) Added it for you.
 def _get_async_conn_uri_from_sync(sync_uri):
-    AIO_LIBS_MAPPING = {"sqlite": "aiosqlite", "postgresql": "asyncpg", "mysql": "aiomysql"}
+    AIO_LIBS_MAPPING = {"sqlite": "aiosqlite", "postgresql": "psycopg_async", "mysql": "aiomysql"}
Would this change cause a breakage if core is upgraded without a provider upgrade? (i.e. use a new Airflow version without upgrading providers to a version that includes #69089)
Great catch! Yes, this is indeed flawed.
Reproduced locally: airflow-core ships no Postgres driver, so with a new core and a pre-#69089 provider (which installs asyncpg but not psycopg), the derived postgresql+psycopg_async:// URL raises ModuleNotFoundError: No module named 'psycopg' at component startup.
I changed the approach to feature-detect via the existing _USE_PSYCOPG3 flag, which falls back to asyncpg when psycopg3 isn't installed. The database-setup docs and the newsfragment were updated accordingly.
The reverse case, an older core paired with the new provider, is documented in the provider changelog note (#69089): those deployments need the asyncpg extra or an explicitly configured sql_alchemy_conn_async.
The derived async metadata-DB URL now prefers postgresql+psycopg_async://, which is safe behind transaction-mode PgBouncer with no configuration, and falls back to asyncpg when psycopg3 is not installed — so a newer core keeps working with an older postgres provider that ships asyncpg only. The matching driver ships in the postgres provider (>=7.0.0), which keeps asyncpg available via an opt-in extra and an explicit sql_alchemy_conn_async URL. This default takes effect in Airflow 3.4.0.
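To make the fallback behaviour concrete, here is a minimal sketch of driver selection with feature detection. It is not the code in this PR: `pick_postgres_async_driver` and `derive_async_uri` are hypothetical names, the detection uses `importlib.util.find_spec` as a stand-in for the `_USE_PSYCOPG3` flag, and the URL handling is a plain string split rather than whatever parsing Airflow actually does.

```python
from importlib.util import find_spec

# Async drivers for the dialects whose derivation is unchanged.
AIO_LIBS_MAPPING = {"sqlite": "aiosqlite", "mysql": "aiomysql"}


def pick_postgres_async_driver(psycopg3_installed=None):
    """Prefer psycopg3; fall back to asyncpg when psycopg3 is missing.

    Hypothetical helper: stands in for the _USE_PSYCOPG3 feature flag.
    """
    if psycopg3_installed is None:
        # Feature-detect without importing the package itself.
        psycopg3_installed = find_spec("psycopg") is not None
    return "psycopg_async" if psycopg3_installed else "asyncpg"


def derive_async_uri(sync_uri, psycopg3_installed=None):
    """Swap the driver part of a sync SQLAlchemy URL for an async one."""
    scheme, sep, rest = sync_uri.partition("://")
    if not sep:
        raise ValueError("not a database URL")
    dialect = scheme.split("+", 1)[0]  # drop any sync driver, e.g. "+psycopg2"
    if dialect == "postgresql":
        driver = pick_postgres_async_driver(psycopg3_installed)
    else:
        driver = AIO_LIBS_MAPPING.get(dialect)
    if driver is None:
        return sync_uri  # unknown dialect: leave the URL untouched
    return f"{dialect}+{driver}://{rest}"
```

Passing `psycopg3_installed` explicitly makes both branches testable without installing or uninstalling either driver; leaving it as `None` detects the driver from the running environment.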
depends on:
closes: #67801
What
When [database] sql_alchemy_conn_async is not set, Airflow derives the async metadata-database URL from sql_alchemy_conn. For PostgreSQL, the derived URL now uses psycopg3 (postgresql+psycopg_async://) instead of asyncpg (postgresql+asyncpg://).
Packaging follows:
apache-airflow-providers-postgres now installs psycopg[binary] by default, and asyncpg moves to a new opt-in asyncpg extra. psycopg2-binary is unchanged: the sync engine still uses psycopg2.
Why
Airflow recommends running PgBouncer in front of PostgreSQL in production. asyncpg uses named server-side prepared statements, which break under transaction-mode PgBouncer unless prepared-statement caching is explicitly disabled. psycopg3 is safe behind transaction-mode PgBouncer with zero configuration, so the default async engine now works out of the box in recommended production deployments. A safe default beats a documentation note operators miss.
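For readers unfamiliar with what "explicitly disabled" involves for asyncpg, a minimal sketch follows. It assumes the connect-args dict is forwarded to SQLAlchemy's asyncpg dialect, which accepts `prepared_statement_cache_size` and passes `statement_cache_size` through to asyncpg. The variable name `ASYNCPG_PGBOUNCER_CONNECT_ARGS` is hypothetical, and the exact recipe in this PR's docs may differ.

```python
# Hypothetical dict, e.g. placed in airflow_local_settings.py.
# Both keys are assumptions about what the opt-in needs:
ASYNCPG_PGBOUNCER_CONNECT_ARGS = {
    # asyncpg's own per-connection statement cache (0 disables it).
    "statement_cache_size": 0,
    # SQLAlchemy asyncpg-dialect cache of prepared statements (0 disables it).
    "prepared_statement_cache_size": 0,
}
```

psycopg3 needs no equivalent dict, which is the point of switching the default.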
asyncpg was originally chosen only because Airflow was pinned to SQLAlchemy 1.4; that constraint is gone (airflow-core requires sqlalchemy[asyncio]>=2.0.48), and psycopg3 serves both sync and async from a single driver.
Keeping asyncpg
asyncpg remains fully supported as a throughput opt-in:
pip install 'apache-airflow-providers-postgres[asyncpg]'
Behind transaction-mode PgBouncer, also disable asyncpg's prepared-statement caching via sql_alchemy_connect_args_async (a dict defined in airflow_local_settings.py):
Notes for reviewers
postgresql+psycopg_async:// (explicit form), not the postgresql+psycopg shorthand mentioned in the issue body, chosen deliberately for explicitness, matching the explicit-sync-driver direction of Make PostgreSQL SQLAlchemy driver explicit (postgresql+psycopg2://) #68314.
SQLite (aiosqlite) and MySQL (aiomysql) async derivations are unchanged, and an explicitly configured sql_alchemy_conn_async is never rewritten.
postgresql:// default flips to psycopg3) is tracked in Remove psycopg2 and migrate the sync Postgres driver to psycopg3 #68453.
postgresql+psycopg2://). Use async DB session for Execution API task-instance heartbeat #67800 (async heartbeat) will rebase its docs onto this change.
INSERT/COPY in localhost benchmarks. Today's async routes are single-row OLTP, so this does not affect the switch, but bulk paths must be validated before they migrate to async (Migrate API endpoints from sync to async DB sessions #67799).
Validation
Validated end to end against a real transaction-mode PgBouncer in front of Postgres (dev/pgbouncer_e2e/): the default-derived psycopg_async engine runs repeated single-row and row-returning queries with no named prepared statements left on the backend, and the documented asyncpg opt-in recipe behaves the same.
Appendix: E2E tests
docker-compose.yaml
pgbouncer_e2e.py
Was generative AI tooling used to co-author this PR?
Generated-by: Claude Code (Fable 5) following the guidelines