
Pipeline execute_many and wrap in a transaction #171

Merged

chandr-andr merged 1 commit into psqlpy-python:main from Dev-iL:2605/exec_many_perf on May 15, 2026

Conversation

@Dev-iL (Contributor) commented May 14, 2026

Description

Connection.execute_many / Transaction.execute_many no longer issue one round-trip per row. The implementation in src/connection/impls.rs now:

  1. Pipelines all Bind/Execute messages on the same connection via FuturesOrdered (tokio-postgres dispatches them back-to-back instead of stalling on each reply).
  2. Wraps the batch in a single transaction, which is what actually delivers the order-of-magnitude win — Postgres fsyncs the WAL on every implicit auto-commit, so collapsing N auto-commits into one transaction collapses N fsyncs into one (see the sketch after this list).
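
A condensed, illustrative sketch of the combined shape (not the actual code in src/connection/impls.rs; the function name, the i64-only parameters, and the collapsed error handling are simplifications for the example):

```rust
use futures_util::stream::{FuturesOrdered, StreamExt};
use tokio_postgres::{types::ToSql, Client, Error};

async fn execute_many_pipelined(
    client: &Client,
    query: &str,
    rows: &[Vec<i64>],
) -> Result<(), Error> {
    let stmt = client.prepare(query).await?;
    client.batch_execute("BEGIN").await?;

    // Push every Execute future before awaiting any; FuturesOrdered polls
    // them together, so tokio-postgres flushes the Bind/Execute messages
    // back-to-back instead of stalling on each row's reply.
    let mut pending = FuturesOrdered::new();
    for row in rows {
        let params = row.iter().map(|v| v as &(dyn ToSql + Sync));
        pending.push_back(client.execute_raw(&stmt, params));
    }

    // Drain replies in order, keeping the first failure while still
    // consuming every response the server sends back.
    let mut first_err: Option<Error> = None;
    while let Some(result) = pending.next().await {
        if let Err(e) = result {
            first_err.get_or_insert(e);
        }
    }
    drop(pending);

    match first_err {
        // One COMMIT means one WAL fsync for the entire batch.
        None => client.batch_execute("COMMIT").await,
        Some(e) => {
            client.batch_execute("ROLLBACK").await?;
            Err(e)
        }
    }
}
```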

When invoked from Connection.execute_many, the wrap is BEGIN/COMMIT (with ROLLBACK on failure). When invoked from Transaction.execute_many, the wrap is a SAVEPOINT psqlpy_execute_many (RELEASE on success; ROLLBACK TO + RELEASE on failure) so a failed batch can never poison the caller's surrounding transaction. Internal docs on the method body explain the rationale, the asyncpg comparison, and the deliberate divergence on savepoint behaviour.
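
The savepoint bracketing for the transaction path looks roughly like this (again a sketch: with_execute_many_savepoint and its run_batch closure are illustrative stand-ins; only the SQL command sequence comes from the description above):

```rust
use std::future::Future;
use tokio_postgres::{Client, Error};

async fn with_execute_many_savepoint<F, Fut>(
    client: &Client,
    run_batch: F,
) -> Result<(), Error>
where
    F: FnOnce() -> Fut,
    Fut: Future<Output = Result<(), Error>>,
{
    client.batch_execute("SAVEPOINT psqlpy_execute_many").await?;
    match run_batch().await {
        // Success: fold the batch into the caller's transaction.
        Ok(()) => client.batch_execute("RELEASE psqlpy_execute_many").await,
        Err(e) => {
            // Failure: rewind only the batch, then discard the savepoint,
            // leaving the caller's surrounding transaction usable.
            client
                .batch_execute(
                    "ROLLBACK TO psqlpy_execute_many; RELEASE psqlpy_execute_many",
                )
                .await?;
            Err(e)
        }
    }
}
```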

Motivation and Context

Fixes #167. Reported behaviour: execute_many was ~93× slower than asyncpg.executemany for the same workload because it issued one full round-trip per row and never amortized fsync cost. The bottleneck was visible in src/connection/impls.rs as a sequential for ... await over self.query(&stmt, &params).
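
For contrast, the replaced hot loop had roughly this shape (a paraphrase of the pattern described above, not the literal impls.rs code):

```rust
use tokio_postgres::{types::ToSql, Client, Error};

async fn execute_many_sequential(
    client: &Client,
    query: &str,
    rows: &[Vec<i64>],
) -> Result<(), Error> {
    let stmt = client.prepare(query).await?;
    for row in rows {
        let params: Vec<&(dyn ToSql + Sync)> =
            row.iter().map(|v| v as &(dyn ToSql + Sync)).collect();
        // Each await stalls on this row's reply, and outside a transaction
        // each statement is also an implicit auto-commit, i.e. one WAL fsync.
        client.query(&stmt, &params).await?;
    }
    Ok(())
}
```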

The change also introduces a behavioural shift worth flagging in release notes: a mid-batch failure now rolls back earlier rows in the batch (previously each row auto-committed independently). This matches asyncpg / psycopg executemany semantics and the way bulk APIs are generally expected to behave.

How has this been tested?

Environment: PostgreSQL 14 in Docker on localhost (sub-millisecond RTT), CPython 3.13, Linux.

  • Rust-level microbenchmark against the forked tokio-postgres (1000-row INSERT batch): ~1326 ms sequential → ~32 ms pipelined-in-transaction (41× speedup, ~31k rows/s). Pipelining without the transaction wrap only reached ~1024 ms, which confirms the fsync floor is the real bottleneck.
  • End-to-end through pyo3 with the same 1000-row INSERT batch from #167: ~128 ms / ~7,800 rows/s, versus the ~3 batches/s the issue reports.
  • Savepoint isolation: inside a user transaction, a tx.execute_many that fails on a PK violation no longer aborts the surrounding transaction; subsequent statements in the same tx continue to succeed.
  • Connection atomicity: outside any transaction, a failed batch rolls back cleanly — no partial rows visible.
  • Existing test suite (python/tests/test_connection.py, python/tests/test_transaction.py): 37 passed, no failures attributable to this change.
  • cargo build --release, cargo clippy --release, and the project's pre-commit chain (rustfmt, clippy, cargo check, ruff, mypy) all clean.

Screenshots (if appropriate):

N/A.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected) — mid-batch failure now rolls back earlier rows; previously each row auto-committed independently.

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation. (release notes should mention the atomicity semantics shift, but no user-facing API doc changes)
  • I have updated the documentation accordingly.

Replaces the per-row sequential await loop in execute_many with
concurrent futures driven via FuturesOrdered, brackets the batch in
BEGIN/COMMIT when not already in a transaction, and uses a SAVEPOINT
when invoked from Transaction.execute_many so a failed batch can never
poison the caller's surrounding transaction. The order-of-magnitude
speedup comes from collapsing N implicit auto-commits into one
WAL fsync; pipelining alone is insufficient.

Locally measured against the forked tokio-postgres: 1000-row INSERT
batch ~1326 ms sequential -> ~32 ms pipelined-in-transaction. End-to-end
through pyo3: ~128 ms for 1000 rows (~7,800 rows/s), versus the
~3 batches/sec reported in psqlpy-python#167.

Fixes psqlpy-python#167

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@chandr-andr (Member) left a comment

lgtm, thanks

@chandr-andr chandr-andr merged commit 1e56b25 into psqlpy-python:main May 15, 2026
44 of 45 checks passed