Skip to content

PostgreSQL/SQL provider connection pool is not configurable and never recycles #2344

@KoalaGeo

Description

@KoalaGeo

Description

get_engine() in pygeoapi/provider/sql.py creates the SQLAlchemy engine with:

engine = create_engine(conn_str, connect_args=connect_args, pool_pre_ping=True)

No pool parameters are passed, so SQLAlchemy uses the default QueuePool (pool_size=5, max_overflow=10) with pool_recycle=-1. The pool_size connections are therefore held open for the entire life of the worker process
and are never recycled.

In a multi-process deployment this is significant. With N replicas and W gunicorn workers, the connection floor is N * W * pool_size connections that sit IDLE on the database indefinitely, plus burst overflow. On a deployment with 4 replicas x 4 workers we observed connections idle on the PostgreSQL server for several days, eventually exhausting max_connections (shared
with other services) and producing:

org.postgresql.util.PSQLException: FATAL: remaining connection slots are
reserved for roles with the SUPERUSER attribute

There is currently no way to bound or recycle the pool from configuration.

Proposal

Allow the connection pool to be configured per provider via the existing options: block, exposing pool_size, max_overflow, pool_recycle, pool_timeout and pool_pre_ping. These would be parsed in store_db_parameters(), kept out of the DBAPI connect_args, and passed to create_engine(). The values are stored as a hashable tuple so get_engine() remains @functools.cache-able (engine sharing per process is preserved).

The same change must be applied to the PostgreSQL process manager (pygeoapi/process/manager/postgresql.py), which also calls get_engine().

Open question for the PSC / maintainers

Should the default pool_recycle remain -1 (no behaviour change; pure opt-in feature) or change to a finite value such as 3600 (recycle hourly; removes the "idle forever" footgun for everyone, but is a behaviour change for all existing users)?

I propose merging with pool_recycle=-1 as the default (zero behaviour change) and treating a finite default as a separate follow-up decision, but I'm happy to go whichever way the PSC prefers.

Environment

  • pygeoapi: master
  • Python: 3.12
  • Driver: postgresql+psycopg2, SQLAlchemy QueuePool

I'd like to contribute a fix (code + test + docs) for this.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions