Skip to content

Improve read performance #1004

Description

@jamblejoe

Is your feature request related to a problem? Please describe.
We are a slightly disappointed by the read performance from DB2 databases via this python package. We use ibm_db_dbi and pandas read_sql to read in data. This calls into

def _fetch_helper(self, fetch_size=-1):

where the result is processed in a python loop and every fetched tuple is appended to a list. This is suboptimal performance wise.

There is a faster solution since the end of last year due to this PR #971, fetchall implemented in

static PyObject *ibm_db_fetchall(PyObject *self, PyObject *args)

Here, the loop is implemented in C but there are a lot of python-checks for every fetched tuple. The performance is 4x faster than the ibm_db_dbi interface.

Other packages are providing a much better performance. E.g., the rust package https://github.com/pacman82/arrow-odbc wrapped in https://github.com/pacman82/arrow-odbc-py calls the IBM driver via the ODBC interface and bulk reads in rust into the arrow format. The python wrapper of the rust crate has a very easy syntax:

reader = arrow_odbc.read_arrow_batches_from_odbc(
        query=f"""
                SELECT col1, col2, ...
                FROM TEST_PULL """,
        connection_string=connection_string,
        batch_size=10_000,
    )

batches = []
for batch in reader:
    batches.append(batch)

df = pl.from_arrow(batches) # polars df

A little test gave us the following performance chart (lower is better)

Image
Note the logarithmic scale of the axis.

Summarized: arrow-odbc-py gives as ~15x performance boost over ibm_db_dbi interface and ~4x boost over fetchall from ibm_db.

Describe the solution you'd like
We would like to continue to use ibm_db_dbi, but the performance sacrifices compared to other packages are too large. Please enhance the performance of reading from DB2.

Describe alternatives you've considered
Other packages, like arrow-odbc-py, allow for significantly faster reads.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions