Support imperfect dataframe in set_data() calls #2758

@briochh

Not really an issue. Just wondering about how straightforward it would be to support passing imperfect dataframes in set_data() assignments?

Currently flopy is tolerant of recarrays being passed with excess data fields:

    if isinstance(data, np.recarray):
        # verify data shape of data (recarray)
        if len(data) == 0:
            # create empty dataset
            data = pandas.DataFrame(columns=self._header_names)
        elif len(data[0]) != len(self._header_names):
            if len(data[0]) == len(self._data_item_names):
                # data most likely being stored with cellids as tuples,
                # create a dataframe and untuple the cellids
                # In pandas 3+, DataFrame() with recarray requires columns to match
                # field names, so create without columns param then rename if needed
                data = pandas.DataFrame(data)
                if list(data.columns) != self._data_item_names:
                    data.columns = self._data_item_names
                data = self._untuple_cellids(data)[0]
                # make sure columns are still in correct order
                data = pandas.DataFrame(data, columns=self._header_names)

However, if data is already passed as a DataFrame, it is expected to be perfect: no additional columns, and exactly the correct cellid column breakdown. [As an aside, the error message below may never be shown, because data[0] on a DataFrame is a column lookup and raises a KeyError first unless a column is labelled 0 (should it be data.iloc[0]?)]

    elif isinstance(data, pandas.DataFrame):
        if len(data.columns) != len(self._header_names):
            message = (
                f"ERROR: Data list {self._data_name} supplied the "
                f"wrong number of columns of data, expected "
                f"{len(self._data_item_names)} got {len(data[0])}.\n"
                f"Data columns supplied: {data.columns}\n"
                f"Data columns expected: {self._header_names}"
            )
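
The aside about data[0] can be checked outside flopy. The snippet below is only an illustration with a made-up DataFrame (the column names are assumptions, not taken from flopy): indexing a DataFrame with 0 is a column-label lookup, so it raises a KeyError when no column is labelled 0, whereas len(data.columns) or len(data.iloc[0]) would give the column count the message is presumably after.

```python
import pandas as pd

# Made-up list data; column names are illustrative assumptions only.
df = pd.DataFrame(
    {"cellid_layer": [0], "cellid_row": [1], "cellid_column": [2], "q": [-100.0]}
)

# df[0] looks for a column labelled 0, which does not exist here.
try:
    df[0]
    raised = False
except KeyError:
    raised = True

# Two ways to get the number of columns without a label lookup.
n_cols_from_columns = len(df.columns)
n_cols_from_first_row = len(df.iloc[0])

print(raised, n_cols_from_columns, n_cols_from_first_row)
```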

Could the self._untuple_cellids(data)[0] step and extraction of self._header_names happen before the if len(data.columns) != len(self._header_names): check for DataFrame instances?

EDIT:

Maybe additional fields aren't supported even when using recarrays (though that might be nice)? But set_data() with a recarray does allow a generic cellid field, via that self._untuple_cellids(data)[0] step.
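
As a rough sketch of the behaviour being asked for (not flopy code): normalize the DataFrame first, then do the column check. The helper name normalize_list_dataframe, the "cellid" column name, and the cellid_layer/cellid_row/cellid_column header names are all assumptions made for illustration; flopy's own self._untuple_cellids and self._header_names would take their place in a real change.

```python
import pandas as pd


def normalize_list_dataframe(
    df,
    header_names,
    cellid_col="cellid",
    cellid_parts=("cellid_layer", "cellid_row", "cellid_column"),
):
    """Hypothetical helper: untuple a cellid column, then keep only the
    expected columns in the expected order. Not part of flopy."""
    out = df.copy()
    if cellid_col in out.columns:
        # Expand each cellid tuple into one column per component.
        parts = pd.DataFrame(
            out[cellid_col].tolist(), columns=list(cellid_parts), index=out.index
        )
        out = pd.concat([parts, out.drop(columns=cellid_col)], axis=1)
    # Only now check that every expected column is present.
    missing = [name for name in header_names if name not in out.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    # Selecting by name drops excess columns and enforces the order.
    return out[list(header_names)]


# "Imperfect" input: tuple cellids plus an extra column.
raw = pd.DataFrame(
    {"cellid": [(0, 1, 2), (0, 3, 4)], "q": [-100.0, -50.0], "note": ["a", "b"]}
)
header = ["cellid_layer", "cellid_row", "cellid_column", "q"]
clean = normalize_list_dataframe(raw, header)
print(clean)
```

Whether excess columns should be dropped silently or with a warning is a design choice left open here.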
