open_projectfile_data: rechunk to have chunks only in time, or rechunk in conversion #1144

Description

@Huite

Reconsidering #845 had me thinking: we're creating a very large number of tasks if the chunks are sized 1 along both time and layer.

This is a direct consequence of the IDF reading which creates one task for each file. However, the conversion to MF6 will generally operate on all the layers at once. We can greatly reduce the number of tasks and the size of the task graph by making sure chunks are merged in the layer dimension.
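To put a number on this, here is a minimal sketch in plain Python that counts chunks for a grid before and after merging the layer dimension. The grid dimensions and the `n_chunks` helper are made up for illustration and are not part of imod. Every operation downstream of the IDF reads creates roughly one task per chunk, so the chunk count is a reasonable proxy for how much the graph grows per step.

```python
from math import ceil, prod


def n_chunks(sizes, chunks):
    """Count chunks given dimension sizes and per-dimension chunk sizes."""
    return prod(ceil(sizes[dim] / chunks[dim]) for dim in sizes)


# Hypothetical grid: 120 time steps, 15 layers, 500 x 400 cells.
sizes = {"time": 120, "layer": 15, "y": 500, "x": 400}

# One IDF per (time, layer) pair: chunk size 1 along both dimensions.
per_file = {"time": 1, "layer": 1, "y": 500, "x": 400}
# Layers merged: chunks remain only along time.
per_time = {"time": 1, "layer": 15, "y": 500, "x": 400}

print(n_chunks(sizes, per_file))  # 1800
print(n_chunks(sizes, per_time))  # 120
```

The file reads themselves still need one task each; the saving is in everything that comes after them.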

I'm only doubtful whether we should do this directly in open_projectfile_data, so that every user would get this different default. There is a downside. Say you do something like this:

import imod

prj_data = imod.prj.open_projectfile_data(stuff)

khv = prj_data["khv"]
# Plot one layer at a time.
for layer in khv["layer"]:
    khv.sel(layer=layer).plot()

With the layers merged into a single chunk, every iteration would load all of the IDFs just to plot a single layer. Of course, a simple .compute() addresses it:

khv = prj_data["khv"].compute()

But I do not expect most users to come up with this themselves.

Many other operations would probably work better without layer chunking though. Of course, the same is true for layer chunking in imod.idf.open, and I haven't seen any complaints about that.

So what I'd suggest now is to remove the layer chunks in the from_imod5_data method (i.e. set layer chunk size equal to dimension size):

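# Copy the current chunk layout; it is empty for arrays that are not dask-backed.
chunksizes = dict(da.chunksizes)
if "layer" in chunksizes:
    # A single chunk spanning the whole layer dimension.
    chunksizes["layer"] = (da.sizes["layer"],)
    da = da.chunk(chunksizes)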
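The same logic could be factored into a small helper that works on the chunk mapping alone, which makes it easy to check without building a dask array. This is only a sketch: `merge_chunks` is a hypothetical name, not an existing imod function, and the chunk layout below is made up.

```python
def merge_chunks(chunksizes, sizes, dim="layer"):
    """Return a copy of chunksizes with a single chunk along dim.

    Dimensions without chunk information are left untouched, so the
    result equals the input when dim is not chunked.
    """
    merged = dict(chunksizes)
    if dim in merged:
        merged[dim] = (sizes[dim],)
    return merged


# Made-up layout: 3 time steps and 2 layers, one chunk per file.
chunksizes = {"time": (1, 1, 1), "layer": (1, 1), "y": (4,), "x": (5,)}
sizes = {"time": 3, "layer": 2, "y": 4, "x": 5}

merged = merge_chunks(chunksizes, sizes)
print(merged["layer"])  # (2,)
print(merged["time"])   # (1, 1, 1)

# Intended use on a DataArray:
# da = da.chunk(merge_chunks(da.chunksizes, da.sizes))
```

Leaving the time chunks alone keeps the per-time-step laziness, while the conversion sees all layers of a time step in one block.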

    Status
    🤝 Accepted
