Reconsidering #845 had me thinking: we're creating a very large number of tasks if the chunks are sized 1 along both time and layer.
This is a direct consequence of the IDF reading, which creates one task per file. However, the conversion to MF6 will generally operate on all the layers at once. We can greatly reduce the number of tasks and the size of the task graph by making sure chunks are merged along the layer dimension.
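To give a rough sense of scale, here is a back-of-the-envelope sketch. The grid sizes and the helper name n_chunks are made up for illustration, and it assumes that a blockwise operation produces roughly one task per chunk:

```python
import math


def n_chunks(sizes, chunks):
    """Count the chunks of an array, given dimension sizes and chunk sizes."""
    return math.prod(math.ceil(sizes[dim] / chunks[dim]) for dim in sizes)


# Hypothetical model: 10 time steps, 20 layers, one IDF per (time, layer).
sizes = {"time": 10, "layer": 20, "y": 500, "x": 400}
per_file = {"time": 1, "layer": 1, "y": 500, "x": 400}
merged = {"time": 1, "layer": 20, "y": 500, "x": 400}

print(n_chunks(sizes, per_file))  # 200
print(n_chunks(sizes, merged))  # 10
```

Every operation applied afterwards repeats that chunk count, so the difference compounds over the conversion.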
I'm just not sure whether we should do this directly in open_projectfile_data, so that every user benefits from the different default. There is a downside. Say you do something like this:
prj_data = imod.prj.open_projectfile_data(stuff)
khv = prj_data["khv"]
for layer in khv["layer"]:
    khv.sel(layer=layer).plot()
With merged layer chunks, every iteration now loads the IDFs of all layers just to plot a single one. Of course, a simple .compute() up front addresses it, since the data is then read only once:
khv = prj_data["khv"].compute()
But I do not expect most users to come up with this themselves.
Many other operations would probably work better without layer chunking, though. Of course, the same is true for layer chunking in imod.idf.open, and I haven't seen any complaints about that.
So what I'd suggest now is to remove the layer chunks in the from_imod5_data method (i.e. set layer chunk size equal to dimension size):
chunksizes = dict(da.chunksizes)
if "layer" in chunksizes:
    chunksizes["layer"] = (da.sizes["layer"],)
da = da.chunk(chunksizes)
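The same logic can be written as a small pure function over the chunk mapping, which makes it easy to check without a dask-backed array. This is only a sketch: merge_layer_chunks is a hypothetical name, and the plain dicts below stand in for da.chunksizes and da.sizes.

```python
def merge_layer_chunks(chunksizes, sizes):
    """Return a copy of chunksizes with "layer" collapsed into a single chunk."""
    merged = dict(chunksizes)
    if "layer" in merged:
        # One chunk spanning the whole layer dimension.
        merged["layer"] = (sizes["layer"],)
    return merged


# Stand-ins for da.chunksizes and da.sizes: one chunk per (time, layer) file.
chunksizes = {"time": (1, 1), "layer": (1, 1, 1), "y": (4,), "x": (5,)}
sizes = {"time": 2, "layer": 3, "y": 4, "x": 5}

print(merge_layer_chunks(chunksizes, sizes))
# {'time': (1, 1), 'layer': (3,), 'y': (4,), 'x': (5,)}
```

Arrays without a layer dimension pass through unchanged, and the time chunks are left alone.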