Change storage backend to xray.DataArray #6
|
@mlange05 Could you test this with your storage benchmarks? Note that some changes to the benchmarks are required. In particular, you need to specify the regions beforehand. |
|
I just had a quick look and the new approach seems very nice in general. The a-priori allocation of the data array is somewhat annoying though, since I like to record all PyOP2 timer data at the end of a benchmark routine. Is there any way the allocation can be deferred? |
|
We could potentially record each region as a separate Unfortunately we still have the problem of merging data from different runs, which is still messy and I don't yet have a good idea how to solve this in a nicer way. The main advantage of the xray storage is really that it's much easier to query, so it should be possible to refactor and simplify the plotting code. |
|
OK, I think we want the results stored in a I've pushed this to another branch for inspection. Please have a go and see if this works with the current workflow used in |
Uses params as the coordinate dimension names and values.
Benchmark.data is now always an xray.Dataset keyed by region. For individual runs the array dimensions are just the params; for Datasets resulting from combine_series the dimensions are series.update(params). This allows registering new timings without pre-allocating DataArrays.
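To illustrate the layout described in this commit message, here is a minimal sketch of a Dataset keyed by region whose dimensions are the params. It assumes `xarray` (the later name of the `xray` package) is installed; the parameter names, the region names, the timing values, and the `register_timing` helper are all hypothetical and not taken from this branch.

```python
import xarray as xr  # assumption: xarray, the renamed successor of xray

# Hypothetical benchmark parameters; their names become the dimension names.
params = {"degree": [1, 2], "ncells": [100, 200, 400]}
dims = tuple(params)

# Start with coordinates only: no per-region arrays are allocated up front.
data = xr.Dataset(coords=params)


def register_timing(dataset, region, values):
    """Hypothetical helper: store timings for one region as a new variable."""
    dataset[region] = (dims, values)


# Regions can be registered in any order, e.g. at the end of a run.
# The timing values below are made up for illustration.
register_timing(data, "assembly", [[0.1, 0.2, 0.4], [0.3, 0.6, 1.2]])
register_timing(data, "solve", [[1.0, 2.0, 4.0], [1.5, 3.0, 6.0]])

# Query one region at one parameter combination by label.
solve_time = float(data["solve"].sel(degree=2, ncells=200))
```

Each region is just another variable on the shared parameter coordinates, so adding one later does not require knowing the full set of regions beforehand.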
|
@mlange05 I've updated and rebased this branch, including your change, which I think makes perfect sense. I think this is ready for prime time, what do you think? |
Refactor the storage of benchmark results to use xray, an N-dimensional array with labelled coordinate axes, like an N-dimensional pandas.Series.
DataArrays can be indexed very efficiently and saved to a netCDF file.
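As a rough illustration of the label-based indexing mentioned above, the sketch below builds a small DataArray and selects from it by coordinate value. It assumes `xarray` (the later name of the `xray` package) and NumPy are installed; the dimension names, the array name, and the values are made up for the example.

```python
import numpy as np
import xarray as xr  # assumption: xarray, the renamed successor of xray

# Hypothetical parameter values used as labelled coordinate axes.
degrees = [1, 2, 3]
ncells = [32, 64]

# Made-up timings, one value per (degree, ncells) combination.
timings = xr.DataArray(
    np.arange(6, dtype=float).reshape(3, 2),
    coords={"degree": degrees, "ncells": ncells},
    dims=("degree", "ncells"),
    name="assembly",
)

# Select by coordinate label rather than by positional index.
single = timings.sel(degree=2, ncells=64)

# Fixing one coordinate leaves a 1-D array over the remaining dimension.
by_degree = timings.sel(ncells=32)
```

Writing to netCDF would go through `to_netcdf`, which needs a netCDF backend installed, so it is left out of the sketch.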
There are a few issues with, and differences from, the previous dict-based storage:
This is a WIP, not yet ready to merge, but I'd appreciate comments.