This repository was archived by the owner on Jun 25, 2020. It is now read-only.

Change storage backend to xray.DataArray#6

Open
kynan wants to merge 22 commits into master from feature/xray

kynan commented Jun 2, 2015

Contributor

Refactor the storage of benchmark results to use xray.DataArray, an N-dimensional array with labelled coordinate axes, similar to an N-dimensional pandas.Series.

DataArrays can be indexed very efficiently and saved to netCDF files.

There are a few issues / differences to the previous dict based storage:

  • DataArray does not support hierarchical metadata
  • The coordinate axes need to be known when initialising the DataArray, i.e. the regions to time need to be declared upfront.
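To make the second point concrete, here is a minimal sketch of allocating a DataArray upfront and then filling and querying it by label. It assumes the `xarray` package (the later name of xray); the region names, the `ncells` parameter, and the timing value are hypothetical and not taken from this branch.

```python
import numpy as np
import xarray as xr  # assumption: xray under its later name, xarray

# Hypothetical regions and parameter values; both axes must be declared
# before any timing is recorded.
regions = ["assemble", "solve"]
ncells = [32, 64, 128]

# Allocate with NaN so that unrecorded timings are easy to spot.
timings = xr.DataArray(
    np.full((len(regions), len(ncells)), np.nan),
    coords={"region": regions, "ncells": ncells},
    dims=("region", "ncells"),
    name="time",
)

# Record a timing by label rather than by integer position.
timings.loc[dict(region="solve", ncells=64)] = 1.25

# Query by label: all problem sizes for one region.
solve_times = timings.sel(region="solve")
```

Recording a region that was not listed in `regions` fails with a lookup error, which is the restriction described above.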

This is a WIP, not yet ready to merge, but I'd appreciate comments.

kynan commented Jun 2, 2015

Contributor Author

@mlange05 Could you test this with your storage benchmarks?

Note that some changes to the benchmarks are required. In particular, you need to specify the regions beforehand.

mlange05 commented Jul 3, 2015


I just had a quick look and the new approach seems very nice in general. The a-priori allocation of the data array is somewhat annoying though, since I like to record all PyOP2 timer data at the end of a benchmark routine. Is there any way the allocation can be deferred?

kynan commented Jul 3, 2015

Contributor Author

We could potentially record each region as a separate DataArray and in the end concatenate them into one.
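A rough sketch of that idea, assuming the `xarray` package (the later name of xray). The helper `record_region`, the region names, and the numbers are hypothetical and only illustrate deferring the allocation until the end of a run.

```python
import numpy as np
import xarray as xr  # assumption: xray under its later name, xarray

ncells = [32, 64, 128]  # hypothetical parameter values


def record_region(name, values):
    """Wrap one region's timings in a DataArray with a length-1 'region' axis."""
    return xr.DataArray(
        np.asarray(values, dtype=float)[np.newaxis, :],
        coords={"region": [name], "ncells": ncells},
        dims=("region", "ncells"),
    )


# Regions can be appended in any order, e.g. at the end of a benchmark routine.
recorded = []
recorded.append(record_region("assemble", [0.1, 0.4, 1.6]))
recorded.append(record_region("solve", [0.2, 0.9, 3.8]))

# Concatenate along the existing 'region' axis once all regions are known.
timings = xr.concat(recorded, dim="region")
```

This does not address merging data from different runs; it only removes the need to name the regions before timing them.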

Unfortunately, we still have the problem of merging data from different runs, which remains messy, and I don't yet have a good idea of how to solve it more cleanly.

The main advantage of the xray storage is really that it's much easier to query, so it should be possible to refactor and simplify the plotting code.

@mlange05


OK, I think we want the results stored in an xray.Dataset with regions as keys. This ensures that the dimensions/params for each region are the same, and allows us to use the labelled indexing feature to concatenate them when combining series. I think this also maps reasonably well onto the series/params distinction in pybench.
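As an illustration only, and not the code on the other branch, the layout could look roughly like this with the `xarray` package (the later name of xray). The function `run_series`, the `scheme` series axis, the `ncells` parameter, and all values are hypothetical.

```python
import pandas as pd
import xarray as xr  # assumption: xray under its later name, xarray

ncells = [32, 64, 128]  # hypothetical parameter axis shared by all regions


def run_series(assemble, solve):
    """Build one Dataset per series, with one variable per timed region."""
    return xr.Dataset(
        {
            "assemble": ("ncells", assemble),
            "solve": ("ncells", solve),
        },
        coords={"ncells": ncells},
    )


# Two hypothetical series, each measured over the same parameters.
explicit = run_series([0.1, 0.4, 1.6], [0.2, 0.9, 3.8])
implicit = run_series([0.1, 0.5, 1.9], [0.3, 1.2, 5.1])

# Combine the series along a new labelled 'scheme' axis.
combined = xr.concat(
    [explicit, implicit],
    dim=pd.Index(["explicit", "implicit"], name="scheme"),
)

# Labelled query: one region, one series, one parameter value.
value = combined["solve"].sel(scheme="implicit", ncells=64).item()
```

Because every region is a variable of the same Dataset, all regions share the `ncells` coordinate, and a new region can be added as another key without reallocating the others.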

I've pushed this to another branch for inspection. Please have a go and see if this works with the current workflow used in firedrake-bench.

kynan commented Nov 12, 2015

Contributor Author

@mlange05 I've updated and rebased this branch, including your change, which I think makes perfect sense. I think this is ready for prime time. What do you think?

