Rework DataArray internals #648
|
@shoyer - I read this mainly trying to get a better idea of the internal DataArray data model. The code itself looks great. My main two comments on the refactor are:
All in all, impressive work. |
|
Indeed, I wonder if it would make sense to decouple DataArray from Dataset by storing the state on two (protected) attributes:
The main downside is that we add a bit more redundant code (e.g., to loop over all variables in |
As a newbie, 👍. I took some time to figure out why a
Low confidence, but you could have a common ancestor ( |
Very minor, but name = name or self.name might be clearer / more pythonic than the repeated if name is None:
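One caveat with the `or` spelling, sketched with two hypothetical helper functions (neither is part of xray): `or` falls back on any falsy value, not only `None`, so the two forms behave differently if a name such as `''` or `0` is ever meant to be kept.

```python
def resolve_with_or(name, default):
    # Falls back whenever `name` is falsy: None, '', 0, ...
    return name or default


def resolve_with_none_check(name, default):
    # Falls back only when `name` is exactly None.
    if name is None:
        name = default
    return name


# Both spellings agree when no name is given ...
assert resolve_with_or(None, "temperature") == "temperature"
assert resolve_with_none_check(None, "temperature") == "temperature"

# ... but they disagree on falsy names that are not None.
assert resolve_with_or("", "temperature") == "temperature"
assert resolve_with_none_check("", "temperature") == ""
```

Whether that difference matters depends on whether falsy names are considered valid; the explicit `is None` check is the conservative choice.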
|
@shoyer - do you have a feel for how difficult it would be to go the |
Hmm. Might not be so bad now that I've already gone through the trouble of thinking what these new tests should look like. I'll give it a shot tonight and see how it goes... |
|
I realize now that changing the internal representation for DataArray doesn't mean we need to rewrite how every routine works. We can still convert dataarrays to a dataset when convenient -- it just means we'll need to use a method to do so instead of modifying

    def copy(self, deep=True):
        ds = self._dataset.copy(deep=deep)
        return self._with_replaced_dataset(ds)

and instead we could simply write:

    def copy(self, deep=True):
        ds = self._to_temp_dataset().copy(deep=deep)
        return self._new_from_temp_dataset(ds)

However, going forward it will give us more flexibility for how to write DataArray methods. For example, it might actually be clearer to write:

    def copy(self, deep=True):
        variable = self.variable.copy(deep=deep)
        coords = OrderedDict((k, v.copy(deep=deep))
                             for k, v in self._coords.items())
        return type(self)(variable, coords, name=self.name, fastpath=True)

|
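To make the round trip concrete, here is a minimal, self-contained sketch of the idea. It is a toy, not xray's implementation: `ToyDataArray`, the placeholder key `_TEMP_NAME`, and the use of a plain `OrderedDict` as a stand-in for a Dataset are all assumptions made for illustration. Only the method names `_to_temp_dataset` and `_new_from_temp_dataset` and the attributes `_variable` and `_coords` come from this thread.

```python
import copy
from collections import OrderedDict


class ToyDataArray:
    """Toy stand-in for DataArray; not xray's actual code."""

    # Hypothetical placeholder key for the array inside the temporary
    # dataset, chosen so it cannot clash with a coordinate name.
    _TEMP_NAME = "<this-array>"

    def __init__(self, variable, coords=None, name=None):
        self._variable = variable                 # the array's own data
        self._coords = OrderedDict(coords or {})  # coordinate name -> labels
        self.name = name

    def _to_temp_dataset(self):
        # Throwaway mapping holding the coordinates plus the array itself.
        ds = OrderedDict(self._coords)
        ds[self._TEMP_NAME] = self._variable
        return ds

    def _new_from_temp_dataset(self, ds):
        # Split the mapping back into array data and coordinates.
        ds = OrderedDict(ds)
        variable = ds.pop(self._TEMP_NAME)
        return type(self)(variable, ds, name=self.name)

    def copy(self, deep=True):
        copier = copy.deepcopy if deep else copy.copy
        ds = OrderedDict((k, copier(v))
                         for k, v in self._to_temp_dataset().items())
        return self._new_from_temp_dataset(ds)


# The array may share its name with a coordinate without sharing data.
arr = ToyDataArray([1, 2, 3], {"x": [10, 20, 30]}, name="x")
dup = arr.copy()
dup._variable[0] = 99
assert arr._variable[0] == 1        # the original is untouched
assert dup._coords["x"] == [10, 20, 30]
```

The point of the design is that methods can still borrow dataset-style logic through the temporary mapping, while the array's state stays in its own two attributes.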
|
OK, latest commit changes DataArray's internals to rely on |
|
This is ready for review if anyone wants to take another look. |
Good catch -- needed explanation. Let me know if the comments I added help.
|
@shoyer - I don't have any more inline comments. There is one failing test and there are merge conflicts; once those are addressed, I'll take one more brief look. |
Fixes GH367
Fixes GH634

The internal data model used by :py:class:`~xray.DataArray` has been rewritten to fix several outstanding issues (:issue:`367`, :issue:`634`, `this stackoverflow report`_). Internally, ``DataArray`` is now implemented in terms of ``._variable`` and ``._coords`` attributes instead of holding variables in a ``Dataset`` object.
|
Rebased and tests are passing. |
|
lgtm, go ahead and merge. |
|
👏 |
Fixes #367
Fixes #634
Fixes #649
The internal data model used by DataArray has been rewritten to fix several outstanding issues (#367, #634 and this stackoverflow report). Namely, if a DataArray has the same name as one of its coordinates, the array and the coordinate no longer share the same data.

This means that creating a DataArray with the same name as one of its dimensions no longer automatically uses that array to label the corresponding coordinate. You will now need to provide coordinate labels explicitly. Here's the old behavior:

and the new behavior (compare the values of the x coordinate):

It's also no longer possible to convert a DataArray to a Dataset with DataArray.to_dataset if it is unnamed. This will now raise ValueError. If the array is unnamed, you need to supply the name argument.
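The two behaviour changes can be illustrated with a small pure-Python toy. This is a sketch under stated assumptions, not xray code: `toy_to_dataset` is a hypothetical helper, plain lists stand in for array data, and an `OrderedDict` stands in for a Dataset.

```python
from collections import OrderedDict


def toy_to_dataset(variable, coords, name=None):
    """Toy conversion to a name -> values mapping (stand-in for a Dataset)."""
    if name is None:
        # Mirrors the rule described above: an unnamed array needs a name.
        raise ValueError("unnamed array: pass name=... to convert it")
    ds = OrderedDict(coords)
    ds[name] = variable
    return ds


# Array data and coordinate labels live in separate containers, so an
# array named "x" and a coordinate called "x" are distinct objects.
variable = [1.5, 2.5, 3.5]      # the array's values
coords = {"x": [10, 20, 30]}    # labels for dimension "x", given explicitly
array_name = "x"
assert coords[array_name] is not variable

# Converting works once a name is supplied ...
ds = toy_to_dataset(variable, coords, name="temperature")
print(list(ds))

# ... and fails with ValueError when it is not.
try:
    toy_to_dataset(variable, coords)
except ValueError as err:
    print("ValueError:", err)
```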