flox supports grouping by multiple variables (would fix #324, #1056) and grouping by dask variables (would fix #2852).
To enable this in GroupBy we need to update the constructor's signature to:
Accept multiple "by" variables.
Accept "expected group labels" for grouping by dask variables (like bins for groupby_bins, which already supports grouping by dask variables). This lets us construct the output coordinate without evaluating the dask variable.
We may also want to simultaneously group by a categorical variable (season) and bin by a continuous variable (air temperature). So we also need a way to indicate whether the "expected group labels" are "bin edges" or categories.
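To illustrate why expected group labels are enough to build the output coordinate, here is a minimal plain-Python sketch. It is not flox or xarray code: output_coordinate and bin_index are hypothetical helper names, and right-closed bins are an assumption chosen to match the default behaviour of pandas.cut.

```python
from bisect import bisect_left


def output_coordinate(expected, is_bins):
    """Build output labels from expected group labels alone (hypothetical helper)."""
    if is_bins:
        # Consecutive edges define right-closed intervals (left, right].
        return list(zip(expected[:-1], expected[1:]))
    # Categorical labels are used as given, in the order given.
    return list(expected)


def bin_index(value, edges):
    """Index of the right-closed bin containing value, or None if outside all bins."""
    i = bisect_left(edges, value) - 1
    return i if 0 <= i < len(edges) - 1 else None


edges = list(range(21, 30))  # same edges as np.arange(21, 30, 1)
temperature_coord = output_coordinate(edges, is_bins=True)
season_coord = output_coordinate(["DJF", "MAM", "JJA", "SON"], is_bins=False)
```

Neither coordinate needed the data values, which is the property that matters for a dask-backed variable. The is_bins flag plays the role of the "bin edges or categories" indicator discussed above.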
We could add a top-level xr.Bins object that wraps bin edges + any kwargs to be passed to pandas.cut. Note our current groupby_bins signature has a bunch of kwargs passed directly to pandas.cut.
Finally add groups: None | ArrayLike | xarray.Bins | Iterable[None | ArrayLike | xarray.Bins] to pass the "expected group labels".
If None, groups are auto-detected from non-dask group arrays; passing None for a dask group raises an error.
If xarray.Bins, bin the corresponding variable using those bin edges.
If ArrayLike, treat the values as categorical group labels.
groups is a little too similar to group, so we should choose a better name.
The ordering of the ArrayLike would let us fix Ordered Groupby Keys #757 (pass the seasons in the order you want them in the output).
So then that example becomes
ds.groupby(
    ["season", "air_temperature"],  # season is numpy, air_temperature is dask
    groups=[None, xr.Bins(np.arange(21, 30, 1), closed="right")],
)
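The rules for groups described above could be sketched roughly as follows. This is only an illustration under stated assumptions: Bins is a hypothetical stand-in for the proposed xr.Bins, resolve_groups and its lazy argument are invented for this sketch, and it simplifies the proposal by always expecting one groups entry per grouping variable.

```python
from dataclasses import dataclass, field
from typing import Sequence


@dataclass
class Bins:
    """Hypothetical stand-in for the proposed xr.Bins: bin edges plus cut options."""
    edges: Sequence[float]
    cut_kwargs: dict = field(default_factory=dict)


def resolve_groups(names, groups=None, lazy=()):
    """Pair each grouping variable with how its labels are determined.

    names  : grouping variable names
    groups : None, or one entry per name (None, Bins, or a sequence of labels)
    lazy   : names of variables backed by lazy (e.g. dask) arrays
    """
    if groups is None:
        groups = [None] * len(names)
    if len(groups) != len(names):
        raise ValueError("need one groups entry per grouping variable")

    resolved = {}
    for name, expected in zip(names, groups):
        if expected is None:
            if name in lazy:
                # Labels cannot be discovered without computing the array.
                raise ValueError(
                    f"expected labels are required for lazy variable {name!r}"
                )
            resolved[name] = ("auto", None)
        elif isinstance(expected, Bins):
            resolved[name] = ("bins", expected)
        else:
            # Any other sequence is treated as ordered categorical labels.
            resolved[name] = ("categorical", list(expected))
    return resolved


plan = resolve_groups(
    ["season", "air_temperature"],
    [None, Bins(list(range(21, 30)), {"right": True})],
    lazy={"air_temperature"},
)
```

Here season is auto-detected and air_temperature is binned. Passing a list such as ["DJF", "MAM", "JJA", "SON"] for season instead of None would keep that order in the resolved labels, which is the hook for the ordered-keys case.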
For comparison, the signature in flox (may be errors!) takes expected_groups and isbin arguments, and you would calculate that last example using flox by passing both.
The use of expected_groups and isbin seems ugly to me (the names could also be better!), so I propose the groupby signature described above, which also changes group: DataArray | str to group: DataArray | str | Iterable[str] | Iterable[DataArray].
Thoughts?