Conversation
```
# Conflicts:
#	flixopt/flow_system.py
```
Caution: Review failed. The pull request is closed.

Walkthrough

Adds dataset-centered selection and resampling helpers to FlowSystem (new `_dataset_resample`, `_dataset_sel`, `_dataset_isel`, and `_resample_by_dimension_groups`), centralizes time-metadata computation/update, routes sel/isel/resample through the dataset helpers, and introduces comprehensive resample-equivalence tests and CHANGELOG notes.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant FlowSystem
    participant DatasetHelpers as Dataset Helpers
    participant GroupResampler as _resample_by_dimension_groups
    participant xr as xr.Dataset
    User->>FlowSystem: call resample/sel/isel(...)
    activate FlowSystem
    FlowSystem->>DatasetHelpers: convert FlowSystem -> xr.Dataset / call helper
    activate DatasetHelpers
    alt resample path
        DatasetHelpers->>GroupResampler: group variables by dimension sets
        activate GroupResampler
        GroupResampler->>xr: stack group -> DataArray
        GroupResampler->>xr: resample per-group (mean/sum/...)
        GroupResampler->>xr: unstack/merge -> xr.Dataset
        GroupResampler-->>DatasetHelpers: grouped resampled xr.Dataset
        deactivate GroupResampler
    else sel/isel path
        DatasetHelpers->>xr: perform sel/isel on xr.Dataset
    end
    DatasetHelpers->>DatasetHelpers: update time metadata (hours_per_timestep, last/prev)
    DatasetHelpers-->>FlowSystem: xr.Dataset
    deactivate DatasetHelpers
    FlowSystem->>FlowSystem: from_dataset(...) -> FlowSystem result
    FlowSystem-->>User: FlowSystem (result)
    deactivate FlowSystem
```
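The resample path above can be sketched as a standalone helper. This is an illustrative reconstruction only, not the actual flixopt implementation: the function name, the temporary `_variable` stacking dimension, and the hard-coded `'time'` axis are all assumptions.

```python
import numpy as np
import pandas as pd
import xarray as xr


def resample_by_dimension_groups(ds: xr.Dataset, freq: str, method: str) -> xr.Dataset:
    """Sketch: group data variables by their dimension sets, resample each
    group as one stacked DataArray, then merge everything back together."""
    groups = {}
    for name, var in ds.data_vars.items():
        if 'time' in var.dims:
            groups.setdefault(tuple(sorted(var.dims)), []).append(name)
    pieces = []
    for names in groups.values():
        # Stack same-shaped variables along a temporary axis so xarray
        # resamples each group in a single pass instead of per variable.
        stacked = ds[names].to_array(dim='_variable')
        resampled = getattr(stacked.resample(time=freq), method)()
        pieces.append(resampled.to_dataset(dim='_variable'))
    # Variables without a time dimension pass through unchanged.
    static = [n for n, v in ds.data_vars.items() if 'time' not in v.dims]
    if static:
        pieces.append(ds[static])
    return xr.merge(pieces) if pieces else ds


times = pd.date_range('2020-01-01', periods=8, freq='h')
ds = xr.Dataset(
    {
        'a': ('time', np.arange(8.0)),
        'b': ('time', np.ones(8)),
        'static': ((), 5.0),
    },
    coords={'time': times},
)
out = resample_by_dimension_groups(ds, '2h', 'mean')
print(out['a'].values)  # [0.5 2.5 4.5 6.5]
```

Grouping by identical dimension sets is what avoids the broadcasting blow-up a single whole-dataset stack would cause, while still batching the per-variable resample calls.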
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
CHANGELOG.md (1 hunks)
flixopt/flow_system.py (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
flixopt/flow_system.py (1)
flixopt/structure.py (3)
`copy` (827-839), `to_dataset` (668-691), `from_dataset` (712-745)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: test (3.11)
- GitHub Check: test (3.13)
- GitHub Check: test (3.12)
- GitHub Check: test (3.10)
```
# Conflicts:
#	CHANGELOG.md
```
Actionable comments posted: 1
♻️ Duplicate comments (1)
flixopt/flow_system.py (1)
1203-1205: Critical: Recompute `hours_per_timestep` after resampling.

After resampling the time dimension, `hours_per_timestep` in the dataset retains aggregated values (e.g., ~1.0 after `method='mean'`) rather than reflecting the new bin durations. Power users chaining dataset operations will compute incorrect energy totals. The resampled time index must be used to recalculate duration vectors and update both the `hours_per_timestep` variable and the related attributes.

The fix suggested in the previous review remains applicable:
```diff
-        if non_time_var_names:
-            non_time_dataset = dataset[non_time_var_names]
-            result = xr.merge([resampled_time_dataset, non_time_dataset])
-        else:
-            result = resampled_time_dataset
-
-        # Update time-related attributes
-        result.attrs['hours_of_last_timestep'] = hours_of_last_timestep
-        result.attrs['hours_of_previous_timesteps'] = hours_of_previous_timesteps
+        if non_time_var_names:
+            non_time_dataset = dataset[non_time_var_names]
+            result = xr.merge([resampled_time_dataset, non_time_dataset])
+        else:
+            result = resampled_time_dataset
+
+        # Recompute timestep metadata based on the resampled time index
+        new_time_index = result.indexes.get('time')
+        if new_time_index is None or len(new_time_index) < 1:
+            return result
+
+        timesteps_extra = cls._create_timesteps_with_extra(new_time_index, hours_of_last_timestep)
+        hours_per_timestep_recomputed = cls.calculate_hours_per_timestep(timesteps_extra)
+        result['hours_per_timestep'] = hours_per_timestep_recomputed
+
+        if hours_of_last_timestep is None:
+            hours_of_last_timestep = hours_per_timestep_recomputed.isel(time=-1).item()
+        if hours_of_previous_timesteps is None:
+            hours_of_previous_timesteps = hours_per_timestep_recomputed.isel(time=0).item()
+
+        # Update time-related attributes
+        result.attrs['hours_of_last_timestep'] = hours_of_last_timestep
+        result.attrs['hours_of_previous_timesteps'] = hours_of_previous_timesteps
```
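To see why the recomputation matters, here is a minimal standalone illustration in plain xarray (not the flixopt code): after a `mean` resample, a per-timestep duration variable keeps its old hourly values even though each new bin spans two hours, so it must be rebuilt from the new time index.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hourly data: each timestep lasts 1.0 h.
times = pd.date_range('2020-01-01', periods=6, freq='h')
ds = xr.Dataset(
    {
        'flow_rate': ('time', np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])),
        'hours_per_timestep': ('time', np.ones(6)),
    },
    coords={'time': times},
)

coarse = ds.resample(time='2h').mean()
# The mean of 1.0-hour durations is still 1.0, but each new bin spans 2 hours:
print(coarse['hours_per_timestep'].values)  # [1. 1. 1.] -- stale!

# Recompute durations from the new time index instead:
idx = coarse.indexes['time']
step_hours = np.diff(idx.values).astype('timedelta64[s]').astype(float) / 3600.0
# The last step has no successor; reuse the previous step length as a fallback.
step_hours = np.append(step_hours, step_hours[-1])
coarse['hours_per_timestep'] = ('time', step_hours)
print(coarse['hours_per_timestep'].values)  # [2. 2. 2.]
```

Any energy total computed as `flow_rate * hours_per_timestep` would be off by a factor of two with the stale values.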
📜 Review details
📒 Files selected for processing (2)
CHANGELOG.md (1 hunks)
flixopt/flow_system.py (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
flixopt/flow_system.py (1)
flixopt/structure.py (6)
`copy` (827-839), `to_dataset` (668-691), `from_dataset` (712-745), `values` (1199-1204), `values` (1481-1482), `hours_of_previous_timesteps` (189-190)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: test (3.11)
- GitHub Check: test (3.12)
- GitHub Check: test (3.13)
- GitHub Check: test (3.10)
🔇 Additional comments (8)
CHANGELOG.md (1)
65-65: LGTM! The performance improvement note appropriately documents the resampling optimization.
flixopt/flow_system.py (7)
124-139: LGTM! The power user documentation clearly demonstrates the performance benefits of dataset-based chaining.
930-974: LGTM! The implementation correctly handles label-based selection with appropriate early returns.
976-1004: LGTM! The delegation to `_dataset_sel` is clean, and the early return for no-op selections is a good optimization.
1006-1039: LGTM! Consistent implementation pattern with `_dataset_sel` for integer-based indexing.
1041-1069: LGTM! Proper delegation to `_dataset_isel` with consistent handling.
1071-1153: Excellent performance optimization! The dimension-based grouping strategy elegantly prevents broadcasting issues while achieving significant speedups. The DataArray concatenation approach and proper handling of edge cases demonstrate careful implementation.
1209-1246: Delegation structure is sound.

The public `resample` method properly delegates to `_dataset_resample`. However, this inherits the critical `hours_per_timestep` recomputation issue flagged in `_dataset_resample` (lines 1203-1205).
Actionable comments posted: 1
📜 Review details
📒 Files selected for processing (1)
flixopt/flow_system.py (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
flixopt/flow_system.py (1)
flixopt/structure.py (6)
`hours_of_previous_timesteps` (189-190), `get` (1328-1333), `get` (1488-1490), `copy` (827-839), `to_dataset` (668-691), `from_dataset` (712-745)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: test (3.13)
- GitHub Check: test (3.11)
- GitHub Check: test (3.10)
- GitHub Check: test (3.12)
Actionable comments posted: 1
🧹 Nitpick comments (2)
tests/test_resample_equivalence.py (2)
26-82: Consider setting a random seed for reproducibility.

The function uses `np.random.randn()` without setting a seed, which makes tests non-deterministic. While `xr.testing.assert_allclose` should handle numerical precision differences, setting a seed (e.g., `np.random.seed(42)` at the start) would ensure fully reproducible test results and make debugging easier if failures occur.

Apply this diff to add a seed parameter with a default value:
```diff
-def create_dataset_with_mixed_dimensions(n_timesteps=48):
+def create_dataset_with_mixed_dimensions(n_timesteps=48, seed=42):
     """
     Create a dataset with variables having different dimension structures.
 
     This mimics realistic data with:
     - Variables with only time dimension
     - Variables with time + one other dimension
     - Variables with time + multiple dimensions
     """
+    np.random.seed(seed)
     timesteps = pd.date_range('2020-01-01', periods=n_timesteps, freq='h')
```
85-272: Consider testing the `**kwargs` parameter path.

The `_resample_by_dimension_groups` method accepts `**kwargs` that are forwarded to `xarray.resample()`, but none of the tests exercise this parameter. Consider adding a test that passes additional arguments (e.g., `label='right'`, `closed='right'`) to verify they're correctly forwarded through the optimization path.

Example test to add:
```python
def test_resample_equivalence_with_kwargs():
    """Test that kwargs are properly forwarded to resample()."""
    timesteps = pd.date_range('2020-01-01', periods=48, freq='h')
    ds = xr.Dataset(coords={'time': timesteps})
    ds['var'] = xr.DataArray(np.random.randn(48), dims=['time'])

    kwargs = {'label': 'right', 'closed': 'right'}
    result_optimized = fx.FlowSystem._resample_by_dimension_groups(ds, '2h', 'mean', **kwargs)
    result_naive = getattr(ds.resample(time='2h', **kwargs), 'mean')()

    xr.testing.assert_allclose(result_optimized, result_naive)
```
📜 Review details
📒 Files selected for processing (1)
tests/test_resample_equivalence.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_resample_equivalence.py (1)
flixopt/flow_system.py (2)
`resample` (1309-1347), `_resample_by_dimension_groups` (1174-1254)
🪛 GitHub Actions: Python Package CI/CD
tests/test_resample_equivalence.py
[warning] 93-93: Ruff formatting: 1 file would be reformatted; 55 files already formatted. The change shown in the diff suggests a multi-line function call was reformatted to a single line.
[error] 1-1: Ruff formatting failed (exit code 1). Run 'ruff format' to fix code style issues.
🔇 Additional comments (5)
tests/test_resample_equivalence.py (5)
129-141: Good edge case coverage! Testing the empty dataset case ensures the optimization handles this gracefully. The implementation should return an empty resampled dataset without errors.
159-180: Excellent NaN handling coverage! This test ensures both the optimized and naive approaches handle NaN values consistently across aggregation methods. This is critical for correctness since NaN handling can differ subtly between xarray operations.
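For reference, these are the default NaN semantics both code paths inherit from xarray, shown in a tiny standalone example (independent of the test suite): reductions skip NaN by default (`skipna=True`), an all-NaN bin stays NaN under `mean`, and sums to 0.0 under `sum`.

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range('2020-01-01', periods=4, freq='h')
da = xr.DataArray([1.0, np.nan, np.nan, np.nan], dims='time', coords={'time': times})

# First 2h bin: [1.0, nan] -> the NaN is skipped; second bin is all-NaN.
mean2h = da.resample(time='2h').mean()
print(mean2h.values)  # [ 1. nan]

# sum treats an all-NaN bin as 0.0 (skipna=True with no min_count):
sum2h = da.resample(time='2h').sum()
print(sum2h.values)  # [1. 0.]
```

Any optimized resampling path must reproduce exactly these semantics to pass the equivalence tests.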
182-219: Dimension order independence is well-tested. This test verifies that the grouping logic correctly handles variables regardless of where the time dimension appears. This is important since the `_resample_by_dimension_groups` implementation groups by non-time dimensions.
221-244: Key optimization case is properly validated. This test directly exercises the core benefit of `_resample_by_dimension_groups`: grouping multiple variables with identical dimensions for efficient batch resampling. The comment correctly identifies this as the key optimization scenario.
246-272: Realistic scale testing with pragmatic method selection. Testing with a larger dataset (168 timesteps, multiple dimensions) ensures the optimization works at realistic scales. The selective method testing is a sensible tradeoff between coverage and test performance.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/test_resample_equivalence.py (1)
85-86: Consider adding 'count' method to test coverage.

The `_dataset_resample` method in `flow_system.py` lists 'count' as an available method (line 1256), but it's not included in any of the parametrized test methods here. While the current coverage of 9 other methods is thorough, adding 'count' would ensure complete method coverage.

Apply this diff to include 'count' in the first parametrized test:
```diff
-@pytest.mark.parametrize('method', ['mean', 'sum', 'max', 'min', 'first', 'last'])
+@pytest.mark.parametrize('method', ['mean', 'sum', 'max', 'min', 'first', 'last', 'count'])
 @pytest.mark.parametrize('freq', ['2h', '4h', '1D'])
 def test_resample_equivalence_mixed_dimensions(method, freq):
```
📜 Review details
📒 Files selected for processing (1)
tests/test_resample_equivalence.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_resample_equivalence.py (1)
flixopt/flow_system.py (3)
`resample` (1309-1347), `FlowSystem` (49-1351), `_resample_by_dimension_groups` (1174-1254)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: test (3.12)
- GitHub Check: test (3.10)
- GitHub Check: test (3.11)
- GitHub Check: test (3.13)
🔇 Additional comments (3)
tests/test_resample_equivalence.py (3)
17-82: LGTM! Well-structured helper functions.

The `naive_dataset_resample` function provides a clear baseline for comparison, and `create_dataset_with_mixed_dimensions` creates comprehensive test data covering various dimension combinations that would appear in real-world energy system modeling.
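Based on its name and role, the baseline helper presumably amounts to a one-call whole-dataset resample. A minimal sketch of such a baseline (assumed, not copied from the test file):

```python
import numpy as np
import pandas as pd
import xarray as xr


def naive_dataset_resample(ds: xr.Dataset, freq: str, method: str) -> xr.Dataset:
    """Baseline: resample the whole Dataset in one call, dispatching the
    aggregation method by name via getattr."""
    return getattr(ds.resample(time=freq), method)()


times = pd.date_range('2020-01-01', periods=48, freq='h')
ds = xr.Dataset({'load': ('time', np.arange(48.0))}, coords={'time': times})
daily = naive_dataset_resample(ds, '1D', 'mean')
print(daily['load'].values)  # [11.5 35.5]
```

Such a baseline is slow for wide datasets (xarray resamples each variable separately) but trivially correct, which is exactly what an equivalence test needs.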
85-269: Excellent test coverage for resampling equivalence. The test suite comprehensively validates the dimension-grouping optimization against the naive approach across multiple scenarios:
- Mixed dimension structures (the critical optimization case)
- Edge cases (empty datasets, single variables)
- Data quality issues (NaNs)
- Dimension ordering variations
- Realistic scales
The systematic use of `xr.testing.assert_allclose` ensures proper floating-point comparison across all scenarios.
272-273: LGTM! Standard pytest main block for direct test execution.
Actionable comments posted: 1
📜 Review details
📒 Files selected for processing (1)
tests/test_resample_equivalence.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_resample_equivalence.py (1)
flixopt/flow_system.py (3)
`resample` (1309-1347), `FlowSystem` (49-1351), `_resample_by_dimension_groups` (1174-1254)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: test (3.12)
- GitHub Check: test (3.13)
- GitHub Check: test (3.10)
- GitHub Check: test (3.11)
Description
Further speed up resampling and add options to speed up even more by avoiding round trips to and from dataset
Type of Change
Related Issues
Closes #(issue number)
Testing
Checklist
Summary by CodeRabbit
New Features
Refactor
Tests
Documentation