Feature/speed up resample#455

Merged
FBumann merged 8 commits into main from feature/speed-up-resample
Nov 4, 2025

@FBumann FBumann commented Nov 4, 2025

Description

Speed up resampling of a FlowSystem by 20-40x by concatenating variables into an xr.DataArray before resampling.

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Code refactoring

Related Issues

Closes #(issue number)

Testing

  • I have tested my changes
  • Existing tests still pass

Checklist

  • My code follows the project style
  • I have updated documentation if needed
  • I have added tests for new functionality (if applicable)

Summary by CodeRabbit

  • Performance Improvements
    • FlowSystem resampling operations are now significantly faster, delivering approximately 20-40x performance improvement through enhanced internal data processing.

coderabbitai bot commented Nov 4, 2025

Walkthrough

The changes optimize the FlowSystem resampling operation by introducing dimension-group-aware resampling. A new private helper method groups time-dimensioned variables by their non-time dimensions and resamples each group independently to avoid unnecessary NaN broadcasting. Validation of resampling methods and preservation of timestep attributes are added to the resample flow.
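
The grouping step described above can be sketched without xarray. This is only an illustration of the idea, not the code merged in this PR: the function name `group_by_non_time_dims`, the variable names, and the dimension tuples are all hypothetical, and the real helper works on dataset variables rather than a plain dict.

```python
from collections import defaultdict


def group_by_non_time_dims(var_dims, time_dim="time"):
    """Map each tuple of non-time dimensions to the variable names that share it.

    `var_dims` maps a variable name to its tuple of dimension names.
    Variables without a time dimension are skipped, since they are not resampled.
    """
    groups = defaultdict(list)
    for name, dims in var_dims.items():
        if time_dim not in dims:
            continue  # non-time variables are handled separately
        key = tuple(d for d in dims if d != time_dim)
        groups[key].append(name)
    return dict(groups)


# Hypothetical variables and dimensions, for illustration only.
example_dims = {
    "demand": ("time",),
    "price": ("time",),
    "flow_rate": ("time", "scenario"),
    "capacity": ("scenario",),
}
groups = group_by_non_time_dims(example_dims)
```

Variables in the same group have identical shapes, so they can be stacked and resampled together without padding any of them with NaN along dimensions they do not have.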

Changes

| Cohort / File(s) | Summary |
| Resample optimization: flixopt/flow_system.py | Adds private _resample_by_dimension_groups() helper method that groups and resamples time-dimensioned variables by dimension structure. Refactors resample() to validate method, split variables by time-dimensionality, apply grouped resampling, merge results, and preserve timestep attributes. Imports defaultdict from collections. |
| Documentation update: CHANGELOG.md | Documents ~20–40× speedup improvement in FlowSystem resampling through internal dataarray conversion. |

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant resample as resample()
    participant split as Split Variables
    participant group as _resample_by_dimension_groups()
    participant merge as Merge Results
    
    User->>resample: resample(time_dataset, method, ...)
    resample->>resample: Validate method
    resample->>split: Separate time-dimensioned vars<br/>from non-time-dimensioned vars
    split-->>resample: time_vars, other_vars
    resample->>group: Process time_vars by<br/>dimension groups
    group->>group: Group by non-time dimensions
    group->>group: Resample each group independently
    group-->>resample: resampled_time_dataset
    resample->>merge: Merge time_dataset + other_vars
    merge-->>resample: merged_dataset
    resample->>resample: Preserve/set timestep attributes
    resample-->>User: final_dataset

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • New private helper method introduces grouping logic with straightforward grouping and merging operations
  • Resample method refactoring is localized and follows a clear pattern: validate → split → process → merge
  • Changes are focused to a single file with one public method affected

Possibly related PRs

  • PR #442: Modifies FlowSystem.resample() to preserve and set hours_of_last_timestep and hours_of_previous_timesteps attributes, which overlaps with the attribute preservation logic introduced in this PR.

Poem

🐰 Hops through dimensions with graceful care,
Grouping time-vars without a spare,
No NaN broadcasts to slow us down—
Twenty-fold speedup, the fastest in town! 🚀

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| Title check | ✅ Passed | The title 'Feature/speed up resample' accurately reflects the main change: a performance optimization to the resample method that speeds it up by 20-40x. |
| Description check | ✅ Passed | The PR description includes the required sections from the template (Description, Type of Change, Related Issues, Testing, Checklist) with relevant information provided. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b50d538 and 7bf1925.

📒 Files selected for processing (1)
  • CHANGELOG.md (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.10)
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.11)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 67e13a5 and aa04d29.

📒 Files selected for processing (1)
  • flixopt/flow_system.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
flixopt/flow_system.py (1)
flixopt/structure.py (1)
  • to_dataset (668-691)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.11)
  • GitHub Check: test (3.10)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
flixopt/flow_system.py (1)

1033-1042: Critical: Address the previously identified crashes from concat and empty merge.

The issues flagged in the previous review are still present:

  1. Lines 1035-1037: xr.concat with default combine_attrs='no_conflicts' will raise MergeError when variables have different attributes (e.g., units, long_name), which is typical for FlowSystem datasets.

  2. Line 1042: xr.merge([]) will raise ValueError when resampled_groups is empty, which occurs when the FlowSystem has no time-dependent variables.

The previous review provided a fix that resolves both issues by resampling datasets directly and handling the empty case.

Apply this diff to fix both issues:

+        if not dim_groups:
+            return getattr(time_dataset.resample(time=time, **kwargs), method)()
+
         resampled_groups = []
         for var_names in dim_groups.values():
-            stacked = xr.concat(
-                [time_dataset[name] for name in var_names],
-                dim=pd.Index(var_names, name='variable'),
-            )
-            resampled = getattr(stacked.resample(time=time, **kwargs), method)()
-            resampled_groups.append(resampled.to_dataset(dim='variable'))
+            grouped_dataset = time_dataset[var_names]
+            resampled_group = getattr(grouped_dataset.resample(time=time, **kwargs), method)()
+            resampled_groups.append(resampled_group)
 
         return xr.merge(resampled_groups)

This keeps the performance optimization while avoiding attribute conflicts and empty merge errors.

🧹 Nitpick comments (1)
flixopt/flow_system.py (1)

1077-1080: Consider moving method validation earlier for fail-fast behavior.

The method validation is correct and provides a clear error message. As a minor optimization, you could move this validation to the beginning of the method (right after line 1067) to fail before any data processing. However, the current placement is acceptable since it still fails before the expensive resampling operation.
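
A fail-fast check of this kind could look roughly like the sketch below. The names `ALLOWED_METHODS` and `validate_resample_method` are hypothetical, and the set of accepted reductions is an assumption; the list actually accepted by flixopt may differ.

```python
# Assumed set of reduction names, for illustration only.
ALLOWED_METHODS = ("mean", "sum", "max", "min", "first", "last", "median", "std")


def validate_resample_method(method):
    """Raise ValueError for an unknown method before any data is processed."""
    if method not in ALLOWED_METHODS:
        raise ValueError(
            f"Unsupported resampling method {method!r}. "
            f"Expected one of: {', '.join(ALLOWED_METHODS)}."
        )
    return method


# Calling the check as the first statement of a resample routine means a typo
# such as "avg" is reported before datasets are built, split, or merged.
checked = validate_resample_method("mean")
```

Placing the call at the top of the method keeps the error path cheap and makes the accepted values visible in the message.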

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aa04d29 and dfa6ef9.

📒 Files selected for processing (1)
  • flixopt/flow_system.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
flixopt/flow_system.py (1)
flixopt/structure.py (1)
  • to_dataset (668-691)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.10)
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.11)
🔇 Additional comments (2)
flixopt/flow_system.py (2)

9-9: LGTM!

The defaultdict import is correctly placed and used for grouping variables by dimensions in the new resampling logic.


1071-1090: Clean separation of concerns.

The refactored resampling logic correctly:

  • Separates time-dependent from non-time variables
  • Validates the resampling method before processing
  • Resamples only time-dependent variables using the grouped helper
  • Preserves non-time variables unchanged
  • Merges results appropriately

This approach is sound and will work correctly once the critical issues in _resample_by_dimension_groups are resolved.

@FBumann FBumann merged commit 87bd422 into main Nov 4, 2025
12 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Nov 4, 2025
9 tasks
@FBumann FBumann deleted the feature/speed-up-resample branch November 19, 2025 13:40