Skip to content

Rework orphan management to freeze orphans instead of removing them iteratively#269

Merged
jdecarolis merged 4 commits intoTemoaProject:unstablefrom
idelder:rework/orphan_management
Mar 9, 2026
Merged

Rework orphan management to freeze orphans instead of removing them iteratively#269
jdecarolis merged 4 commits intoTemoaProject:unstablefrom
idelder:rework/orphan_management

Conversation

@idelder
Copy link
Copy Markdown
Collaborator

@idelder idelder commented Mar 8, 2026

Currently, Temoa detects orphan commodities/processes (inputs that nothing produces or outputs that nothing consumes, source/demand/waste commodities excluded). It then removes these orphans and repeats until all orphan branches of the model are removed. This can lead to issues, especially in myopic, where large sections of the process network are removed and infeasibilities or other errors result from interacting model features.

This change instead "freezes" those orphans. The commodity balance constraint is constructed for the orphan commodity, which has a zero on one side of the equality, freezing connected flows at zero.

This has two key benefits:

  1. The model is much more likely to run as these sections of the network remain in case some part of the model formulation relies on their existence.
  2. Now, only the "offending" orphans are logged as warnings and marked in the network diagram, making the problem much more diagnosable as compared to having a large section of the network removed.

One key downside is that, as these branches are frozen, it should in many cases be computationally equivalent to remove them, thereby saving all those variables and constraints from the LP. We lose here that computational efficiency for the sake of robustness and interpretability.

A resulting implication is that, even with network checking turned off, sources must now be properly tagged or they will be frozen. test_system and storageville needed fixes for this reason (tests are not network checked so this did not emerge earlier).

Finally, this change also adds the commodity_sink set, a union of commodity_demand and commodity_waste to help clarify the common behaviour of those sets.

Summary by CodeRabbit

  • New Features

    • Added commodity_sink classification for commodities that exit the process network (demand and waste).
    • Redefined commodity_carrier to include physical carriers and sinks.
  • Improvements

    • Clarified and updated commodity-balance and validation messages.
    • Per-period network analysis and orphan reporting; strengthened demand/efficiency validations.
  • Tests

    • Test datasets updated to include commodity_sink entries and commodity classification adjustments.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 8, 2026

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

Introduces a new commodity_sink set (union of demand and waste), updates commodity_carrier to include sinks, and propagates these semantics into flows, technology validation, and network analysis. Documentation and test datasets are updated to include commodity_sink.

Changes

Cohort / File(s) Summary
Documentation
docs/source/db_model_comparison.rst, docs/source/mathematical_formulation.rst
Added commodity_sink (C_k) and updated commodity_carrier descriptions to use C_p ∪ C_k; reordered related lines.
Core model
temoa/core/model.py
Added public set commodity_sink = commodity_demandcommodity_waste; commodity_carrier now initialized as commodity_physicalcommodity_sink.
Commodity checks
temoa/components/commodities.py
Tightened two error-condition checks from disjunction → conjunction and updated message prefix to "Nothing produces or consumes it:".
Flows / Balancing
temoa/components/flows.py
Added triplet sets commodity_upstream_rpo and commodity_downstream_rpi (r,p,o / r,p,i), filter out sinks/sources, and set commodity balance to union of these triplet sets.
Technology validation
temoa/components/technology.py
Validate unmet demands via commodity_demand - c_outputs; compute carriers from inputs/outputs (c_carrier) and compare against commodity_carrier; switched tech set checks to tech_production; updated messages.
Data IO manifest
temoa/data_io/component_manifest.py
Changed two LoadItem validators from viable_input_commsviable_comms.
Network analysis
temoa/model_checking/commodity_network_manager.py
Replaced multi-pass region analysis with single per-period pass, collect orphans per (region,period), downgraded unsupported-demand logs to warnings, removed convergence loop.
Test data (JSON)
tests/testing_data/mediumville_sets.json, tests/testing_data/test_system_sets.json, tests/testing_data/utopia_sets.json
Added commodity_sink arrays to test datasets; test_system_sets.json adjusted commodity_source/commodity_all.
Test data (SQL)
tests/testing_data/storageville.sql, tests/testing_data/test_system.sql
Updated commodity type classifications (e.g., earth, SOL) from physical (p) → source (s).

Sequence Diagram(s)

mermaid
sequenceDiagram
participant Model as Core Model
participant Tech as Technology Component
participant Flows as Flows Component
participant CNM as Commodity Network Manager
participant DocsTests as Docs & Test Data
Model->>Tech: expose commodity_sink, commodity_carrier
Tech->>Flows: validate tech inputs/outputs vs commodity_carrier
Flows->>CNM: produce upstream/downstream triplet sets (r,p,o) / (r,p,i)
CNM->>Flows: analyze per-period orphans and warn
DocsTests->>Model: include commodity_sink in datasets and docs

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

docs

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 55.56% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The PR title describes a specific implementation approach (freezing orphans instead of removing them), but the actual changes span much broader refactoring including commodity set redesigns, constraint logic updates, and test data modifications that go beyond orphan management. Consider a more encompassing title that reflects the introduction of commodity_sink and related architectural changes, or clarify whether orphan management is the primary focus with commodity_sink as supporting infrastructure.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs).
Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/source/db_model_comparison.rst`:
- Around line 70-71: The "Model-Only Elements" / "Derived Sets" table is out of
sync: add a new entry for commodity_sink and update the commodity_carrier
description to match the earlier lines by defining commodity_sink as commodities
that exit the process network (C_d ∪ C_w) and commodity_carrier as physical
energy carriers and sinks (C_p ∪ C_k); locate the table entry for
commodity_carrier and replace its current "physical + demand + waste" text with
"physical and sink commodities (C_p ∪ C_k)" and insert a new row for
commodity_sink with "commodities that exit the process network (C_d ∪ C_w)" so
both definitions are consistent.

In `@temoa/components/technology.py`:
- Around line 347-357: Move the unmet-demand validation so it runs before the
carrier symmetry check: currently the symmetric_difference logic that computes
symdiff from model.commodity_carrier fires first and raises a generic
carrier-mismatch error (f_msg via logger.error and ValueError), which hides
missing producing processes because commodity_carrier now includes
commodity_demand; instead, run the explicit unmet-demand branch (the check that
inspects commodity_demand and missing demand outputs) prior to computing
symdiff/model.commodity_carrier so missing producers are reported directly.
Locate the symdiff block (uses symdiff, model.commodity_carrier, symdiff_str,
f_msg) and the unmet-demand check below it, and swap their order so the
unmet-demand validation executes first.

In `@temoa/model_checking/commodity_network_manager.py`:
- Around line 63-74: The loop currently only logs cn.unsupported_demands() but
does not record them in manager state, so analyze_network() still treats the
region-period as clean; modify the block to also add these unsupported demand
identifiers into a manager collection (e.g., self.demand_orphans or a new
self.unsupported_demands mapping keyed by (region, period)) by updating the set
with cn.unsupported_demands() (use set.update semantics similar to how
get_demand_side_orphans() and get_other_orphans() are added) so
analyze_network() will detect and fail regions with unsupported demands; update
any related initializations to ensure the mapping exists before update.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: caf59ff3-d22b-4b9e-9570-6068d835ce76

📥 Commits

Reviewing files that changed from the base of the PR and between e918b1b and c2c6026.

📒 Files selected for processing (13)
  • docs/source/db_model_comparison.rst
  • docs/source/mathematical_formulation.rst
  • temoa/components/commodities.py
  • temoa/components/flows.py
  • temoa/components/technology.py
  • temoa/core/model.py
  • temoa/data_io/component_manifest.py
  • temoa/model_checking/commodity_network_manager.py
  • tests/testing_data/mediumville_sets.json
  • tests/testing_data/storageville.sql
  • tests/testing_data/test_system.sql
  • tests/testing_data/test_system_sets.json
  • tests/testing_data/utopia_sets.json

Comment thread temoa/components/technology.py
Comment on lines +63 to +74
# Log any demands that are not fully supported
for commodity in cn.unsupported_demands():
logger.warning(
'Demand %s is not supported back to a source in region %s, period %d',
commodity,
region,
pass_num - 1,
period,
)
break # Exit the loop if the network is stable

logger.debug(
'Pass %d for region %s: Found and removed %d orphan(s).',
pass_num,
region,
len(orphans_this_pass),
)
for orphan in sorted(orphans_this_pass):
logger.warning('Removing orphan across all periods: %s', orphan)

# Remove all orphans found in this pass from all periods in the region
for period in self.periods:
data.available_techs[region, period] -= orphans_this_pass
else:
logger.error('Region %s analysis did not converge after 100 passes.', region)

# Add to the main collections, ensuring no duplicates
self.demand_orphans[region, period].update(cn.get_demand_side_orphans())
self.other_orphans[region, period].update(cn.get_other_orphans())
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Record unsupported demands in manager state instead of only logging them.

Right now an unsupported demand with no orphan edge tuples only emits a warning here. analyze_network() still returns True at Lines 94-95 because it only inspects self.demand_orphans and self.other_orphans, so a clearly invalid region-period can be reported as clean.

Suggested fix
 class CommodityNetworkManager:
     def __init__(self, periods: Iterable[str | int], network_data: NetworkModelData) -> None:
         self.analyzed: bool = False
         self.periods: list[Period] = sorted([Period(int(p)) for p in periods])
         self.orig_data: NetworkModelData = network_data
         self.filtered_data: NetworkModelData | None = None
         self.regions: set[Region] | None = None
@@
         self.orig_tech = {k: v.copy() for k, v in network_data.available_techs.items()}
         # Final collections of all orphans found, organized by (region, period)
         self.demand_orphans: dict[RegionPeriodKey, set[EdgeTuple]] = defaultdict(set)
         self.other_orphans: dict[RegionPeriodKey, set[EdgeTuple]] = defaultdict(set)
+        self.unsupported_demands_by_period: dict[RegionPeriodKey, set[str]] = defaultdict(set)
@@
         for period in self.periods:
             cn = CommodityNetwork(region=region, period=period, model_data=data)
             cn.analyze_network()
 
             # Log any demands that are not fully supported
-            for commodity in cn.unsupported_demands():
+            unsupported = set(cn.unsupported_demands())
+            self.unsupported_demands_by_period[region, period].update(unsupported)
+            for commodity in unsupported:
                 logger.warning(
                     'Demand %s is not supported back to a source in region %s, period %d',
                     commodity,
                     region,
                     period,
                 )
@@
-        orphans_found = any(self.demand_orphans.values()) or any(self.other_orphans.values())
-        return not orphans_found
+        issues_found = (
+            any(self.demand_orphans.values())
+            or any(self.other_orphans.values())
+            or any(self.unsupported_demands_by_period.values())
+        )
+        return not issues_found
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Log any demands that are not fully supported
for commodity in cn.unsupported_demands():
logger.warning(
'Demand %s is not supported back to a source in region %s, period %d',
commodity,
region,
pass_num - 1,
period,
)
break # Exit the loop if the network is stable
logger.debug(
'Pass %d for region %s: Found and removed %d orphan(s).',
pass_num,
region,
len(orphans_this_pass),
)
for orphan in sorted(orphans_this_pass):
logger.warning('Removing orphan across all periods: %s', orphan)
# Remove all orphans found in this pass from all periods in the region
for period in self.periods:
data.available_techs[region, period] -= orphans_this_pass
else:
logger.error('Region %s analysis did not converge after 100 passes.', region)
# Add to the main collections, ensuring no duplicates
self.demand_orphans[region, period].update(cn.get_demand_side_orphans())
self.other_orphans[region, period].update(cn.get_other_orphans())
# Log any demands that are not fully supported
unsupported = set(cn.unsupported_demands())
self.unsupported_demands_by_period[region, period].update(unsupported)
for commodity in unsupported:
logger.warning(
'Demand %s is not supported back to a source in region %s, period %d',
commodity,
region,
period,
)
# Add to the main collections, ensuring no duplicates
self.demand_orphans[region, period].update(cn.get_demand_side_orphans())
self.other_orphans[region, period].update(cn.get_other_orphans())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@temoa/model_checking/commodity_network_manager.py` around lines 63 - 74, The
loop currently only logs cn.unsupported_demands() but does not record them in
manager state, so analyze_network() still treats the region-period as clean;
modify the block to also add these unsupported demand identifiers into a manager
collection (e.g., self.demand_orphans or a new self.unsupported_demands mapping
keyed by (region, period)) by updating the set with cn.unsupported_demands()
(use set.update semantics similar to how get_demand_side_orphans() and
get_other_orphans() are added) so analyze_network() will detect and fail regions
with unsupported demands; update any related initializations to ensure the
mapping exists before update.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's okay. The purpose of that warning was to make sure orphans weren't going through unnoticed. An unmet demand causes infeasibility and cannot be missed the same way.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@idelder, that's a solid distinction — unsupported demands will always surface as infeasibility during solve, making additional manager-level tracking redundant. Orphans, on the other hand, can silently shrink the model without causing infeasibility, which is precisely why they warrant explicit tracking and warnings. The current approach makes sense.

@idelder idelder marked this pull request as draft March 8, 2026 23:35
@idelder idelder marked this pull request as ready for review March 9, 2026 00:08
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/source/db_model_comparison.rst`:
- Around line 70-71: The entries for "commodity_sink" and "commodity_carrier"
use subscript notation (\text{C}_d, \text{C}_w, \text{C}_p, \text{C}_k) which is
inconsistent with the document's superscript form; update those to use
superscripts (\text{C}^d, \text{C}^w, \text{C}^p, \text{C}^k) so the strings for
commodity_sink and commodity_carrier match surrounding notation (and the earlier
uses of {}^*\text{C}^k / {}^*\text{C}^c).

In `@temoa/components/technology.py`:
- Around line 372-378: The comparison currently builds techs = {t for r, i, t,
v, o in model.efficiency.sparse_iterkeys()} and computes symdiff =
techs.symmetric_difference(model.tech_production); change that to compare
against model.tech_all instead (i.e., use model.tech_all in the
symmetric_difference) so efficiency-indexed technologies that are valid but not
in the production subset are not flagged; update any related message/context
that references technologies if needed (msg and the surrounding validation
block) to reflect that we're validating against tech_all.
- Around line 360-365: The error message built when symdiff is non-empty
currently says "Unused or unspecified physical carriers" but the check is
against model.commodity_carrier (which can include sink commodities and other
sets like commodity_demand/commodity_waste), so update the message string
constructed where symdiff, c_carrier and model.commodity_carrier are used to
refer generically to "carriers" (or to include the specific set name, e.g.,
commodity_carrier) instead of "physical carriers"; locate the message
construction around symdiff and c_carrier and change the wording to something
like "Unused or unspecified carriers. Either add or remove the following
elements to the Set commodity_carrier" so users are pointed to the correct
concept.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 109fa95d-9e02-44c5-a669-1d26b6116970

📥 Commits

Reviewing files that changed from the base of the PR and between c2c6026 and bfb59d6.

📒 Files selected for processing (2)
  • docs/source/db_model_comparison.rst
  • temoa/components/technology.py

Comment thread docs/source/db_model_comparison.rst
Comment thread temoa/components/technology.py
Comment on lines +372 to +378
techs = {t for r, i, t, v, o in model.efficiency.sparse_iterkeys()}

symdiff = techs.symmetric_difference(model.tech_all)
symdiff = techs.symmetric_difference(model.tech_production)
if symdiff:
msg = (
'Unused or unspecified technologies. Either add or remove '
'the following technology(ies) to the tech_resource or '
'tech_production Sets.\n\n Technology(ies): {}'
'the following technologies.\n\n Technologies: {}'
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Compare efficiency technologies against tech_all, not tech_production.

Line 374 now rejects any technology that appears in efficiency but is not in model.tech_production. That is too narrow: efficiency is indexed by tech_all, and the model context explicitly allows specialized technologies to be added to tech_all outside the p% production subset. Valid storage/exchange-style technologies will be reported as “unused or unspecified” here.

Suggested fix
-    symdiff = techs.symmetric_difference(model.tech_production)
+    symdiff = techs.symmetric_difference(model.tech_all)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
techs = {t for r, i, t, v, o in model.efficiency.sparse_iterkeys()}
symdiff = techs.symmetric_difference(model.tech_all)
symdiff = techs.symmetric_difference(model.tech_production)
if symdiff:
msg = (
'Unused or unspecified technologies. Either add or remove '
'the following technology(ies) to the tech_resource or '
'tech_production Sets.\n\n Technology(ies): {}'
'the following technologies.\n\n Technologies: {}'
techs = {t for r, i, t, v, o in model.efficiency.sparse_iterkeys()}
symdiff = techs.symmetric_difference(model.tech_all)
if symdiff:
msg = (
'Unused or unspecified technologies. Either add or remove '
'the following technologies.\n\n Technologies: {}'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@temoa/components/technology.py` around lines 372 - 378, The comparison
currently builds techs = {t for r, i, t, v, o in
model.efficiency.sparse_iterkeys()} and computes symdiff =
techs.symmetric_difference(model.tech_production); change that to compare
against model.tech_all instead (i.e., use model.tech_all in the
symmetric_difference) so efficiency-indexed technologies that are valid but not
in the production subset are not flagged; update any related message/context
that references technologies if needed (msg and the surrounding validation
block) to reflect that we're validating against tech_all.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect. With the removal of the defunct technology_resource set, all technologies are now a subset of tech_production

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@idelder, thank you for the clarification — you're right, and my suggestion was incorrect. I wasn't aware that tech_resource had been removed, making tech_production the correct and complete superset for all technologies in the model. The comparison against model.tech_production is appropriate.


✏️ Learnings added
Learnt from: idelder
Repo: TemoaProject/temoa PR: 269
File: temoa/components/technology.py:372-378
Timestamp: 2026-03-09T13:24:27.000Z
Learning: In the TemoaProject/temoa codebase, the defunct `tech_resource` set has been removed. All technologies are now subsets of `tech_production`, making `tech_production` the correct set to validate against in `check_efficiency_indices` (temoa/components/technology.py). Do not suggest reverting efficiency technology validation back to `tech_all`; `tech_production` is the right reference set.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: idelder
Repo: TemoaProject/temoa PR: 180
File: temoa/components/limits.py:632-647
Timestamp: 2025-10-30T14:37:44.260Z
Learning: In the `limit_tech_input_split_average_constraint` function in `temoa/components/limits.py`, there was a bug where the `total_inp` summation incorrectly used the fixed variable `i` instead of the loop variable `S_i` when indexing `processOutputsByInput` and calling `get_variable_efficiency`. This pattern should be watched for in similar aggregation constraints: when summing over a loop variable like `S_i`, all references within that loop should use `S_i`, not a fixed outer variable.

@jdecarolis jdecarolis force-pushed the rework/orphan_management branch from bfb59d6 to a1af15a Compare March 9, 2026 20:01
@jdecarolis jdecarolis merged commit 7f7ff86 into TemoaProject:unstable Mar 9, 2026
11 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants