Skip to content

refactor: optimize KEEP action performance by caching field contenders & fix SyntaxWarning for invalid escape sequences#295

Merged
vsoch merged 2 commits intopydicom:masterfrom
Simlomb:optimize-keep-action-caching
Dec 5, 2025
Merged

refactor: optimize KEEP action performance by caching field contenders & fix SyntaxWarning for invalid escape sequences#295
vsoch merged 2 commits intopydicom:masterfrom
Simlomb:optimize-keep-action-caching

Conversation

@Simlomb
Copy link
Contributor

@Simlomb Simlomb commented Dec 3, 2025

refactor: optimize KEEP action performance by caching field contenders & fix SyntaxWarning for invalid escape sequences

Performance Optimization and Code Quality Improvements

This PR includes two improvements to the deid codebase:

1. Optimize KEEP action performance by caching field contenders

When multiple KEEP actions are present in a deid recipe, the expand_field_expression function was internally calling get_fields_with_lookup(dicom) for each KEEP action. This function iterates through all DICOM fields and builds lookup tables, which is an expensive operation.

Changes:

  • Modified the keep property in DicomParser to build field contenders once on the first KEEP action
  • Pass the cached contenders explicitly to subsequent expand_field_expression calls via the contenders parameter
  • Avoids redundant field enumeration and lookup table construction

Impact: Significantly reduces processing time for recipes with multiple KEEP actions, especially for DICOM files with many fields or nested sequences.

2. Fix SyntaxWarning for invalid escape sequences

Added raw string prefix (r"") to regex patterns in config/utils.py to eliminate Python SyntaxWarning about invalid escape sequences.

Changes:

  • parse_format: Changed to r"FORMAT|(\s+)"
  • load_deid: Changed to r"[%]|(\s+)"

Impact: Removes warnings and follows Python best practices for regular expressions.

Checklist

  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • My code follows the style guidelines of this project

When multiple KEEP actions are present in a deid recipe, the
`expand_field_expression` function was internally calling
`get_fields_with_lookup(dicom)` for each KEEP action. This function
iterates through all DICOM fields and builds lookup tables, which is
an expensive operation.

This commit modifies the `keep` property to build the field contenders
once on the first KEEP action and pass it explicitly to all subsequent
`expand_field_expression` calls via the `contenders` parameter. This
avoids redundant field enumeration and lookup table construction.

This optimization significantly reduces processing time for recipes
with multiple KEEP actions, especially for DICOM files with many
fields or nested sequences.
@Simlomb
Copy link
Contributor Author

Simlomb commented Dec 3, 2025

@vsoch please let me know if you have any comments on this PR. I realized that the lookup table was not properly initialized for the KEEP action when running on several hundreds of KEEP actions in the recipe.
I also added the fix for invalid escape sequences in this PR as it is a very minor fix, but if you prefer them to be in two different PRs let me know and I'll modify it. Thanks for all the reviewing work!

@vsoch
Copy link
Member

vsoch commented Dec 3, 2025

Is there something special about KEEP vs other actions that might warrant the same?

@Simlomb
Copy link
Contributor Author

Simlomb commented Dec 4, 2025

Is there something special about KEEP vs other actions that might warrant the same?

Yes KEEP is pretty special. Other actions (REPLACE, JITTER, REMOVE, BLANK, etc.) already have this optimization. In the parse() method, self.fields is built once via get_fields() and then passed as contenders=self.fields to expand_field_expression() in perform_action().

The keep property was the only place that wasn't reusing cached field contenders. It's called early during initialization (when computing the skip list), before self.fields is populated, so it needs its own local cache.

Convert regex patterns to raw strings (r"") in config/utils.py to
eliminate SyntaxWarning about invalid escape sequences.
This follows Python best practices for regular expressions and avoids potential issues with
escape sequences.
@Simlomb Simlomb force-pushed the optimize-keep-action-caching branch from e6b5556 to 72ced39 Compare December 4, 2025 09:39
@Simlomb
Copy link
Contributor Author

Simlomb commented Dec 4, 2025

I bumped version and updated changelog. Please let me know if you have more questions.

@vsoch vsoch merged commit 52597db into pydicom:master Dec 5, 2025
3 checks passed
@vsoch
Copy link
Member

vsoch commented Dec 5, 2025

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants