refactor: optimize KEEP action performance by caching field contenders & fix SyntaxWarning for invalid escape sequences #295
When multiple KEEP actions are present in a deid recipe, the `expand_field_expression` function internally called `get_fields_with_lookup(dicom)` once per KEEP action. That function iterates through all DICOM fields and builds lookup tables, which is expensive. This commit modifies the `keep` property to build the field contenders once, on the first KEEP action, and pass them explicitly to all subsequent `expand_field_expression` calls via the `contenders` parameter, avoiding redundant field enumeration and lookup table construction. The change significantly reduces processing time for recipes with multiple KEEP actions, especially for DICOM files with many fields or nested sequences.
@vsoch please let me know if you have any comments on this PR. I realized that the lookup table was not properly initialized for the KEEP action when I ran a recipe with several hundred KEEP actions.
Is there something special about |
Yes The |
Convert regex patterns to raw strings (r"") in config/utils.py to eliminate SyntaxWarning about invalid escape sequences. This follows Python best practices for regular expressions and avoids potential issues with escape sequences.
e6b5556 to 72ced39
I bumped the version and updated the changelog. Please let me know if you have more questions.
Performance Optimization and Code Quality Improvements
This PR includes two improvements to the deid codebase:
1. Optimize KEEP action performance by caching field contenders
When multiple KEEP actions are present in a deid recipe, the `expand_field_expression` function was internally calling `get_fields_with_lookup(dicom)` for each KEEP action. This function iterates through all DICOM fields and builds lookup tables, which is an expensive operation.

Changes:
- Modified the `keep` property in `DicomParser` to build field contenders once on the first KEEP action
- Passed the contenders explicitly to all subsequent `expand_field_expression` calls via the `contenders` parameter

Impact: Significantly reduces processing time for recipes with multiple KEEP actions, especially for DICOM files with many fields or nested sequences.
2. Fix SyntaxWarning for invalid escape sequences
Added raw string prefix (`r""`) to regex patterns in `config/utils.py` to eliminate Python SyntaxWarning about invalid escape sequences.

Changes:
- `parse_format`: Changed to `r"FORMAT|(\s+)"`
- `load_deid`: Changed to `r"[%]|(\s+)"`

Impact: Removes warnings and follows Python best practices for regular expressions.
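As a small illustration of why the raw prefix is safe here: a raw string leaves the backslash in `\s` untouched, so the regex engine receives the same pattern as the doubled-backslash form, without relying on Python tolerating an unrecognized escape. The use of `re.sub` and the sample input lines below are assumptions made for the example, not taken from `config/utils.py`.

```python
import re

# The two patterns named in this PR, written as raw strings.
FORMAT_PATTERN = r"FORMAT|(\s+)"
SECTION_PATTERN = r"[%]|(\s+)"

# Equivalent non-raw spelling with the backslash escaped explicitly.
ESCAPED_FORMAT_PATTERN = "FORMAT|(\\s+)"

# Illustrative inputs (assumed): strip the keyword or marker plus whitespace.
format_name = re.sub(FORMAT_PATTERN, "", "FORMAT dicom")
section_name = re.sub(SECTION_PATTERN, "", "%header")
```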
Checklist