Deidentifying pydicom.Dataset with custom class#211
Conversation
dont come from a file directly
|
This is the example that I have added: In this example we will create a custom class to deidentify a single instance of a OverviewWe will use four files for this example: The We can install them by running the following commands (requires conda create -n deid_example python=3.9
conda activate deid_example
cd my_deid_example
pip install -r requirements.txtThe contents of {
"SpecificCharacterSet":{"vr":"CS","Value":["ISO_IR 100"]},
"ImageType":{"vr":"CS","Value":["DERIVED","PRIMARY"]},
"SOPClassUID":{"vr":"UI","Value":["1.2.840.10008.5.1.4.1.1.1.2"]},
"StudyDate":{"vr":"DA","Value":["20220627"]},
"SeriesDate":{"vr":"DA","Value":["20220627"]},
"AcquisitionDate":{"vr":"DA","Value":["20220627"]},
"ContentDate":{"vr":"DA","Value":["20220627"]},
"StudyTime":{"vr":"TM","Value":["080803"]},
"ContentTime":{"vr":"TM","Value":["080808.202000"]},
"PatientName":{"vr":"PN","Value":[{"Alphabetic":"Maria^Doe"}]},
"PatientID":{"vr":"LO","Value":["1234567890"]},
"PatientBirthDate":{"vr":"DA","Value":["19900606"]},
"Modality":{"vr":"CS","Value":["MG"]},
"PatientSex":{"vr":"CS","Value":["F"]},
"PatientAge":{"vr":"AS","Value":["032Y"]},
"StudyID":{"vr":"SH","Value":["mammogram87654"]}
}The recipeWe create a custom recipe The custom deidentifier classfrom deid.config import DeidRecipe
from deid.dicom.parser import DicomParser
import pydicom
from Crypto.Hash import SHA512
from datetime import datetime
class DeidDataset:
"""This class allows to pseudonymize an instance of
pydicom.Dataset with our custom recipe and functions.
"""
def __init__(self, secret_salt: str, recipe_path: str):
"""New instance of our pseudonymizer class.
:param secret_salt: a random string that makes the
hashing harder to break.
:param recipe_path: path to our deid recipe.
"""
self.secret_salt = secret_salt
self.recipe = DeidRecipe(recipe_path)
def pseudonymize(self, dataset:pydicom.Dataset) -> pydicom.Dataset:
"""Pseudonymize a single dicom dataset
:param dataset: dataset that will be pseudonymized
:returns: pseudonymized dataset
"""
parser = DicomParser(dataset, self.recipe, from_file=False)
# register functions that are specified in the recipe
parser.define('replace_name', self.replace_name)
parser.define('hash_func', self.deid_hash_func)
parser.define('remove_day', self.remove_day)
# parse the dataset and apply the deidentification
parser.parse(strip_sequences=True, remove_private=True)
return parser.dicom
# All registered functions that are used in the recipe must
# receive the arguments: `item`, `value`, `field`, `dicom`
def deid_hash_func(self, item, value, field, dicom) -> str:
"""Performs self.hash to field.element.value"""
val = field.element.value
return self.hash(str(val))
@staticmethod
def remove_day(item, value, field, dicom) -> str:
"""Removes the day from a DT field in the deid framework"""
dt = datetime.strptime(field.element.value, '%Y%m%d')
return dt.strftime("%Y%m01")
@staticmethod
def replace_name(item, value, field, dicom) -> str:
"""Replace PatientName with PatientSex and coarse PatientAge"""
sex = dicom.get('PatientSex')
sex = {"F":'Female', "M": 'Male', 'O':'Other'}[sex]
age = DeidDataset.round_to_nearest(int(dicom.get('PatientAge')[:-1]), 5)
return f"{sex} {age:03d}Y {dicom.get('Modality')}"
# Helper methods for our registered ones
@staticmethod
def round_to_nearest(value, interval):
"""Rounds value to closest multiple of interval"""
return interval * round(value/interval)
def hash(self, msg: str) -> str:
"""
:param msg: message that we want to encrypt,
normally the PatientID or the StudyID.
:return: the encrypted message as hexdigest
(in characters from '0' to '9' and 'a' to 'f')
"""
assert type(msg) == str, f"value is not of type str, {type(msg)}"
h = SHA512.new(truncate="256")
bytes_str = bytes(f"{self.secret_salt}{msg}", "utf-8")
h.update(bytes_str)
return str(h.hexdigest())
# Load the pydicom Dataset
import json
# Unorthodox way of loading a pydicom.Dataset
# please see pydicom documentation for more information
# on how to load dicom files
with open('my_dicom_file.json') as f:
dataset_dict = json.load(f)
dataset = pydicom.Dataset.from_json(dataset_dict)
print('Dataset before pseudonymization')
print(dataset)
#create an instance of our class
deid_ds = DeidDataset("!2#4%6&7abc", 'my_deid_recipe.dicom')
#pseudonymize the dataset
print('\nDataset after pseudonymization')
pseudonymized = deid_ds.pseudonymize(dataset)
print(pseudonymized)If we execute our python module python my_module.pyIt will give us the following output: |
vsoch
left a comment
There was a problem hiding this comment.
I'm really excited about this example - it's beautifully written! I added some comments for discussion on the implementation, let me know your thoughts!
deid/dicom/parser.py
Outdated
| if not isinstance(recipe, DeidRecipe): | ||
| recipe = DeidRecipe(recipe) | ||
|
|
||
| self.from_file = from_file |
There was a problem hiding this comment.
Is there any reason this from_file should be saved with the class? E.g., isn't it just used for the load, and if so, would it make sense to pass there?
Also, is there a better way to determine this? E.g., if it's not a file that exists could we not just check for the type of the variable and not require the extra param?
There was a problem hiding this comment.
The reason to pass it in the constructor is that init runs self.load.
You are right that a more intelligent way of doing it is checking if the variable exists and if it does not, just skip the necessary lines. I will modify the PR 😄
|
Hi @vsoch, |
deid/dicom/fields.py
Outdated
| # Retrieve both dicom and file meta fields | ||
| datasets = [dicom, dicom.file_meta] | ||
| # Retrieve both dicom and file meta fields if dicom came from a file | ||
| datasets = [d for d in [dicom, dicom.get('file_meta', None)] if d is not None] |
There was a problem hiding this comment.
a get defaults to None when it isn't there so you don't need to explicitly define it.
| datasets = [d for d in [dicom, dicom.get('file_meta', None)] if d is not None] | |
| datasets = [d for d in [dicom, dicom.get('file_meta')] if d is not None] |
I think you can also just do (but I'd test first)
| datasets = [d for d in [dicom, dicom.get('file_meta', None)] if d is not None] | |
| datasets = [d for d in [dicom, dicom.get('file_meta')] if d] |
deid/dicom/parser.py
Outdated
| expand_sequences=expand_sequences, | ||
| seen=self.seen, | ||
| skip=self.skip, | ||
| skip=self.skip |
There was a problem hiding this comment.
Did you run a recent version of black on this? Usually black adds a trailing comma.
deid/dicom/parser.py
Outdated
| self, dicom_file, recipe=None, config=None, force=True, disable_skip=False | ||
| ): | ||
| self, dicom_file, recipe=None, config=None, force=True, disable_skip=False): | ||
| """Create new instance of DicomParser |
There was a problem hiding this comment.
| """Create new instance of DicomParser | |
| """ | |
| Create new instance of DicomParser |
Beautiful docstring! My preference is for a newline after the """ for the first sentence.
deid/dicom/parser.py
Outdated
| :param config: deid config, defaults to None | ||
| :param force: ignore errors when reading a dicom file, defaults to True | ||
| :param disable_skip: _description_, defaults to False | ||
| :param from_file: the dicom_file comes from an actual file, defaults to True. |
There was a problem hiding this comment.
I think this was removed?
| :param from_file: the dicom_file comes from an actual file, defaults to True. |
deid/dicom/parser.py
Outdated
| # Set class variables that might be helpful later | ||
| self.dicom_file = os.path.abspath(self.dicom.filename) | ||
| self.dicom_name = os.path.basename(self.dicom_file) | ||
| if self.dicom.get('filename', None) is not None: |
There was a problem hiding this comment.
| if self.dicom.get('filename', None) is not None: | |
| if self.dicom.get('filename'): |
| @@ -0,0 +1,254 @@ | |||
| --- | |||
| title: Deidentify a Pydicom Dataset | |||
There was a problem hiding this comment.
I'm really excited to add this! Note that we already have up to "5" in the examples folder, so either you can adjust the others order (and keep 3 here) or this should be 6. https://github.com/pydicom/deid/tree/master/docs/_docs/examples
updated the docstring remove extra "None"s
|
More updates :) |
vsoch
left a comment
There was a problem hiding this comment.
We are close! One more bug I just realized.
deid/dicom/parser.py
Outdated
| # Set class variables that might be helpful later | ||
| self.dicom_file = os.path.abspath(self.dicom.filename) | ||
| self.dicom_name = os.path.basename(self.dicom_file) | ||
| if self.dicom.get("filename", None) is not None: |
There was a problem hiding this comment.
This is going to possibly error later if there isn't a file name (and the user expects the attribute to be set). How about instead:
df = self.dicom.get("filename")
self.dicom_file = None if not df else os.path.abspath(df)
self.dicom_name = None if not df else os.path.basename(self.dicom_file)|
This is ready for merge! @fcossio I'm in the middle of a work day, but I'll merge and then prepare the release tonight, and will share with the community. Thank you for adding this! |
|
Excited to hear and looking forward to it! And thanks for the detailed feedback and all the patience |
|
Thank you again! https://twitter.com/vsoch/status/1563307727811596289 |
Description
Modified DicomParser to be able to load datasets that don't come from a file (that are loaded as json) by adding a
from_fileargument to the constructor. The default value forfrom_fileisTrueso that the interface remains unaffected.Added an example of how to create a custom class with a custom recipe and custom functions.
Related issues: #210
Please include a summary of the change(s) and if relevant, any related issues above.
Checklist
Open questions
Questions that require more discussion or to be addressed in future development: