
updated translate to accommodate multiple donors #1032

Open
DerekFurstPitt wants to merge 1 commit into dev-integrate from Derek-Furst/new-reindex-optimization


@DerekFurstPitt
Contributor

No description provided.

@yuanzhou
Member

@NickAkhmetov This is part of the Dataset.donors and Publication.donors support that I mentioned earlier. After the full reindex on DEV, we've noticed that the portal indices are missing the three properties from the plural form of .donors:

  • mapped_metadata
  • mapped_data_access_level
  • mapped_last_modified_timestamp

@DerekFurstPitt had to dig into the portal translation code to address this issue. I'd like your review on this as well; feel free to make additional tweaks or add unit tests to suit your needs and workflow.

@NickAkhmetov
Contributor

This is a good start!

> the portal indices are missing the three properties from the plural form of .donors

These fields are calculated by the portal transformations as described below:

mapped_metadata

This is populated by _translate_donor_metadata in translate.py. It should be updated to create a single donor_demographics (or similar) field on the dataset document, containing precalculated aggregations of metadata from all of the donors associated with the dataset (e.g. age, weight, height, and BMI ranges/averages, and race/sex category lists). This aggregation would supplant the Dataset.donor.mapped_metadata fields currently used to determine donor race/sex/age/weight in the search view and elsewhere, so that we can accurately reflect these lookups in the search. Let me know whether you'd prefer that I implement this in a separate PR, or whether this specification is sufficient.
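A minimal Python sketch of the proposed aggregation, assuming each donor document carries a `mapped_metadata` dict whose values may be lists or scalars. The field names (`age_value`, `weight_value`, `height_value`, `body_mass_index_value`, `race`, `sex`), the `aggregate_donor_demographics` function, and the min/max/mean output shape are all hypothetical illustrations, not the actual translate.py implementation.

```python
from statistics import mean
from typing import Any, Dict, Iterable, List, Set

# Hypothetical mapped_metadata keys; the real names produced by
# _translate_donor_metadata may differ.
NUMERIC_FIELDS = ["age_value", "weight_value", "height_value", "body_mass_index_value"]
CATEGORICAL_FIELDS = ["race", "sex"]


def _as_list(value: Any) -> List[Any]:
    """Normalize a metadata value that may be missing, a scalar, or a list."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def aggregate_donor_demographics(donors: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Build a hypothetical donor_demographics summary across several donors."""
    numeric: Dict[str, List[float]] = {f: [] for f in NUMERIC_FIELDS}
    categorical: Dict[str, Set[str]] = {f: set() for f in CATEGORICAL_FIELDS}

    for donor in donors:
        metadata = donor.get("mapped_metadata") or {}
        for field in NUMERIC_FIELDS:
            for raw in _as_list(metadata.get(field)):
                try:
                    numeric[field].append(float(raw))
                except (TypeError, ValueError):
                    # Skip values that cannot be parsed as numbers.
                    continue
        for field in CATEGORICAL_FIELDS:
            categorical[field].update(str(v) for v in _as_list(metadata.get(field)))

    summary: Dict[str, Any] = {}
    # Numeric fields become ranges plus an average.
    for field, values in numeric.items():
        if values:
            summary[field] = {"min": min(values), "max": max(values), "mean": mean(values)}
    # Categorical fields become sorted, deduplicated lists.
    for field, values in categorical.items():
        if values:
            summary[field] = sorted(values)
    return summary


if __name__ == "__main__":
    donors = [
        {"mapped_metadata": {"age_value": ["45"], "sex": ["Female"], "race": ["White"]}},
        {"mapped_metadata": {"age_value": ["61"], "sex": ["Male"], "race": ["White"]}},
    ]
    print(aggregate_donor_demographics(donors))
    # {'age_value': {'min': 45.0, 'max': 61.0, 'mean': 53.0}, 'race': ['White'], 'sex': ['Female', 'Male']}
```

Omitting fields that have no values keeps the dataset document small, and sorted category lists make the output deterministic for indexing and testing.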

mapped_data_access_level

This is populated by _translate_access_level in translate.py, but I don't think we ever read it from Dataset.donor.mapped_data_access_level. As long as mapped_data_access_level is available on the individual Donor documents, omitting it from the plural form should be fine.

mapped_last_modified_timestamp

This doesn't appear to have been used since around 2023, except on the dev search page. I'm not sure whether it is still needed there, or what it provided beyond the last_modified_timestamp column we actually use.

@yuanzhou
Member

@NickAkhmetov Thanks for the insights! I'd appreciate it if you could take over the portal translation updates since you're most familiar with the implementation details.

This top-level donors field for Dataset and Publication is treated similarly to the existing source_samples and origin_samples fields. Please feel free to branch off Derek-Furst/new-reindex-optimization, apply the necessary changes, and open a PR back into Derek's branch. I'll handle the rest from there.

The plan is that once the Harvard team and IU have fully migrated to consuming the donors field, we'll deprecate usage of the singular donor field and eventually remove it from the indices.

@NickAkhmetov
Contributor

@yuanzhou Understood; I've scheduled this item for development early in the upcoming sprint.



3 participants