DOSDP Permutations Feature Implementation #503
Overview
The `permutations` feature extends DOSDP's annotation generation capabilities by allowing pattern authors to specify that additional annotations should be generated using values from annotation properties on filler terms, rather than only using the term's label.
Motivation
When generating synonyms from DOSDP patterns, the current approach substitutes the label of filler terms into the annotation text. However, filler terms often have multiple synonyms themselves, and ideally the generated term should have synonyms that incorporate these alternative names.
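Example: If we have a pattern for "acute {disease}" and the disease filler is "heart disease", which has synonyms "cardiac disease" and "heart condition", we want to generate "acute heart disease" from the label, plus "acute cardiac disease" and "acute heart condition" from the synonyms.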
Schema Definition
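New `permutation` Definition
Each permutation entry names a single variable in `var` and lists the annotation properties to draw values from in `annotationProperties`.
Addition to `printf_annotation` and `printf_annotation_obo`
Both definitions gain an optional `permutations` array of permutation entries.
Complete Example Pattern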
Semantic Behavior
Core Rules
1. Label-based generation is ALWAYS performed first. The permutations feature is purely additive: it never replaces the standard label-based annotation generation.
2. The `var` field is a single string value, not an array. To specify permutations for multiple variables, add multiple entries to the `permutations` array.
3. All values from all specified annotation properties are used. If `annotationProperties` lists `[exact_synonym, related_synonym]`, values from both properties are collected and used for permutations.
4. Cartesian product for multiple vars. If multiple permutation entries exist for different variables, the implementation must generate the cartesian product of all value combinations.
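Rules 1 and 3 together mean that each variable contributes its label plus every value found under the listed annotation properties. A minimal Python sketch of that collection step follows. The term layout (a dict with a `label` key and one list per property), the function name `candidate_values`, and the split of the two synonyms across property types are assumptions made for illustration, not part of the implementation in this PR.

```python
def candidate_values(term, annotation_properties=()):
    """Return the term's label followed by every value of the listed properties."""
    values = [term["label"]]  # rule 1: the label always comes first
    for prop in annotation_properties:
        # rule 3: collect all values from every listed property
        values.extend(term.get(prop, []))
    return values


# Hypothetical filler term; which property holds which synonym is assumed.
heart_disease = {
    "label": "heart disease",
    "exact_synonym": ["cardiac disease"],
    "related_synonym": ["heart condition"],
}

with_permutation = candidate_values(heart_disease, ["exact_synonym", "related_synonym"])
without_permutation = candidate_values(heart_disease)
print(with_permutation)
print(without_permutation)
```

A variable with no permutation entry falls back to its label alone, which is what keeps the feature purely additive.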
Detailed Example with Expected Output
Input Pattern:
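A pattern consistent with the expected output listed further down might look like the following, written here as a Python dict that mirrors the YAML layout of a DOSDP pattern. Only `var`, `annotationProperties`, `permutations`, and `vars` are taken from this description. The property IRIs, the variable ranges, the target annotation property, and the `"%s %s"` text are assumptions chosen to match the example strings.

```python
# Hypothetical pattern for illustration only; not copied from the PR.
pattern = {
    "annotationProperties": {
        "exact_synonym": "oio:hasExactSynonym",      # assumed IRI
        "related_synonym": "oio:hasRelatedSynonym",  # assumed IRI
    },
    "vars": {
        "quality": "'quality'",          # assumed range
        "entity": "'anatomical entity'", # assumed range
    },
    "annotations": [
        {
            "annotationProperty": "exact_synonym",  # assumed target property
            "text": "%s %s",
            "vars": ["quality", "entity"],
            # One entry per variable: `var` is a single string, not a list.
            "permutations": [
                {"var": "quality", "annotationProperties": ["exact_synonym"]},
                {
                    "var": "entity",
                    "annotationProperties": ["exact_synonym", "related_synonym"],
                },
            ],
        }
    ],
}

annotation = pattern["annotations"][0]
permuted_vars = [entry["var"] for entry in annotation["permutations"]]
print(permuted_vars)
```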
Filler Term Data:
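`quality` bound to term with:
label: "enlarged"
exact_synonym: "big", "hypertrophic"
`entity` bound to term with:
label: "heart"
exact_synonym: "cardiac organ"
related_synonym: "pump"
Expected Generated Annotations: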
First, the label-based annotation (always generated):
1. "enlarged heart" (label × label)
Then, permutations for `quality` (using label for entity):
2. "big heart" (exact_synonym × label)
3. "hypertrophic heart" (exact_synonym × label)
Then, permutations for `entity` (using label for quality):
4. "enlarged cardiac organ" (label × exact_synonym)
5. "enlarged pump" (label × related_synonym)
Then, cartesian product of both permutations:
6. "big cardiac organ" (exact_synonym × exact_synonym)
7. "big pump" (exact_synonym × related_synonym)
8. "hypertrophic cardiac organ" (exact_synonym × exact_synonym)
9. "hypertrophic pump" (exact_synonym × related_synonym)
Total: 9 annotations (1 label-based + 8 permutation-based)
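The nine annotations above can be reproduced with a plain cartesian product over the per-variable candidate lists (label first, then the collected synonym values). The sketch below uses Python's `itertools.product`. The function name `generate_annotations` and the `"%s %s"` text are assumptions for illustration. The output order differs from the grouping used above, but the set of strings is the same and the label-only annotation still comes first.

```python
from itertools import product
from math import prod


def generate_annotations(text, candidates):
    """Fill `text` with every combination of per-variable candidate values."""
    return [text % combo for combo in product(*candidates)]


# Candidate lists taken from the example: label first, then synonym values.
quality = ["enlarged", "big", "hypertrophic"]
entity = ["heart", "cardiac organ", "pump"]

annotations = generate_annotations("%s %s", [quality, entity])
# The count equals the product of the candidate-list lengths.
expected_count = prod(len(values) for values in (quality, entity))

for line in annotations:
    print(line)
```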
Formula for Number of Annotations
For a pattern with n variables, where variable i has 1 label and s_i synonym values from the specified annotation properties (s_i = 0 if no permutation is specified for that variable), the total number of annotations generated is:
$$N = \prod_{i=1}^{n} (1 + s_i)$$
where the product includes all variables in the `vars` list. For the example above, (1 + 2)(1 + 2) = 9.
Edge Cases and Special Handling
1. Empty Annotation Property Values
If a filler term has no values for any of the specified annotation properties, only the label-based annotation is generated. No error should be raised.
2. Variable Not in `vars` List
If a `permutation.var` references a variable not present in the annotation's `vars` list, this should be treated as a validation error.
3. Undeclared Annotation Property
If `permutation.annotationProperties` contains a property not declared in the pattern's `annotationProperties` dictionary, this should be treated as a validation error.
4. Duplicate Values
If the same string value appears in multiple annotation properties (e.g., a term has "cardiac" as both exact_synonym and related_synonym), the implementation MAY deduplicate to avoid generating identical annotations.
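Cases 2 to 4 could be checked along the following lines. This is a sketch under assumptions: the annotation entry is a plain dict with `vars` and `permutations` keys, errors are returned as a list of messages rather than raised, and the helper names `validate_permutations` and `dedupe` are hypothetical.

```python
def validate_permutations(annotation, declared_properties):
    """Return validation error messages for one annotation entry."""
    errors = []
    for perm in annotation.get("permutations", []):
        # Case 2: the permutation must target a variable listed in `vars`.
        if perm["var"] not in annotation["vars"]:
            errors.append(f"permutation var '{perm['var']}' is not in vars")
        # Case 3: every property must be declared in the pattern.
        for prop in perm["annotationProperties"]:
            if prop not in declared_properties:
                errors.append(f"annotation property '{prop}' is not declared")
    return errors


def dedupe(values):
    """Case 4: drop repeated strings while keeping first-seen order."""
    return list(dict.fromkeys(values))


declared = {"exact_synonym", "related_synonym"}
bad_annotation = {
    "vars": ["quality", "entity"],
    "permutations": [
        {"var": "location", "annotationProperties": ["broad_synonym"]},
    ],
}
errors = validate_permutations(bad_annotation, declared)
unique_values = dedupe(["heart", "cardiac", "cardiac"])
print(errors)
print(unique_values)
```

An annotation without a `permutations` key yields no errors, which matches the optional nature of the field.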
5. No Permutations Specified
If `permutations` is omitted or is an empty array, behavior is identical to current DOSDP: only label-based generation occurs.
6. Permutation for Subset of Variables
It is valid to specify permutations for only some of the variables in `vars`. Variables without permutation entries use only their label.
Backwards Compatibility
This feature is fully backwards compatible:
The `permutations` field is optional.
Patterns without `permutations` behave exactly as before.