@matentzn
Overview

The permutations feature extends DOSDP's annotation generation. Pattern authors can request additional annotations built from the values of annotation properties on filler terms (for example, their synonyms), rather than from the filler term's label alone.

Motivation

When generating synonyms from DOSDP patterns, the current approach substitutes the label of filler terms into the annotation text. However, filler terms often have multiple synonyms themselves, and ideally the generated term should have synonyms that incorporate these alternative names.

Example: If we have a pattern for "acute {disease}" and the disease filler is "heart disease" which has synonyms "cardiac disease" and "heart condition", we want to generate:

  • "acute heart disease" (from label - always generated)
  • "acute cardiac disease" (from synonym permutation)
  • "acute heart condition" (from synonym permutation)

Schema Definition

New permutation Definition

permutation:
  type: object
  additionalProperties: False
  required: [var, annotationProperties]
  properties:
    var:
      description: >
        The name of a single variable for which to generate permutations.
        Must correspond to a variable specified in the 'vars' field of the pattern.
      type: string
    annotationProperties:
      description: >
        A list of annotation property names (as declared in the annotationProperties
        dictionary) whose values from the filler term will be used to generate
        additional annotations. Each value found generates a separate annotation.
      type: array
      items: { type: string }
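
To make the constraints in this definition concrete, here is a minimal sketch of a hand-written shape check for a single permutation entry. The function name `check_permutation_shape` and the error messages are hypothetical and not part of DOSDP or of this PR; a real implementation would more likely run a JSON Schema validator against the schema above.

```python
def check_permutation_shape(entry):
    """Return a list of problems found in one permutation entry (empty if none)."""
    if not isinstance(entry, dict):
        return ["permutation must be an object"]

    problems = []
    allowed = {"var", "annotationProperties"}

    # required: [var, annotationProperties]
    for key in sorted(allowed - set(entry)):
        problems.append(f"missing required field: {key}")

    # additionalProperties: False
    for key in sorted(set(entry) - allowed):
        problems.append(f"unexpected field: {key}")

    # var: a single string, not an array
    if "var" in entry and not isinstance(entry["var"], str):
        problems.append("var must be a single string")

    # annotationProperties: array of strings
    if "annotationProperties" in entry:
        props = entry["annotationProperties"]
        if not (isinstance(props, list) and all(isinstance(p, str) for p in props)):
            problems.append("annotationProperties must be a list of strings")

    return problems


ok_entry = {"var": "disease", "annotationProperties": ["exact_synonym"]}
bad_entry = {"var": ["disease"], "annotationProperties": ["exact_synonym"]}
```

Under these assumptions, `ok_entry` produces no problems, while `bad_entry` is rejected because `var` is a list.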

Addition to printf_annotation and printf_annotation_obo

permutations:
  description: >
    Optional list of permutation specifications. For each variable specified,
    generates additional annotations using the values of the specified annotation
    properties from the filler term. The label-based annotation is always generated
    in addition to the permutations. If multiple permutation entries exist for
    different variables, the cartesian product of all values is generated.
  type: array
  items: { $ref: '#/definitions/permutation' }

Complete Example Pattern

pattern_name: acute_with_permutations

pattern_iri: http://purl.obolibrary.org/obo/mondo/patterns/acute_with_permutations.yaml

description: >
  This pattern is applied to diseases that are described as having an acute onset.
  It demonstrates the permutations feature for synonym generation.

contributors:
- https://orcid.org/0000-0002-6601-2165

classes:
  acute: PATO:0000389
  disease: MONDO:0000001

relations:
  has modifier: RO:0002573

annotationProperties:
  exact_synonym: oio:hasExactSynonym
  related_synonym: oio:hasRelatedSynonym

vars:
  disease: '''disease'''

name:
  text: acute %s
  vars:
  - disease

annotations:
- annotationProperty: exact_synonym
  text: '%s, acute'
  permutations:
  - var: disease
    annotationProperties:
    - exact_synonym
  vars:
  - disease

def:
  text: Acute form of %s.
  vars:
  - disease

equivalentTo:
  text: '%s and ''has modifier'' some ''acute'''
  vars:
  - disease

Semantic Behavior

Core Rules

  1. Label-based generation is ALWAYS performed first. The permutations feature is purely additive - it never replaces the standard label-based annotation generation.

  2. The var field is a single string value, not an array. To specify permutations for multiple variables, add multiple entries to the permutations array.

  3. All values from all specified annotation properties are used. If annotationProperties lists [exact_synonym, related_synonym], values from both properties are collected and used for permutations.

  4. Cartesian product for multiple vars. If multiple permutation entries exist for different variables, the implementation must generate the cartesian product of all value combinations.
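
The four rules above can be sketched in a few lines of Python. This is an illustration only, not the code in this PR: the function name `expand_annotation` and the dictionary layout used for filler terms are assumptions made for the example.

```python
from itertools import product


def expand_annotation(text, var_names, fillers, permutations=()):
    """Return annotation strings, label-based first, then permutation-based.

    `fillers` maps a variable name to a dict holding a "label" and,
    optionally, lists of values keyed by annotation property name
    (a hypothetical layout chosen for this sketch).
    """
    # Rule 2: each permutation entry names exactly one variable.
    props_by_var = {}
    for entry in permutations:
        props_by_var.setdefault(entry["var"], []).extend(entry["annotationProperties"])

    choices = []
    for name in var_names:
        term = fillers[name]
        values = [term["label"]]  # Rule 1: the label is always a candidate
        for prop in props_by_var.get(name, []):
            values.extend(term.get(prop, []))  # Rule 3: all values of all listed properties
        choices.append(values)

    # Rule 4: cartesian product over the per-variable candidates.
    # The all-labels combination comes first because each list starts with the label.
    return [text % combo for combo in product(*choices)]


fillers = {
    "disease": {
        "label": "heart disease",
        "exact_synonym": ["cardiac disease", "heart condition"],
    }
}
permutations = [{"var": "disease", "annotationProperties": ["exact_synonym"]}]
result = expand_annotation("acute %s", ["disease"], fillers, permutations)
# result == ["acute heart disease", "acute cardiac disease", "acute heart condition"]
```

Calling the same function without `permutations` yields only the label-based string, which matches the additive behaviour described in rule 1.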

Detailed Example with Expected Output

Input Pattern:

annotations:
- annotationProperty: exact_synonym
  text: '%s %s'
  permutations:
  - var: quality
    annotationProperties:
    - exact_synonym
  - var: entity
    annotationProperties:
    - exact_synonym
    - related_synonym
  vars:
  - quality
  - entity

Filler Term Data:

  • quality bound to term with:
    • label: "enlarged"
    • exact_synonym: "big"
    • exact_synonym: "hypertrophic"
  • entity bound to term with:
    • label: "heart"
    • exact_synonym: "cardiac organ"
    • related_synonym: "pump"

Expected Generated Annotations:

First, the label-based annotation (always generated):

  1. "enlarged heart" (label × label)

Then, permutations for quality (using label for entity):

  2. "big heart" (exact_synonym × label)
  3. "hypertrophic heart" (exact_synonym × label)

Then, permutations for entity (using label for quality):

  4. "enlarged cardiac organ" (label × exact_synonym)
  5. "enlarged pump" (label × related_synonym)

Then, cartesian product of both permutations:

  6. "big cardiac organ" (exact_synonym × exact_synonym)
  7. "big pump" (exact_synonym × related_synonym)
  8. "hypertrophic cardiac organ" (exact_synonym × exact_synonym)
  9. "hypertrophic pump" (exact_synonym × related_synonym)

Total: 9 annotations (1 label-based + 8 permutation-based)

Formula for Number of Annotations

For a pattern with n variables where variable i has:

  • 1 label (always)
  • s_i synonym values from the specified annotation properties (0 if no permutation specified for that var)

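The total number of annotations generated is:

∏(1 + s_i) for i in 1..n

where the product runs over all variables in the annotation's vars list. In the detailed example above, quality and entity each have two values, giving (1 + 2) × (1 + 2) = 9. If the implementation deduplicates values (see Duplicate Values below), this count is an upper bound.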
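
As a quick arithmetic check of the formula, the following sketch computes the count with `math.prod`. The helper name `expected_annotation_count` is hypothetical, and it assumes no deduplication takes place.

```python
from math import prod


def expected_annotation_count(synonym_counts):
    """Number of annotations for per-variable value counts s_1..s_n."""
    # Each variable contributes its label plus s_i property values.
    return prod(1 + s for s in synonym_counts)


# quality has 2 exact synonyms; entity has 1 exact and 1 related synonym.
count = expected_annotation_count([2, 2])  # (1 + 2) * (1 + 2) = 9
```

A variable without a permutation entry has s_i = 0 and contributes a factor of 1, so it does not change the total.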

Edge Cases and Special Handling

1. Empty Annotation Property Values

If a filler term has no values for any of the specified annotation properties, only the label-based annotation is generated. No error should be raised.

2. Variable Not in vars List

If a permutation.var references a variable that is not present in the vars list of the annotation that contains the permutation, this should be treated as a validation error.

3. Undeclared Annotation Property

If permutation.annotationProperties contains a property not declared in the pattern's annotationProperties dictionary, this should be treated as a validation error.
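
The two validation errors described in cases 2 and 3 could be checked along the following lines. This is a sketch under assumptions: `validate_permutations` and its error messages are invented for illustration, and the inputs are plain dictionaries as they would look after loading the pattern YAML.

```python
def validate_permutations(annotation, declared_properties):
    """Return validation errors for one annotation's permutations (empty if valid)."""
    errors = []
    annotation_vars = set(annotation.get("vars", []))

    for entry in annotation.get("permutations", []):
        var = entry["var"]
        # Case 2: the variable must appear in this annotation's vars list.
        if var not in annotation_vars:
            errors.append(f"permutation var '{var}' is not in the annotation's vars list")
        # Case 3: every property must be declared in the pattern's annotationProperties.
        for prop in entry["annotationProperties"]:
            if prop not in declared_properties:
                errors.append(f"annotation property '{prop}' is not declared")

    return errors


declared = {
    "exact_synonym": "oio:hasExactSynonym",
    "related_synonym": "oio:hasRelatedSynonym",
}
annotation = {
    "annotationProperty": "exact_synonym",
    "text": "%s, acute",
    "vars": ["disease"],
    "permutations": [
        {"var": "location", "annotationProperties": ["exact_synonym", "broad_synonym"]}
    ],
}
errors = validate_permutations(annotation, declared)
```

Here `errors` contains one message for the unknown variable `location` and one for the undeclared property `broad_synonym`. Collecting all errors rather than stopping at the first is a design choice of this sketch, not something the specification requires.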

4. Duplicate Values

If the same string value appears in multiple annotation properties (e.g., a term has "cardiac" as both exact_synonym and related_synonym), the implementation MAY deduplicate to avoid generating identical annotations.
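
One simple way to deduplicate, should an implementation choose to, is to keep the first occurrence of each value while preserving order. The helper below is a hypothetical sketch; it compares strings exactly, so values that differ only in case or whitespace are treated as distinct.

```python
def dedupe_preserving_order(values):
    """Drop repeated strings, keeping the first occurrence of each."""
    # dict keys are unique and keep insertion order in Python 3.7+.
    return list(dict.fromkeys(values))


# Label first, then exact_synonym values, then related_synonym values.
candidates = ["heart", "cardiac", "cardiac", "pump"]
unique = dedupe_preserving_order(candidates)
# unique == ["heart", "cardiac", "pump"]
```

Because the label is placed first, it always survives deduplication, which keeps the label-based annotation intact.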

5. No Permutations Specified

If permutations is omitted or is an empty array, behavior is identical to current DOSDP - only label-based generation occurs.

6. Permutation for Subset of Variables

It is valid to specify permutations for only some of the variables in vars. Variables without permutation entries use only their label.

Backwards Compatibility

This feature is fully backwards compatible:

  • The permutations field is optional
  • Patterns without permutations behave exactly as before
  • No changes to existing pattern syntax or semantics

Introduces the permutation feature for synonyms on labels and definitions. This is entirely vibe coded, and not reviewed at all. I have no idea how to read it.

I tested it with OBA and it works!