TEI Loading Enhancement & Config-Based Corpus Creation #88

Merged: TheoMoins merged 10 commits into master from parseTEI-recovered on May 23, 2025

@TheoMoins
Contributor

This PR introduces two major improvements to SuperStyl:

1. Enhanced TEI Loading Functionality

  • Added support for metric annotations with two new feature types:
    • met_line: Captures verse-level metric patterns
    • met_syll: Captures syllable-level metric patterns
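
To make the two feature types concrete, here is a minimal sketch of how verse-level and syllable-level patterns could be counted. It does not use SuperStyl's code. It assumes each verse is a TEI `<l>` element carrying a `met` attribute, and that each character of that attribute stands for one syllable; both are assumptions about the encoding, and the sample document and function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical three-verse sample; real corpora may encode metre differently.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><lg>
    <l met="+-+-">first verse</l>
    <l met="-+-+">second verse</l>
    <l met="+-+-">third verse</l>
  </lg></body></text>
</TEI>"""


def line_patterns(xml_text):
    """Return the met attribute of every <l> element that has one."""
    root = ET.fromstring(xml_text)
    lines = root.iterfind(".//tei:l", TEI_NS)
    return [line.get("met") for line in lines if line.get("met")]


def syllable_ngrams(pattern, n):
    """Slide a window of n symbols over one pattern.

    Treating one symbol as one syllable is an assumption of this sketch.
    """
    return [pattern[i:i + n] for i in range(len(pattern) - n + 1)]


def relative_freqs(items):
    """Map each distinct item to its share of all items."""
    counts = Counter(items)
    total = sum(counts.values())
    return {item: count / total for item, count in counts.items()}


patterns = line_patterns(SAMPLE)

# Verse level: each whole pattern is one feature (n = 1).
verse_level = relative_freqs(patterns)

# Syllable level: n-grams of symbols inside each pattern (n = 2 here).
syllable_level = relative_freqs(
    gram for pattern in patterns for gram in syllable_ngrams(pattern, 2)
)
```

In this sketch the verse-level features are whole patterns, while the syllable-level features are shorter windows inside each pattern, which is why the two types can use very different values of n.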

2. Configuration-Based Corpus Loading

Addresses Issue #77:

  • Adds a loading mechanism that reads corpus settings from a JSON configuration file
  • A single configuration can define multiple feature sets
  • Available through the load_corpus_from_config function or the --json CLI option

Example Configuration:

{
  "paths": "../corpus/02-chretiendetroyes_final.xml",
  "format": "tei",
  "sampling": {
    "enabled": true,
    "units": "verses",
    "sample_size": 1000,
    "sample_step": 500,
    "max_samples": 100,
    "sample_random": false
  },
  
  "features": [
    {
      "type": "words",
      "n": 1,
      "k": 1000,
      "freq_type": "relative",
      "culling": 0
    },
    {
      "type": "chars",
      "n": 3,
      "k": 2000,
      "freq_type": "relative",
      "culling": 5
    },
    {
      "type": "affixes",
      "n": 3,
      "k": 1000,
      "freq_type": "relative"
    },
    {
      "type": "met_line",
      "n": 1,
      "k": 500,
      "freq_type": "relative"
    },
    {
      "type": "met_syll",
      "n": 13,
      "k": 500,
      "freq_type": "relative"
    }
  ]
}
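
As a rough illustration of how such a file can be read, the sketch below parses a trimmed copy of the configuration with the standard json module and lists the requested feature sets. It deliberately does not call SuperStyl. The window arithmetic is an assumption: it treats sample_size as the window length and sample_step as the offset between window starts, and the 2500-verse corpus length is hypothetical. The helper names are also hypothetical.

```python
import json

# Trimmed copy of the example configuration, kept to two feature sets.
CONFIG_TEXT = """
{
  "paths": "../corpus/02-chretiendetroyes_final.xml",
  "format": "tei",
  "sampling": {
    "enabled": true,
    "units": "verses",
    "sample_size": 1000,
    "sample_step": 500,
    "max_samples": 100
  },
  "features": [
    {"type": "words", "n": 1, "k": 1000, "freq_type": "relative", "culling": 0},
    {"type": "met_line", "n": 1, "k": 500, "freq_type": "relative"}
  ]
}
"""


def feature_summary(config):
    """Return (type, n, k) for each feature set in the configuration."""
    return [(feat["type"], feat["n"], feat["k"]) for feat in config["features"]]


def sample_windows(n_units, size, step, max_samples):
    """List (start, end) windows under a rolling-window reading of the options.

    This interpretation of sample_size and sample_step is an assumption.
    """
    windows = []
    start = 0
    while start + size <= n_units and len(windows) < max_samples:
        windows.append((start, start + size))
        start += step
    return windows


config = json.loads(CONFIG_TEXT)
sampling = config["sampling"]

# Hypothetical corpus of 2500 verses.
windows = sample_windows(
    2500,
    sampling["sample_size"],
    sampling["sample_step"],
    sampling["max_samples"],
)

for feat_type, n, k in feature_summary(config):
    print(f"{feat_type}: n={n}, k={k}")
```

Under that reading, a step of 500 with a size of 1000 means consecutive samples share half of their verses.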

All functionality remains backward compatible with existing API calls and examples.

@codecov

codecov Bot commented Mar 27, 2025

Codecov Report

Attention: Patch coverage is 77.41935% with 21 lines in your changes missing coverage. Please review.

Project coverage is 69.82%. Comparing base (8066bda) to head (d680b55).
Report is 11 commits behind head on master.

Files with missing lines                 Patch %   Lines
superstyl/preproc/pipe.py                38.46%    8 Missing ⚠️
superstyl/preproc/features_extract.py    22.22%    7 Missing ⚠️
superstyl/load_from_config.py            91.30%    6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master      #88      +/-   ##
==========================================
+ Coverage   66.54%   69.82%   +3.28%     
==========================================
  Files           8        9       +1     
  Lines         550      633      +83     
==========================================
+ Hits          366      442      +76     
- Misses        184      191       +7     
Flag        Coverage Δ
unittests   69.82% <77.41%> (+3.28%) ⬆️

@TheoMoins TheoMoins merged commit b318237 into master May 23, 2025
6 checks passed