
MFB-62: readability check #1337

Open
SonaliBedge wants to merge 15 commits into main from sonali/mfb-readability-check

Conversation

Collaborator

@SonaliBedge SonaliBedge commented Jan 13, 2026

Context & Motivation

  • Fixes: #MFB-62

  • Related PR: None

  • Old PR: #1855

    • Implements translation readability checking to ensure content is accessible to users with varying reading levels
    • Goal: English content at or below 8th grade reading level, Spanish content scoring 60+ on Fernández-Huerta

Changes Made

  • New management command: python manage.py check_readability to analyze translation text readability

  • Readability metrics:
    • English: Flesch-Kincaid Grade Level (target ≤ 8.0)
    • Spanish: Fernández-Huerta score (target ≥ 60)

  • Add readability command output files to .gitignore

    Commands:

    ```shell
    # Check all English translations
    python manage.py check_readability --language en-us

    # Check English for a specific white label (Colorado)
    python manage.py check_readability --language en --whitelabel co

    # Check Spanish translations
    python manage.py check_readability --language es --whitelabel co

    # With a custom threshold (6th grade level)
    python manage.py check_readability --language en --whitelabel co --threshold 6

    # Show detailed scores for each translation
    python manage.py check_readability --language en --whitelabel co --detailed

    # CI/CD mode: exit with an error code if failures are found
    python manage.py check_readability --language en --whitelabel co --fail-on-error

    # Also show passing translations
    python manage.py check_readability --language en --whitelabel co --show-passing
    ```

    Command options:

      --language / -l: Language code (en-us, es)
      --whitelabel / -w: Filter by white label (e.g., co, ma, nc)
      --threshold / -t: Custom threshold override
      --detailed: Show all readability scores
      --show-passing: Include passing translations in output
      --fail-on-error: Exit with error code for CI/CD integration
      --min-words: Minimum word count for analysis (default: 10)
      --label-filter: Filter by label prefix

  • New dependency: Added textstat>=0.7.3 to requirements.txt
  • Unit tests: Added comprehensive tests in test_check_readability.py
  • Documentation: Updated README with usage examples
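The thresholds above come from textstat's scoring functions. Purely to illustrate the Flesch-Kincaid Grade Level formula behind the English target (≤ 8.0), here is a dependency-free sketch; the helper names and the naive vowel-group syllable counter are illustrative assumptions, not the command's actual implementation (which uses textstat, with NLTK's cmudict for syllables):

```python
import re

def count_syllables(word: str) -> int:
    """Very rough syllable estimate: count runs of vowels (cmudict is more accurate)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / len(sentences)) + 11.8 * (syllables / len(words)) - 15.59
```

A grade of 8.0 or below means the text should be readable by an average 8th grader; longer sentences and more syllables per word both push the grade up.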

Testing

  • Migrations to run: None

  • Configuration updates needed: None

  • Environment variables/settings to add: None

  • Manual testing steps:

    ```shell
    # Install dependencies
    pip install -r requirements.txt

    # Download NLTK data (first time only)
    python -c "import nltk; nltk.download('cmudict')"

    # Run the readability check
    python manage.py check_readability --language en --whitelabel co

    # Run the unit tests
    pytest translations/management/commands/tests/test_check_readability.py -v
    ```

Deployment

  • Post-deployment scripts needed: None
  • Production config updates: None
  • Admin updates needed: None
  • Notify team/users of: check_readability command available for analyzing translation content

Notes for Reviewers

  • Known limitations:

    • Very short texts (<10 words) are skipped, as readability metrics are unreliable for them
    • NLTK's cmudict data must be downloaded for accurate syllable counting

  • Future considerations:

    • Add GitHub Actions workflow (.github/workflows/readability.yml) for automated PR checks
    • Consider adding a pre-commit hook for local validation
    • May want to adjust thresholds after an initial content review period
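The GitHub Actions idea above could look roughly like this. Only the file path and the --fail-on-error flag come from this PR; the runner, Python version, and step details are assumptions for the sketch:

```yaml
# .github/workflows/readability.yml (hypothetical sketch, not part of this PR)
name: Readability check
on: [pull_request]

jobs:
  readability:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: python -c "import nltk; nltk.download('cmudict')"
      # --fail-on-error makes the command exit non-zero when any translation fails
      - run: python manage.py check_readability --language en-us --fail-on-error
```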

Summary by CodeRabbit

  • New Features

    • Adds a readability check for translations (English & Spanish) with language-aware metrics, thresholds, filtering, reporting, and optional fail-on-error.
  • Documentation

    • Adds a "Readability Check" section describing workflow, metrics, CLI options, thresholds, and export formats.
  • Tests

    • Adds comprehensive tests for readability analysis, thresholds, CLI options, filtering, outputs, and failure behavior.
  • Chores

    • Adds runtime dependency: textstat>=0.7.3.

Contributor

coderabbitai bot commented Jan 13, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Note

.coderabbit.yaml has unrecognized properties

CodeRabbit is using all valid settings from your configuration. Unrecognized properties (listed below) have been ignored and may indicate typos or deprecated fields that can be removed.

⚠️ Parsing warnings (1)
Validation error: Unrecognized key(s) in object: 'base_branches'
⚙️ Configuration instructions
  • Please see the configuration documentation for more information.
  • You can also validate your configuration using the online YAML validator.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
📝 Walkthrough

Walkthrough

Adds a README section describing a readability check, a new dependency (textstat), a Django management command check_readability implementing English/Spanish readability analysis and reporting, and unit tests for the checker and command. No existing public APIs modified.

Changes

  • Documentation (README.md): Appended a "Readability Check" section with workflow, metrics (English: Flesch-Kincaid; Spanish: Fernández-Huerta), CLI usage, flags, and examples.
  • Dependencies (requirements.txt): Added dependency textstat>=0.7.3.
  • Readability command (translations/management/commands/check_readability.py): New Django management command adding ReadabilityResult (dataclass), ReadabilityChecker (English/Spanish analysis, word counting, thresholds, min-word bypass), and Command with CLI options, whitelabel filtering, a formatted summary, and JSON/CSV/text export helpers.
  • Tests (translations/management/commands/tests/__init__.py, translations/management/commands/tests/test_check_readability.py): Added tests for ReadabilityChecker and command behavior (language handling, thresholds, short/empty texts, whitelabel filtering, --fail-on-error, output assertions).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check: ✅ Passed. The title 'MFB-62: readability check' directly corresponds to the main objective of implementing a readability checking feature for translations.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 91.18%, which is sufficient. The required threshold is 80.00%.
  • Description check: ✅ Passed. The pull request description comprehensively covers all required template sections with concrete details about motivation, changes, testing steps, and deployment considerations.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

You can enable review details to help with troubleshooting, context usage and more.

Enable the reviews.review_details setting to include review details such as the model used, the time taken for each step and more in the review comments.

@codecov

codecov bot commented Jan 13, 2026

Codecov Report

❌ Patch coverage is 83.89423% with 67 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
...nslations/management/commands/check_readability.py | 74.23% | 51 Missing and 16 partials ⚠️

📢 Thoughts on this report? Let us know!

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @translations/management/commands/check_readability.py:
- Around line 261-273: The translation_fields list in
translations/management/commands/check_readability.py is out of sync with the
Program model: remove the non-existent fields name_short and category, replace
value_description with the correct Program field name(s) (use estimated_value
and/or value_type as defined on Program), and add the missing fields
estimated_value and website_description; update the translation_fields array
accordingly and verify the final names exactly match the Program model's field
names.
🧹 Nitpick comments (6)
translations/management/commands/check_readability.py (3)

180-184: Consider passing min_words to the constructor instead of mutating the class attribute.

Setting checker.MIN_WORD_COUNT = min_words on the instance works but is unconventional. Passing it as a constructor parameter would be cleaner.

Suggested improvement
 class ReadabilityChecker:
-    def __init__(self, en_threshold: Optional[float] = None, es_threshold: Optional[float] = None):
+    def __init__(self, en_threshold: Optional[float] = None, es_threshold: Optional[float] = None, min_word_count: Optional[int] = None):
         self.en_threshold = en_threshold if en_threshold is not None else self.ENGLISH_THRESHOLD
         self.es_threshold = es_threshold if es_threshold is not None else self.SPANISH_THRESHOLD
+        self.min_word_count = min_word_count if min_word_count is not None else self.MIN_WORD_COUNT

Then in check(), use self.min_word_count instead of self.MIN_WORD_COUNT.


305-310: Remove extraneous f prefix from strings without placeholders.

Lines 306 and 309 use f-strings but contain no placeholders.

Suggested fix
         if language.startswith("es"):
-            self.stdout.write(f"Metric: Fernández-Huerta (higher is better)")
+            self.stdout.write("Metric: Fernández-Huerta (higher is better)")
             self.stdout.write(f"Threshold: >= {checker.es_threshold}")
         else:
-            self.stdout.write(f"Metric: Flesch-Kincaid Grade Level (lower is better)")
+            self.stdout.write("Metric: Flesch-Kincaid Grade Level (lower is better)")
             self.stdout.write(f"Threshold: <= {checker.en_threshold} (8th grade)")

355-358: Remove unused style variable.

The style variable is assigned but never used.

Suggested fix
     def _print_result(self, result: ReadabilityResult, detailed: bool, is_failing: bool):
         """Print details for a single translation result."""
-        style = self.style.ERROR if is_failing else self.style.SUCCESS
         icon = "❌" if is_failing else "✅"
translations/management/commands/tests/test_check_readability.py (2)

159-164: Use _ for unused lambda parameter.

The self parameter in the lambda is unused. Using _ is the Python convention for ignored parameters and silences the static analysis warning.

Suggested fix (apply to all similar occurrences)
-        mock_queryset.__iter__ = lambda self: iter([])
+        mock_queryset.__iter__ = lambda _: iter([])

Same change applies to lines 178, 194, 224, and 251.


155-166: Consider extracting mock queryset setup to reduce duplication.

The mock queryset setup pattern is repeated 5 times across the test class. A helper method or pytest fixture would improve maintainability.

Example helper method
def _create_mock_queryset(self, translations=None):
    """Create a mock queryset for Translation.objects."""
    mock_queryset = MagicMock()
    mock_queryset.filter.return_value = mock_queryset
    mock_queryset.prefetch_related.return_value = mock_queryset
    translations = translations or []
    mock_queryset.count.return_value = len(translations)
    mock_queryset.__iter__ = lambda _: iter(translations)
    return mock_queryset
requirements.txt (1)

95-95: Use exact version pinning to match other dependencies.

All other dependencies use == for exact version pinning, but textstat uses >=0.7.3. This inconsistency could result in non-reproducible builds if textstat releases a breaking change.

Suggested fix
-textstat>=0.7.3
+textstat==0.7.3
📜 Review details

Configuration used: Repository: MyFriendBen/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7e3a590 and fff14c8.

📒 Files selected for processing (5)
  • README.md
  • requirements.txt
  • translations/management/commands/check_readability.py
  • translations/management/commands/tests/__init__.py
  • translations/management/commands/tests/test_check_readability.py
🧰 Additional context used
🧬 Code graph analysis (2)
translations/management/commands/tests/test_check_readability.py (1)
translations/management/commands/check_readability.py (5)
  • ReadabilityResult (26-36)
  • analyze_english (61-71)
  • analyze_spanish (73-80)
  • check (88-136)
  • get_word_count (82-86)
translations/management/commands/check_readability.py (3)
translations/models.py (1)
  • Translation (167-304)
programs/models.py (1)
  • Program (597-773)
screener/models.py (1)
  • WhiteLabel (12-19)
🪛 Ruff (0.14.10)
translations/management/commands/tests/test_check_readability.py

163-163: Unused lambda argument: self

(ARG005)


178-178: Unused lambda argument: self

(ARG005)


194-194: Unused lambda argument: self

(ARG005)


224-224: Unused lambda argument: self

(ARG005)


251-251: Unused lambda argument: self

(ARG005)

translations/management/commands/check_readability.py

169-169: Unused method argument: args

(ARG002)


196-196: Avoid specifying long messages outside the exception class

(TRY003)


239-239: Avoid specifying long messages outside the exception class

(TRY003)


306-306: f-string without any placeholders

Remove extraneous f prefix

(F541)


309-309: f-string without any placeholders

Remove extraneous f prefix

(F541)


357-357: Local variable style is assigned to but never used

Remove assignment to unused variable style

(F841)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run Tests with Coverage
🔇 Additional comments (8)
translations/management/commands/tests/__init__.py (1)

1-1: LGTM!

Standard package initializer for the test module.

translations/management/commands/check_readability.py (4)

1-23: LGTM!

Well-documented module with clear usage examples in the docstring. Imports are properly organized.


25-37: LGTM!

Clean dataclass design with appropriate type annotations for all fields.


61-80: Thread safety concern with textstat.set_lang().

textstat.set_lang() modifies global state. If this command is ever run concurrently (e.g., in a multi-threaded context or parallel test execution), there could be race conditions where one thread's language setting affects another's analysis.

For a Django management command run sequentially, this is acceptable, but worth noting for future parallel usage.
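If parallel usage ever becomes a goal, one mitigation is to serialize set_lang plus scoring behind a single lock. A stdlib-only sketch of the pattern, using a stand-in object for textstat's global language state (all names hypothetical):

```python
import threading

class GlobalLangScorer:
    """Stand-in for textstat: a single process-wide language setting."""
    def __init__(self) -> None:
        self.lang = "en"

    def set_lang(self, lang: str) -> None:
        self.lang = lang

    def score(self, text: str) -> str:
        # Pretend the score depends on the currently-set language.
        return f"{self.lang}:{len(text.split())}"

scorer = GlobalLangScorer()
_lock = threading.Lock()

def safe_score(text: str, lang: str) -> str:
    """Hold the lock across set_lang + score so threads cannot interleave them."""
    with _lock:
        scorer.set_lang(lang)
        return scorer.score(text)
```

Without the lock, one thread could call set_lang("es") between another thread's set_lang("en") and its scoring call.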


142-168: LGTM!

CLI arguments are well-defined with clear help text and sensible defaults.

translations/management/commands/tests/test_check_readability.py (2)

22-145: LGTM!

Comprehensive test coverage for ReadabilityChecker including edge cases for empty, None, and short texts. Tests are well-structured and clearly named.


260-282: LGTM!

Adequate test coverage for the ReadabilityResult dataclass.

README.md (1)

34-65: LGTM!

Clear and comprehensive documentation for the new readability check feature. The examples cover all major use cases and the explanation of metrics is helpful.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @translations/management/commands/check_readability.py:
- Around line 115-126: The check method currently falls back to English for any
non-Spanish language; add explicit handling for Vietnamese by checking if
lang.startswith("vi") before the else branch, and when detected write a clear
warning (e.g., via self.stdout.write or self.stderr.write) that Vietnamese is
unsupported and skip analysis (set scores/primary_score/threshold to None or a
sentinel and mark passes as False/skip), leaving analyze_spanish and
analyze_english unchanged; ensure any code that consumes primary_score/threshold
handles the sentinel, and update the command help text or docs to reflect that
vi is unsupported if you prefer not to analyze it.
🧹 Nitpick comments (4)
translations/management/commands/check_readability.py (4)

180-184: Mutating class constant via instance attribute assignment is confusing.

Line 184 assigns to checker.MIN_WORD_COUNT, which shadows the class-level constant with an instance attribute. While this works, it's an unusual pattern that may confuse future maintainers.

♻️ Suggested refactor: Accept min_words as a constructor parameter
-    def __init__(self, en_threshold: Optional[float] = None, es_threshold: Optional[float] = None):
+    def __init__(
+        self,
+        en_threshold: Optional[float] = None,
+        es_threshold: Optional[float] = None,
+        min_word_count: Optional[int] = None,
+    ):
         self.en_threshold = en_threshold if en_threshold is not None else self.ENGLISH_THRESHOLD
         self.es_threshold = es_threshold if es_threshold is not None else self.SPANISH_THRESHOLD
+        self.min_word_count = min_word_count if min_word_count is not None else self.MIN_WORD_COUNT

Then use self.min_word_count instead of self.MIN_WORD_COUNT in the check method, and update the Command:

         checker = ReadabilityChecker(
             en_threshold=threshold if language.startswith("en") else None,
             es_threshold=threshold if language.startswith("es") else None,
+            min_word_count=min_words,
         )
-        checker.MIN_WORD_COUNT = min_words

169-177: Consider validating the language option against supported languages.

The language option accepts any string, but only en-* and es-* are meaningfully supported. Invalid language codes could lead to unexpected behavior (e.g., translation.set_current_language() with an unsupported code).

♻️ Suggested validation
+        SUPPORTED_LANGUAGES = {"en-us", "es", "vi"}
         language = options["language"]
+        if language not in SUPPORTED_LANGUAGES and not any(language.startswith(p) for p in ("en", "es")):
+            self.stderr.write(self.style.WARNING(
+                f"Warning: '{language}' may not be fully supported. Supported: en-us, es, vi"
+            ))

306-310: Remove extraneous f prefix from strings without placeholders.

Lines 306 and 309 use f-strings but have no placeholders.

♻️ Proposed fix
         if language.startswith("es"):
-            self.stdout.write(f"Metric: Fernández-Huerta (higher is better)")
+            self.stdout.write("Metric: Fernández-Huerta (higher is better)")
             self.stdout.write(f"Threshold: >= {checker.es_threshold}")
         else:
-            self.stdout.write(f"Metric: Flesch-Kincaid Grade Level (lower is better)")
+            self.stdout.write("Metric: Flesch-Kincaid Grade Level (lower is better)")
             self.stdout.write(f"Threshold: <= {checker.en_threshold} (8th grade)")

355-358: Remove unused style variable.

The style variable is assigned but never used.

♻️ Proposed fix
     def _print_result(self, result: ReadabilityResult, detailed: bool, is_failing: bool):
         """Print details for a single translation result."""
-        style = self.style.ERROR if is_failing else self.style.SUCCESS
         icon = "❌" if is_failing else "✅"
📜 Review details

Configuration used: Repository: MyFriendBen/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fff14c8 and dea94a1.

📒 Files selected for processing (1)
  • translations/management/commands/check_readability.py
🧰 Additional context used
🧬 Code graph analysis (1)
translations/management/commands/check_readability.py (3)
translations/models.py (1)
  • Translation (167-304)
programs/models.py (1)
  • Program (597-773)
screener/models.py (1)
  • WhiteLabel (12-19)
🪛 Ruff (0.14.10)
translations/management/commands/check_readability.py

169-169: Unused method argument: args

(ARG002)


196-196: Avoid specifying long messages outside the exception class

(TRY003)


239-239: Avoid specifying long messages outside the exception class

(TRY003)


306-306: f-string without any placeholders

Remove extraneous f prefix

(F541)


309-309: f-string without any placeholders

Remove extraneous f prefix

(F541)


357-357: Local variable style is assigned to but never used

Remove assignment to unused variable style

(F841)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run Tests with Coverage
🔇 Additional comments (5)
translations/management/commands/check_readability.py (5)

25-37: LGTM!

The dataclass is well-structured with appropriate fields for storing readability analysis results.


61-80: Global state in textstat.set_lang() is fine for sequential command execution.

The textstat.set_lang() call modifies global state. This works correctly for a sequential management command, but if this ReadabilityChecker class is ever used in a concurrent context (e.g., async views, threads), the language setting could be overwritten between set_lang() and the metric calls. Consider documenting this limitation or restructuring if concurrent use is planned.


241-281: LGTM!

The white label lookup with fallback from code to name is well-implemented. The translation field list correctly matches the Program model's translation ForeignKey fields based on the relevant code snippets.


314-323: Division by zero is guarded, but consider simplifying the ternary.

The percentage calculation is protected by if total > 0, but the nested ternary structure is slightly hard to follow. This is fine as-is, just noting for future readability.


344-351: Sorting logic for failing translations is correct.

English failures are sorted worst-first (highest grade level), and Spanish failures are sorted worst-first (lowest Fernández-Huerta score). This makes sense for prioritizing which translations need attention first.
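In other words (scores below are made up, purely to show the per-language ordering):

```python
# Hypothetical scores, just to illustrate worst-first ordering per language.
english_grades = [9.5, 7.0, 12.3]    # Flesch-Kincaid grade: higher is worse
spanish_scores = [55.0, 40.0, 58.9]  # Fernández-Huerta: lower is worse

worst_first_en = sorted(english_grades, reverse=True)  # [12.3, 9.5, 7.0]
worst_first_es = sorted(spanish_scores)                # [40.0, 55.0, 58.9]
```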

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
translations/management/commands/check_readability.py (2)

220-226: Consider handling missing language translations gracefully.

set_current_language() may not raise an exception, but accessing translation.text afterward could return None or the fallback language's text depending on Parler's configuration. The current if not text check on line 223 handles None, but you may want to verify that the translation actually exists for the requested language to avoid silently analyzing fallback text.

Optional: Explicit language check
             translation.set_current_language(language)
-            text = translation.text
+            
+            # Check if translation exists for this specific language
+            if not translation.has_translation(language):
+                skipped_count += 1
+                continue
+                
+            text = translation.text

263-287: Consider using select_related to reduce database queries.

The current implementation makes N+1 queries when iterating over programs - one query per translation field access. Using select_related can fetch all translation references in a single query.

Proposed optimization
         # Get all programs for this white label
-        programs = Program.objects.filter(white_label=wl)
+        programs = Program.objects.filter(white_label=wl).select_related(
+            "name",
+            "description",
+            "description_short",
+            "learn_more_link",
+            "apply_button_link",
+            "apply_button_description",
+            "value_type",
+            "estimated_delivery_time",
+            "estimated_application_time",
+            "estimated_value",
+            "website_description",
+        )
📜 Review details

Configuration used: Repository: MyFriendBen/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dea94a1 and a50e17f.

📒 Files selected for processing (1)
  • translations/management/commands/check_readability.py
🧰 Additional context used
🧬 Code graph analysis (1)
translations/management/commands/check_readability.py (3)
translations/models.py (1)
  • Translation (167-304)
programs/models.py (1)
  • Program (597-773)
screener/models.py (1)
  • WhiteLabel (12-19)
🪛 Ruff (0.14.10)
translations/management/commands/check_readability.py

175-175: Unused method argument: args

(ARG002)


202-202: Avoid specifying long messages outside the exception class

(TRY003)


245-245: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run Tests with Coverage
🔇 Additional comments (4)
translations/management/commands/check_readability.py (4)

1-23: LGTM!

Clear module docstring with usage examples, and appropriate imports for the functionality.


25-37: LGTM!

Clean dataclass definition with appropriate type hints for storing readability analysis results.


148-173: LGTM!

Well-defined CLI arguments with clear help text and sensible defaults.


289-377: LGTM!

Well-structured output formatting with appropriate handling of edge cases (zero total) and correct sorting logic for different language thresholds.

@SonaliBedge
Collaborator Author

@coderabbitai review

Contributor

coderabbitai bot commented Jan 13, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@SonaliBedge SonaliBedge requested a review from kbrey January 14, 2026 19:01
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@translations/management/commands/check_readability.py`:
- Around line 563-564: The passing-results sort in _print_summary currently
always sorts ascending by r.primary_score; update it to mirror _export_text and
_export_json by sorting with reverse=language.startswith("es") so Spanish
results are shown highest-first; locate the loop that iterates over passing (the
call to self._print_result within _print_summary) and change the sorted(...)
call to use key=lambda r: r.primary_score and reverse=language.startswith("es")
so ordering is consistent with _export_text/_export_json and _print_result.
- Line 344: In _export_report() remove the unused local variable total
(assignment "total = len(passing) + len(failing)") since it is never referenced;
update the function in translations/management/commands/check_readability.py by
deleting that line to eliminate the dead variable and any linter warnings.
🧹 Nitpick comments (2)
translations/management/commands/check_readability.py (2)

126-137: Unsupported languages still fall through to English analysis.

While the past review flagged this and the PR description was updated to remove Vietnamese, the code still uses an else branch that will apply English metrics to any non-Spanish language. If a user accidentally passes an unsupported language like --language vi or --language fr, they'll get misleading English readability scores.

Consider adding input validation in the handle() method to reject unsupported languages early:

SUPPORTED_LANGUAGES = {"en-us", "en", "es"}
if not any(language.startswith(prefix) for prefix in ("en", "es")):
    raise CommandError(f"Unsupported language: {language}. Supported: en-us, es")

245-266: Minor redundancy in word count checking.

The word count is checked twice: once inside checker.check() (line 114) which marks short texts as passing with empty scores, and again here (line 259) to skip them entirely. This works correctly but the logic is slightly scattered.

Consider either:

  1. Having check() return a distinct status for "skipped" vs "passing", or
  2. Checking word count only in the loop before calling check()

This is a minor observation—the current approach is functional.
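Option 1 could look roughly like this (names are hypothetical, not from the PR's code):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class CheckStatus(Enum):
    PASSED = "passed"
    FAILED = "failed"
    SKIPPED = "skipped"  # e.g. text below the minimum word count

@dataclass
class ReadabilityOutcome:
    status: CheckStatus
    primary_score: Optional[float] = None  # None when the check was skipped
```

A distinct SKIPPED status lets callers separate "too short to score" from "scored and passed" without repeating the word-count check in the loop.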

@catonph catonph changed the title Sonali/mfb readability check MFB-62: readability check Feb 17, 2026
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
translations/management/commands/check_readability.py (1)

127-137: ⚠️ Potential issue | 🟠 Major

Reject unsupported --language values instead of silently falling back to English.

At Line 132, any non-es language is analyzed with English metrics, while custom threshold wiring at Line 217-Line 218 only applies for en*/es*. A typo like --language en_us or fr yields misleading scoring behavior instead of a clear error.

Suggested fix
@@
-        if lang.startswith("es"):
+        if lang.startswith("es"):
             scores = self.analyze_spanish(text)
             primary_score = scores["fernandez_huerta"]
             passes = primary_score >= self.es_threshold
             threshold = self.es_threshold
-        else:
+        elif lang.startswith("en"):
             scores = self.analyze_english(text)
             primary_score = scores["flesch_kincaid_grade"]
             passes = primary_score <= self.en_threshold
             threshold = self.en_threshold
+        else:
+            raise ValueError(f"Unsupported language for readability analysis: {lang}")
@@
-        language = options["language"]
+        language = options["language"].lower()
+        if not (language.startswith("en") or language.startswith("es")):
+            raise CommandError(
+                f"Unsupported --language '{language}'. Supported prefixes are 'en' and 'es'."
+            )

Also applies to: 155-160, 217-219

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@translations/management/commands/check_readability.py` around lines 127 -
137, The command currently treats any non-"es" language as English; update the
language validation in the check_readability command to explicitly accept only
languages starting with "en" or "es" and reject others with a clear error/exit
instead of falling back to English; locate the language handling where
analyze_spanish and analyze_english are invoked (references: analyze_spanish,
analyze_english, es_threshold, en_threshold) and replace the implicit else with
an explicit check (if lang.startswith("es") ... elif lang.startswith("en") ...
else -> raise CommandError/print error and exit), and apply the same strict
validation logic to the other language-handling blocks noted around the other
occurrences (the blocks referencing primary_score/en_threshold and
fernandez_huerta/es_threshold) so typos like "en_us" or unsupported codes
produce an error rather than silent fallback.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@translations/management/commands/check_readability.py`:
- Around line 225-230: The current white-label logic fetches all active
Translation rows and then filters in Python, causing N+1 FK hits; update the
initial queryset (the variable `translations`) to apply the white-label filter
in the database (e.g., add .filter(program__label__in=allowed_labels) or the
appropriate program field) before prefetch_related so only matching translations
are loaded, and modify `_get_whitelabel_translation_labels` to use
select_related('program') (or include the program join in its queryset) so you
don’t dereference program FKs per-row; ensure you keep
prefetch_related("translations") as needed but move filtering into the ORM to
avoid iteration-based filtering.
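The ORM-level rewrite that prompt describes might look roughly like the sketch below; the model and field names (`Translation`, `active`, `program__white_label__code`) are assumptions based on the comment, not verified against the actual models:

```python
# Hypothetical sketch: push the white-label filter into the database instead of
# filtering in Python, and join the program FK up front to avoid N+1 queries.
def get_translations_for_whitelabel(whitelabel_code=None):
    queryset = (
        Translation.objects.filter(active=True)
        .select_related("program")  # one JOIN instead of a query per row
    )
    if whitelabel_code:
        # Field path is an assumption; adjust to the real schema.
        queryset = queryset.filter(program__white_label__code=whitelabel_code)
    return queryset
```

Because `select_related("program")` follows the FK in the same query, code that later reads `translation.program` per row no longer issues an extra query each time.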


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: MyFriendBen/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d62295b8-f892-4181-9c2b-3c7efc1d6590

📥 Commits

Reviewing files that changed from the base of the PR and between 2254360 and 870897d.

📒 Files selected for processing (2)
  • requirements.txt
  • translations/management/commands/check_readability.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • requirements.txt

Collaborator

@kbrey kbrey left a comment

@SonaliBedge Nice work, here are some suggestions for you, let me know what you think!

```
For detailed information about writing and maintaining integration tests, see [docs/INTEGRATION_TESTING.md](docs/INTEGRATION_TESTING.md).
```

Collaborator

Thanks for thinking about documentation! I think our README is pretty high level so instead you can add this command to the commands wiki.

.gitignore Outdated
.claude/projects/
.claude/settings.local.json

# Readability check output files
Collaborator

I would suggest to output these files in a folder, such as readability-output/ (or tmp/), so that they don't collide with potential future files and so that this isn't fragile to file name changes.

@SonaliBedge SonaliBedge requested a review from kbrey March 13, 2026 17:44
Collaborator

@kbrey kbrey left a comment

@SonaliBedge great cleanups! A couple followup suggestions, let me know what you think.

self.stdout.write(self.style.SUCCESS("PASSING TRANSLATIONS:"))
self.stdout.write("-" * 60)
for line in lines:
    if line.startswith("✅"):
Collaborator

Thanks for making this DRYer! But this approach seems a bit fragile: branching on the icons means that if we ever remove those icons from the text, this would silently break. What would you think about a small ReportData dataclass that captures the computed state (sorted results, counts, threshold, metric name), plus a _build_report_data method that populates it? Each output method (_print_summary, _export_text, _export_json, _export_csv) would then render from that directly.
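One way to sketch that suggestion; every name here (`ReportData`, `build_report_data`, the tuple shape of results) is illustrative rather than taken from the PR:

```python
from dataclasses import dataclass, field


@dataclass
class ReportData:
    """Computed report state shared by all output renderers."""
    metric_name: str
    threshold: float
    passing: list = field(default_factory=list)  # (label, score) pairs
    failing: list = field(default_factory=list)

    @property
    def pass_count(self) -> int:
        return len(self.passing)

    @property
    def fail_count(self) -> int:
        return len(self.failing)


def build_report_data(results, metric_name, threshold, higher_is_better=False):
    """Partition raw (label, score) results once; renderers consume the result."""
    report = ReportData(metric_name=metric_name, threshold=threshold)
    for label, score in sorted(results, key=lambda r: r[1]):
        ok = score >= threshold if higher_is_better else score <= threshold
        (report.passing if ok else report.failing).append((label, score))
    return report
```

Each of `_print_summary`, `_export_text`, `_export_json`, and `_export_csv` would then take a `ReportData` and render it, so no output path has to re-parse icons out of formatted strings.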


result = checker.check(translation.label, text, language)

if result.word_count < min_words:
Collaborator

If I'm understanding right, we check for the minimum word count twice: once in checker.check and again here within _analyze. Seems like we could clean that up by checking in just one of those places?
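A minimal sketch of keeping that guard in one place, inside the checker, so callers only inspect the result; all names (`ReadabilityResult`, `check`, the placeholder score) are illustrative, not the PR's actual code:

```python
from dataclasses import dataclass


@dataclass
class ReadabilityResult:
    label: str
    word_count: int
    score: float
    skipped: bool = False  # True when the text is too short to score reliably


def check(label, text, min_words=10):
    """Single home for the minimum-word-count guard (illustrative sketch)."""
    words = text.split()
    if len(words) < min_words:
        return ReadabilityResult(label, len(words), score=0.0, skipped=True)
    # ... compute the real readability score here; 7.2 is a placeholder ...
    return ReadabilityResult(label, len(words), score=7.2)
```

With the guard inside `check`, `_analyze` just branches on `result.skipped` and the threshold can change in one place.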

@SonaliBedge SonaliBedge requested a review from kbrey March 17, 2026 18:25