| Cohort / File(s) | Summary |
|---|---|
| Documentation: `README.md` | Appended a "Readability Check" section with workflow, metrics (English: Flesch‑Kincaid; Spanish: Fernández‑Huerta), CLI usage, flags, and examples. |
| Dependencies: `requirements.txt` | Added dependency: `textstat>=0.7.3`. |
| Readability command implementation: `translations/management/commands/check_readability.py` | New Django management command adding `ReadabilityResult` (dataclass), `ReadabilityChecker` (English/Spanish analysis, word counting, thresholds, min-word bypass), and `Command` with CLI options, whitelabel filtering, formatted summary, and JSON/CSV/text export helpers. |
| Tests: `translations/management/commands/tests/__init__.py`, `translations/management/commands/tests/test_check_readability.py` | Added tests for `ReadabilityChecker` and command behavior (language handling, thresholds, short/empty texts, whitelabel filtering, `--fail-on-error`, output assertions). |
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'MFB-62: readability check' directly corresponds to the main objective of implementing a readability checking feature for translations. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 91.18% which is sufficient. The required threshold is 80.00%. |
| Description check | ✅ Passed | The pull request description comprehensively covers all required template sections with concrete details about motivation, changes, testing steps, and deployment considerations. |
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @translations/management/commands/check_readability.py:
- Around line 261-273: The translation_fields list in
translations/management/commands/check_readability.py is out of sync with the
Program model: remove the non-existent fields name_short and category, replace
value_description with the correct Program field name(s) (use estimated_value
and/or value_type as defined on Program), and add the missing fields
estimated_value and website_description; update the translation_fields array
accordingly and verify the final names exactly match the Program model's field
names.
🧹 Nitpick comments (6)
translations/management/commands/check_readability.py (3)
180-184: Consider passing `min_words` to the constructor instead of mutating the class attribute.

Setting `checker.MIN_WORD_COUNT = min_words` on the instance works but is unconventional. Passing it as a constructor parameter would be cleaner.

Suggested improvement:

```diff
 class ReadabilityChecker:
-    def __init__(self, en_threshold: Optional[float] = None, es_threshold: Optional[float] = None):
+    def __init__(self, en_threshold: Optional[float] = None, es_threshold: Optional[float] = None, min_word_count: Optional[int] = None):
         self.en_threshold = en_threshold if en_threshold is not None else self.ENGLISH_THRESHOLD
         self.es_threshold = es_threshold if es_threshold is not None else self.SPANISH_THRESHOLD
+        self.min_word_count = min_word_count if min_word_count is not None else self.MIN_WORD_COUNT
```

Then in `check()`, use `self.min_word_count` instead of `self.MIN_WORD_COUNT`.
305-310: Remove extraneous `f` prefix from strings without placeholders.

Lines 306 and 309 use f-strings but contain no placeholders.

Suggested fix:

```diff
 if language.startswith("es"):
-    self.stdout.write(f"Metric: Fernández-Huerta (higher is better)")
+    self.stdout.write("Metric: Fernández-Huerta (higher is better)")
     self.stdout.write(f"Threshold: >= {checker.es_threshold}")
 else:
-    self.stdout.write(f"Metric: Flesch-Kincaid Grade Level (lower is better)")
+    self.stdout.write("Metric: Flesch-Kincaid Grade Level (lower is better)")
     self.stdout.write(f"Threshold: <= {checker.en_threshold} (8th grade)")
```
355-358: Remove unused `style` variable.

The `style` variable is assigned but never used.

Suggested fix:

```diff
 def _print_result(self, result: ReadabilityResult, detailed: bool, is_failing: bool):
     """Print details for a single translation result."""
-    style = self.style.ERROR if is_failing else self.style.SUCCESS
     icon = "❌" if is_failing else "✅"
```

translations/management/commands/tests/test_check_readability.py (2)
159-164: Use `_` for unused lambda parameter.

The `self` parameter in the lambda is unused. Using `_` is the Python convention for ignored parameters and silences the static analysis warning.

Suggested fix (apply to all similar occurrences):

```diff
-mock_queryset.__iter__ = lambda self: iter([])
+mock_queryset.__iter__ = lambda _: iter([])
```

The same change applies to lines 178, 194, 224, and 251.
155-166: Consider extracting mock queryset setup to reduce duplication.

The mock queryset setup pattern is repeated 5 times across the test class. A helper method or pytest fixture would improve maintainability.

Example helper method:

```python
def _create_mock_queryset(self, translations=None):
    """Create a mock queryset for Translation.objects."""
    mock_queryset = MagicMock()
    mock_queryset.filter.return_value = mock_queryset
    mock_queryset.prefetch_related.return_value = mock_queryset
    translations = translations or []
    mock_queryset.count.return_value = len(translations)
    mock_queryset.__iter__ = lambda _: iter(translations)
    return mock_queryset
```

requirements.txt (1)
95-95: Use exact version pinning to match other dependencies.

All other dependencies use `==` for exact version pinning, but `textstat` uses `>=0.7.3`. This inconsistency could result in non-reproducible builds if textstat releases a breaking change.

Suggested fix:

```diff
-textstat>=0.7.3
+textstat==0.7.3
```
📜 Review details
Configuration used: Repository: MyFriendBen/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- README.md
- requirements.txt
- translations/management/commands/check_readability.py
- translations/management/commands/tests/__init__.py
- translations/management/commands/tests/test_check_readability.py
🧰 Additional context used
🧬 Code graph analysis (2)
translations/management/commands/tests/test_check_readability.py (1)
translations/management/commands/check_readability.py (5)
- ReadabilityResult (26-36)
- analyze_english (61-71)
- analyze_spanish (73-80)
- check (88-136)
- get_word_count (82-86)

translations/management/commands/check_readability.py (3)
translations/models.py (1)
- Translation (167-304)

programs/models.py (1)
- Program (597-773)

screener/models.py (1)
- WhiteLabel (12-19)
🪛 Ruff (0.14.10)
translations/management/commands/tests/test_check_readability.py
163-163: Unused lambda argument: self
(ARG005)
178-178: Unused lambda argument: self
(ARG005)
194-194: Unused lambda argument: self
(ARG005)
224-224: Unused lambda argument: self
(ARG005)
251-251: Unused lambda argument: self
(ARG005)
translations/management/commands/check_readability.py
169-169: Unused method argument: args
(ARG002)
196-196: Avoid specifying long messages outside the exception class
(TRY003)
239-239: Avoid specifying long messages outside the exception class
(TRY003)
306-306: f-string without any placeholders
Remove extraneous f prefix
(F541)
309-309: f-string without any placeholders
Remove extraneous f prefix
(F541)
357-357: Local variable style is assigned to but never used
Remove assignment to unused variable style
(F841)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run Tests with Coverage
🔇 Additional comments (8)
translations/management/commands/tests/__init__.py (1)
1-1: LGTM! Standard package initializer for the test module.

translations/management/commands/check_readability.py (4)

1-23: LGTM! Well-documented module with clear usage examples in the docstring. Imports are properly organized.

25-37: LGTM! Clean dataclass design with appropriate type annotations for all fields.

61-80: Thread safety concern with `textstat.set_lang()`.

`textstat.set_lang()` modifies global state. If this command is ever run concurrently (e.g., in a multi-threaded context or parallel test execution), there could be race conditions where one thread's language setting affects another's analysis. For a Django management command run sequentially, this is acceptable, but worth noting for future parallel usage.

142-168: LGTM! CLI arguments are well-defined with clear help text and sensible defaults.

translations/management/commands/tests/test_check_readability.py (2)

22-145: LGTM! Comprehensive test coverage for `ReadabilityChecker` including edge cases for empty, None, and short texts. Tests are well-structured and clearly named.

260-282: LGTM! Adequate test coverage for the `ReadabilityResult` dataclass.

README.md (1)

34-65: LGTM! Clear and comprehensive documentation for the new readability check feature. The examples cover all major use cases and the explanation of metrics is helpful.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @translations/management/commands/check_readability.py:
- Around line 115-126: The check method currently falls back to English for any
non-Spanish language; add explicit handling for Vietnamese by checking if
lang.startswith("vi") before the else branch, and when detected write a clear
warning (e.g., via self.stdout.write or self.stderr.write) that Vietnamese is
unsupported and skip analysis (set scores/primary_score/threshold to None or a
sentinel and mark passes as False/skip), leaving analyze_spanish and
analyze_english unchanged; ensure any code that consumes primary_score/threshold
handles the sentinel, and update the command help text or docs to reflect that
vi is unsupported if you prefer not to analyze it.
🧹 Nitpick comments (4)
translations/management/commands/check_readability.py (4)
180-184: Mutating class constant via instance attribute assignment is confusing.

Line 184 assigns to `checker.MIN_WORD_COUNT`, which shadows the class-level constant with an instance attribute. While this works, it's an unusual pattern that may confuse future maintainers.

♻️ Suggested refactor: Accept min_words as a constructor parameter

```diff
-    def __init__(self, en_threshold: Optional[float] = None, es_threshold: Optional[float] = None):
+    def __init__(
+        self,
+        en_threshold: Optional[float] = None,
+        es_threshold: Optional[float] = None,
+        min_word_count: Optional[int] = None,
+    ):
         self.en_threshold = en_threshold if en_threshold is not None else self.ENGLISH_THRESHOLD
         self.es_threshold = es_threshold if es_threshold is not None else self.SPANISH_THRESHOLD
+        self.min_word_count = min_word_count if min_word_count is not None else self.MIN_WORD_COUNT
```

Then use `self.min_word_count` instead of `self.MIN_WORD_COUNT` in the `check` method, and update the Command:

```diff
 checker = ReadabilityChecker(
     en_threshold=threshold if language.startswith("en") else None,
     es_threshold=threshold if language.startswith("es") else None,
+    min_word_count=min_words,
 )
-checker.MIN_WORD_COUNT = min_words
```
169-177: Consider validating the language option against supported languages.

The `language` option accepts any string, but only `en-*` and `es-*` are meaningfully supported. Invalid language codes could lead to unexpected behavior (e.g., `translation.set_current_language()` with an unsupported code).

♻️ Suggested validation

```diff
+SUPPORTED_LANGUAGES = {"en-us", "es", "vi"}
 language = options["language"]
+if language not in SUPPORTED_LANGUAGES and not any(language.startswith(p) for p in ("en", "es")):
+    self.stderr.write(self.style.WARNING(
+        f"Warning: '{language}' may not be fully supported. Supported: en-us, es, vi"
+    ))
```
306-310: Remove extraneous `f` prefix from strings without placeholders.

Lines 306 and 309 use f-strings but have no placeholders.

♻️ Proposed fix

```diff
 if language.startswith("es"):
-    self.stdout.write(f"Metric: Fernández-Huerta (higher is better)")
+    self.stdout.write("Metric: Fernández-Huerta (higher is better)")
     self.stdout.write(f"Threshold: >= {checker.es_threshold}")
 else:
-    self.stdout.write(f"Metric: Flesch-Kincaid Grade Level (lower is better)")
+    self.stdout.write("Metric: Flesch-Kincaid Grade Level (lower is better)")
     self.stdout.write(f"Threshold: <= {checker.en_threshold} (8th grade)")
```
355-358: Remove unused `style` variable.

The `style` variable is assigned but never used.

♻️ Proposed fix

```diff
 def _print_result(self, result: ReadabilityResult, detailed: bool, is_failing: bool):
     """Print details for a single translation result."""
-    style = self.style.ERROR if is_failing else self.style.SUCCESS
     icon = "❌" if is_failing else "✅"
```
📜 Review details
Configuration used: Repository: MyFriendBen/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
translations/management/commands/check_readability.py
🧰 Additional context used
🧬 Code graph analysis (1)
translations/management/commands/check_readability.py (3)
translations/models.py (1)
- Translation (167-304)

programs/models.py (1)
- Program (597-773)

screener/models.py (1)
- WhiteLabel (12-19)
🪛 Ruff (0.14.10)
translations/management/commands/check_readability.py
169-169: Unused method argument: args
(ARG002)
196-196: Avoid specifying long messages outside the exception class
(TRY003)
239-239: Avoid specifying long messages outside the exception class
(TRY003)
306-306: f-string without any placeholders
Remove extraneous f prefix
(F541)
309-309: f-string without any placeholders
Remove extraneous f prefix
(F541)
357-357: Local variable style is assigned to but never used
Remove assignment to unused variable style
(F841)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run Tests with Coverage
🔇 Additional comments (5)
translations/management/commands/check_readability.py (5)
25-37: LGTM! The dataclass is well-structured with appropriate fields for storing readability analysis results.

61-80: Global state in `textstat.set_lang()` is fine for sequential command execution.

The `textstat.set_lang()` call modifies global state. This works correctly for a sequential management command, but if this `ReadabilityChecker` class is ever used in a concurrent context (e.g., async views, threads), the language setting could be overwritten between `set_lang()` and the metric calls. Consider documenting this limitation or restructuring if concurrent use is planned.
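If concurrent use were ever planned, one possible restructuring is to serialize the "set language, then score" sequence behind a lock. The sketch below only illustrates the pattern: `_set_lang`, `_score`, and `score_in_language` are hypothetical stand-ins written for this example, not textstat's API or the command's actual code.

```python
import threading
from typing import Tuple

# Hypothetical stand-ins for a library that keeps the active language in
# module-level state, which is the behaviour the comment above describes.
_current_lang = "en"


def _set_lang(lang: str) -> None:
    global _current_lang
    _current_lang = lang


def _score(text: str) -> Tuple[str, int]:
    # Returns the language active at scoring time plus a dummy word-count score.
    return _current_lang, len(text.split())


# One lock shared by all callers, so "set language + score" runs atomically.
_lang_lock = threading.Lock()


def score_in_language(text: str, lang: str) -> Tuple[str, int]:
    """Set the global language and score the text without interleaving."""
    with _lang_lock:
        _set_lang(lang)
        return _score(text)
```

The lock trades throughput for correctness: calls no longer overlap, which is acceptable for a batch-style check but would be a bottleneck in a hot request path.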
241-281: LGTM! The white label lookup with fallback from code to name is well-implemented. The translation field list correctly matches the Program model's translation ForeignKey fields based on the relevant code snippets.

314-323: Division by zero is guarded, but consider simplifying the ternary.

The percentage calculation is protected by `if total > 0`, but the nested ternary structure is slightly hard to follow. This is fine as-is, just noting for future readability.

344-351: Sorting logic for failing translations is correct.

English failures are sorted worst-first (highest grade level), and Spanish failures are sorted worst-first (lowest Fernández-Huerta score). This makes sense for prioritizing which translations need attention first.
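The worst-first ordering described above can be sketched as follows. This is a minimal illustration, not the command's code: `Result` is a hypothetical stand-in for `ReadabilityResult`, and `sort_worst_first` is a name invented for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Result:
    # Hypothetical minimal stand-in for ReadabilityResult.
    label: str
    primary_score: float


def sort_worst_first(failing: List[Result], language: str) -> List[Result]:
    """Order failing results so the least readable text comes first.

    English uses a grade level (higher is worse), so sort descending.
    Spanish uses Fernández-Huerta (lower is worse), so sort ascending.
    """
    higher_is_worse = not language.startswith("es")
    return sorted(failing, key=lambda r: r.primary_score, reverse=higher_is_worse)
```

Deriving the sort direction from the metric's polarity in one place keeps the printed summary and any exports from disagreeing about ordering.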
Actionable comments posted: 1
🧹 Nitpick comments (2)
translations/management/commands/check_readability.py (2)
220-226: Consider handling missing language translations gracefully.

`set_current_language()` may not raise an exception, but accessing `translation.text` afterward could return `None` or the fallback language's text depending on Parler's configuration. The current `if not text` check on line 223 handles `None`, but you may want to verify that the translation actually exists for the requested language to avoid silently analyzing fallback text.

Optional: Explicit language check

```diff
 translation.set_current_language(language)
-text = translation.text
+
+# Check if translation exists for this specific language
+if not translation.has_translation(language):
+    skipped_count += 1
+    continue
+
+text = translation.text
```
263-287: Consider using `select_related` to reduce database queries.

The current implementation makes N+1 queries when iterating over programs: one query per translation field access. Using `select_related` can fetch all translation references in a single query.

Proposed optimization

```diff
 # Get all programs for this white label
-programs = Program.objects.filter(white_label=wl)
+programs = Program.objects.filter(white_label=wl).select_related(
+    "name",
+    "description",
+    "description_short",
+    "learn_more_link",
+    "apply_button_link",
+    "apply_button_description",
+    "value_type",
+    "estimated_delivery_time",
+    "estimated_application_time",
+    "estimated_value",
+    "website_description",
+)
```
📜 Review details
Configuration used: Repository: MyFriendBen/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
translations/management/commands/check_readability.py
🧰 Additional context used
🧬 Code graph analysis (1)
translations/management/commands/check_readability.py (3)
translations/models.py (1)
- Translation (167-304)

programs/models.py (1)
- Program (597-773)

screener/models.py (1)
- WhiteLabel (12-19)
🪛 Ruff (0.14.10)
translations/management/commands/check_readability.py
175-175: Unused method argument: args
(ARG002)
202-202: Avoid specifying long messages outside the exception class
(TRY003)
245-245: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run Tests with Coverage
🔇 Additional comments (4)
translations/management/commands/check_readability.py (4)
1-23: LGTM! Clear module docstring with usage examples, and appropriate imports for the functionality.

25-37: LGTM! Clean dataclass definition with appropriate type hints for storing readability analysis results.

148-173: LGTM! Well-defined CLI arguments with clear help text and sensible defaults.

289-377: LGTM! Well-structured output formatting with appropriate handling of edge cases (zero total) and correct sorting logic for different language thresholds.
@coderabbitai review

✅ Actions performed: review triggered.
…ores in JSON /CSV/text.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@translations/management/commands/check_readability.py`:
- Around line 563-564: The passing-results sort in _print_summary currently
always sorts ascending by r.primary_score; update it to mirror _export_text and
_export_json by sorting with reverse=language.startswith("es") so Spanish
results are shown highest-first; locate the loop that iterates over passing (the
call to self._print_result within _print_summary) and change the sorted(...)
call to use key=lambda r: r.primary_score and reverse=language.startswith("es")
so ordering is consistent with _export_text/_export_json and _print_result.
- Line 344: In _export_report() remove the unused local variable total
(assignment "total = len(passing) + len(failing)") since it is never referenced;
update the function in translations/management/commands/check_readability.py by
deleting that line to eliminate the dead variable and any linter warnings.
🧹 Nitpick comments (2)
translations/management/commands/check_readability.py (2)
126-137: Unsupported languages still fall through to English analysis.While the past review flagged this and the PR description was updated to remove Vietnamese, the code still uses an
elsebranch that will apply English metrics to any non-Spanish language. If a user accidentally passes an unsupported language like--language vior--language fr, they'll get misleading English readability scores.Consider adding input validation in the
handle()method to reject unsupported languages early:SUPPORTED_LANGUAGES = {"en-us", "en", "es"} if not any(language.startswith(prefix) for prefix in ("en", "es")): raise CommandError(f"Unsupported language: {language}. Supported: en-us, es")
245-266: Minor redundancy in word count checking.The word count is checked twice: once inside
checker.check()(line 114) which marks short texts as passing with empty scores, and again here (line 259) to skip them entirely. This works correctly but the logic is slightly scattered.Consider either:
- Having
check()return a distinct status for "skipped" vs "passing", or- Checking word count only in the loop before calling
check()This is a minor observation—the current approach is functional.
Actionable comments posted: 1
♻️ Duplicate comments (1)
translations/management/commands/check_readability.py (1)
127-137: ⚠️ Potential issue | 🟠 Major

Reject unsupported `--language` values instead of silently falling back to English.

At Line 132, any non-`es` language is analyzed with English metrics, while custom threshold wiring at Line 217-Line 218 only applies for `en*`/`es*`. A typo like `--language en_us` or `fr` yields misleading scoring behavior instead of a clear error.

Suggested fix

```diff
@@
-        if lang.startswith("es"):
+        if lang.startswith("es"):
             scores = self.analyze_spanish(text)
             primary_score = scores["fernandez_huerta"]
             passes = primary_score >= self.es_threshold
             threshold = self.es_threshold
-        else:
+        elif lang.startswith("en"):
             scores = self.analyze_english(text)
             primary_score = scores["flesch_kincaid_grade"]
             passes = primary_score <= self.en_threshold
             threshold = self.en_threshold
+        else:
+            raise ValueError(f"Unsupported language for readability analysis: {lang}")
@@
-        language = options["language"]
+        language = options["language"].lower()
+        if not (language.startswith("en") or language.startswith("es")):
+            raise CommandError(
+                f"Unsupported --language '{language}'. Supported prefixes are 'en' and 'es'."
+            )
```

Also applies to: 155-160, 217-219
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@translations/management/commands/check_readability.py` around lines 127 - 137, The command currently treats any non-"es" language as English; update the language validation in the check_readability command to explicitly accept only languages starting with "en" or "es" and reject others with a clear error/exit instead of falling back to English; locate the language handling where analyze_spanish and analyze_english are invoked (references: analyze_spanish, analyze_english, es_threshold, en_threshold) and replace the implicit else with an explicit check (if lang.startswith("es") ... elif lang.startswith("en") ... else -> raise CommandError/print error and exit), and apply the same strict validation logic to the other language-handling blocks noted around the other occurrences (the blocks referencing primary_score/en_threshold and fernandez_huerta/es_threshold) so typos like "en_us" or unsupported codes produce an error rather than silent fallback.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@translations/management/commands/check_readability.py`:
- Around line 225-230: The current white-label logic fetches all active
Translation rows and then filters in Python, causing N+1 FK hits; update the
initial queryset (the variable `translations`) to apply the white-label filter
in the database (e.g., add .filter(program__label__in=allowed_labels) or the
appropriate program field) before prefetch_related so only matching translations
are loaded, and modify `_get_whitelabel_translation_labels` to use
select_related('program') (or include the program join in its queryset) so you
don’t dereference program FKs per-row; ensure you keep
prefetch_related("translations") as needed but move filtering into the ORM to
avoid iteration-based filtering.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: MyFriendBen/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d62295b8-f892-4181-9c2b-3c7efc1d6590
📒 Files selected for processing (2)
- requirements.txt
- translations/management/commands/check_readability.py
🚧 Files skipped from review as they are similar to previous changes (1)
- requirements.txt
kbrey left a comment:

@SonaliBedge Nice work, here are some suggestions for you, let me know what you think!

For detailed information about writing and maintaining integration tests, see [docs/INTEGRATION_TESTING.md](docs/INTEGRATION_TESTING.md).
Thanks for thinking about documentation! I think our README is pretty high level so instead you can add this command to the commands wiki.
.gitignore (outdated)

```
.claude/projects/
.claude/settings.local.json

# Readability check output files
```
I would suggest writing these files to a folder, such as readability-output/ (or tmp/), so that they don't collide with potential future files and so that the ignore rule isn't fragile to file name changes.
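A minimal sketch of that idea, assuming a helper that resolves every export path inside one directory. The function name `report_path` and the `readability_<language>.<extension>` naming scheme are hypothetical and chosen only for this example; the command's real file names may differ.

```python
from pathlib import Path


def report_path(output_dir: str, language: str, extension: str) -> Path:
    """Return a path inside output_dir, creating the directory if needed."""
    directory = Path(output_dir)
    directory.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme, used here only for illustration.
    return directory / f"readability_{language}.{extension}"
```

With all exports under one folder, a single `.gitignore` entry for that folder (for example `readability-output/`) covers every current and future output file.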
kbrey left a comment:

@SonaliBedge great cleanups! A couple followup suggestions, let me know what you think.

```python
self.stdout.write(self.style.SUCCESS("PASSING TRANSLATIONS:"))
self.stdout.write("-" * 60)
for line in lines:
    if line.startswith("✅"):
```
Thanks for making this DRYer! But basing conditionals on the icons seems a bit fragile; if we ever remove those icons from the text, this would silently break. What would you think about a small `ReportData` dataclass that captures the computed state (sorted results, counts, threshold, metric name) and a `_build_report_data` method that populates it? Each output method (`_print_summary`, `_export_text`, `_export_json`, `_export_csv`) would then render from that directly.
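One way the suggested `ReportData` approach could look is sketched below. Everything here is an assumption made for illustration: `Result` stands in for `ReadabilityResult`, the field names are guesses at "computed state", and `build_report_data` / `render_text` are free functions rather than the command's methods.

```python
from dataclasses import dataclass, field
from operator import attrgetter
from typing import List


@dataclass
class Result:
    # Hypothetical stand-in for ReadabilityResult.
    label: str
    primary_score: float
    passes: bool


@dataclass
class ReportData:
    metric_name: str
    threshold: float
    passing: List[Result] = field(default_factory=list)
    failing: List[Result] = field(default_factory=list)

    @property
    def total(self) -> int:
        return len(self.passing) + len(self.failing)


def build_report_data(results: List[Result], language: str, threshold: float) -> ReportData:
    """Compute sorted results once so every renderer reads the same state."""
    spanish = language.startswith("es")
    metric = "Fernández-Huerta" if spanish else "Flesch-Kincaid Grade Level"
    by_score = attrgetter("primary_score")
    # Passing results are ordered best-first, failing results worst-first.
    passing = sorted((r for r in results if r.passes), key=by_score, reverse=spanish)
    failing = sorted((r for r in results if not r.passes), key=by_score, reverse=not spanish)
    return ReportData(metric, threshold, passing, failing)


def render_text(data: ReportData) -> List[str]:
    """Render from structured state instead of parsing icons out of lines."""
    lines = [f"Metric: {data.metric_name}", f"Failing: {len(data.failing)} of {data.total}"]
    lines += [f"FAIL {r.label}: {r.primary_score}" for r in data.failing]
    return lines
```

Because the pass/fail split lives in the data rather than in the rendered text, icons can be added or removed in any renderer without affecting which section a result lands in.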
```python
result = checker.check(translation.label, text, language)

if result.word_count < min_words:
```

If I'm understanding right, we check for minimum word count twice here, both in `checker.check` and also here within `_analyze`. Seems like we could clean that up by checking in just one of those places?
Context & Motivation
Fixes : #MFB-62
Related PR: None
Old PR : #1855
Changes Made
New management command: python manage.py check_readability to analyze translation text readability
Readability metrics:
English: Flesch-Kincaid Grade Level (target ≤ 8.0)
Spanish: Fernández-Huerta score (target ≥ 60)
Add readability command output files to .gitignore
Commands:

```bash
# Check all English translations
python manage.py check_readability --language en-us

# Check English for specific white-label (Colorado)
python manage.py check_readability --language en --whitelabel co

# Check Spanish translations
python manage.py check_readability --language es --whitelabel co

# With custom threshold (6th grade level)
python manage.py check_readability --language en --whitelabel co --threshold 6

# Show detailed scores for each translation
python manage.py check_readability --language en --whitelabel co --detailed

# CI/CD mode - exit with error code if failures found
python manage.py check_readability --language en --whitelabel co --fail-on-error

# Also show passing translations
python manage.py check_readability --language en --whitelabel co --show-passing
```

Command options:
--language / -l: Language code (en-us, es)
--whitelabel / -w: Filter by white label (e.g., co, ma, nc)
--threshold / -t: Custom threshold override
--detailed: Show all readability scores
--show-passing: Include passing translations in output
--fail-on-error: Exit with error code for CI/CD integration
--min-words: Minimum word count for analysis (default: 10)
--label-filter: Filter by label prefix
New dependency: Added textstat>=0.7.3 to requirements.txt
Unit tests: Added comprehensive tests in test_check_readability.py
Documentation: Updated README with usage examples
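For reviewers unfamiliar with the English metric listed under "Readability metrics", the sketch below shows the standard published Flesch-Kincaid Grade Level formula computed from raw counts, together with the ≤ 8.0 target. It is illustrative only: the command itself relies on textstat, the function names are hypothetical, and the counts are supplied by the caller rather than derived from text. The Spanish metric is left out of this sketch.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level from raw counts (standard published formula)."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def passes_english(grade: float, threshold: float = 8.0) -> bool:
    """Lower is better: a text passes when its grade is at or below the target."""
    return grade <= threshold
```

Shorter sentences and fewer syllables per word both lower the grade, which is why splitting long sentences is usually the quickest way to bring a failing translation under the threshold.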
Testing
Migrations to run: None
Configuration updates needed: None
Environment variables/settings to add: None
Manual testing steps:
```bash
# Install dependencies
pip install -r requirements.txt

# Download NLTK data (first time only)
python -c "import nltk; nltk.download('cmudict')"

# Run readability check
python manage.py check_readability --language en --whitelabel co

# Run unit tests
pytest translations/management/commands/tests/test_check_readability.py -v
```

Deployment
Notes for Reviewers
Known limitations:
Very short texts (<10 words) are skipped as readability metrics are unreliable for them
NLTK's cmudict data must be downloaded for accurate syllable counting
Future considerations:
Add GitHub Actions workflow (.github/workflows/readability.yml) for automated PR checks
Consider adding pre-commit hook for local validation
May want to adjust thresholds after initial content review period