ADFA-3911 | Fix widget tag parsing and centralize OCR correction rules by jatezzz · Pull Request #1310 · appdevforall/CodeOnTheGo

jatezzz · 2026-05-14T20:21:59Z

Description

Fixed an issue where Button widget values were being mixed up with Switch widget values due to improper parsing of separator characters. Additionally, centralized and expanded OCR sanitization rules to better handle common misreads (such as "bsld" or "text stjle" to "text_style") to improve overall attribute parsing accuracy.

Details

WidgetTagParser.kt: Added the dot (.) character to the separator regex group in tagExtractRegex to correctly parse and isolate widget tags, preventing Button/Switch collision.
OcrSanitizationRules.kt: Created a new dedicated file containing a list of OcrCorrectionRule to centralize regex patterns and replacements for colors, dimensions, margins, and text attributes.
FuzzyAttributeParser.kt: Refactored tokenizeAnnotation to fold over the new preTokenizationRules list instead of using hardcoded chained string replacements.
MarginAnnotationParser.kt: Updated margin parsing logic to safely concatenate overlapping right margin annotations using a pipe (|) delimiter instead of overwriting existing map keys.

Screen.Recording.2026-05-14.at.11.56.39.AM.mov

Ticket

ADFA-3911

Observation

Moving the OCR sanitization rules into their own file (OcrSanitizationRules.kt) significantly cleans up the FuzzyAttributeParser and makes it much easier to scale and maintain new OCR correction patterns as more edge cases are discovered in the future.

coderabbitai · 2026-05-14T20:27:45Z

📝 Walkthrough

Release Notes

Features & Improvements

Widget Tag Parsing Fix: Updated WidgetTagParser regex to include dot (.) character in separator patterns, fixing incorrect parsing that caused Button widget values to be mixed with Switch widget values
Centralized OCR Correction Rules: Introduced new OcrSanitizerRules.kt with dedicated sanitizer classes for common OCR misreads:
- ColorSanitizer: Normalizes background color variants
- TextAttributeSanitizer: Fixes text style attribute misreads (e.g., "text stjle" → "text_style")
- DimensionSanitizer: Corrects width/height/match_parent OCR errors
- MarginPaddingSanitizer: Normalizes margin and padding direction phrases
- StructureSanitizer: Fixes horizontal gravity center layout patterns
Streamlined OCR Sanitization: Refactored FuzzyAttributeParser to delegate OCR cleanup to centralized OcrSanitizer infrastructure instead of hardcoded replacements
Improved Margin Annotation Handling: MarginAnnotationParser now processes all margin data in a single global pass and concatenates overlapping right margin annotations using pipe (|) delimiter to preserve all margin data

Technical Changes

Added OcrSanitizer interface and DictionaryRegexSanitizer abstract base class for composable OCR rules
Implemented CompositeOcrSanitizer to apply multiple sanitizers sequentially
Added OcrSanitizerFactory to create and manage default sanitizer instances
Comprehensive test coverage in OcrSanitizerRulesTest validating all new sanitizer behaviors

⚠️ Risks & Breaking Changes

Data Format Change: Overlapping margin annotations now concatenate with " | " delimiter instead of overwriting values. Downstream consumers must handle multi-value margin strings or update parsing logic
Regex Pattern Modification: Widget tag parsing regex changes may affect edge cases with dot-containing widget identifiers—recommend testing against existing widget datasets
OCR Correction Ordering: Sequential application of sanitizers matters; changes to sanitizer ordering or rules in OcrSanitizerFactory could alter output unpredictably
Increased Complexity: New sanitizer infrastructure adds multiple abstraction layers; ensure maintainability is prioritized as rules grow

Walkthrough

Adds a composable OCR sanitizer framework and rules, integrates the default sanitizer into FuzzyAttributeParser, tightens WidgetTagParser tag extraction/validation, and refactors MarginAnnotationParser to resolve and merge margin annotations in a single global pass.

Changes

OCR parsing infrastructure and sanitization framework

Layer / File(s)	Summary
OCR sanitizer framework setup `cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizer.kt`, `CompositeOcrSanitizer.kt`, `OcrSanitizerFactory.kt`, `OcrSanitizerRules.kt`	OcrSanitizer interface and DictionaryRegexSanitizer base class establish regex-replacement sanitizers. CompositeOcrSanitizer composes multiple sanitizers. OcrSanitizerFactory builds a default composite from ColorSanitizer, TextAttributeSanitizer, DimensionSanitizer, MarginPaddingSanitizer, and StructureSanitizer.
FuzzyAttributeParser sanitizer integration `cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/FuzzyAttributeParser.kt`	FuzzyAttributeParser obtains a default sanitizer from OcrSanitizerFactory and applies sanitizer.sanitize(annotation) before tokenization, replacing inline regex cleanup.
WidgetTagParser extraction and validation `cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetTagParser.kt`	tagExtractRegex now captures prefix, separator, and token; parseTagParts returns nullable parsed parts and validates prefixes against VALID_PREFIXES. normalizeTagText and extractTag use parsed parts with uppercase fallback.
MarginAnnotationParser unified margin handling `cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt`	parse now delegates to parseMarginsGlobally which pools explicit blocks from both margins, resolves implicit blocks against the shared canvasTags pool, and merges annotations; duplicate widget values are joined with "
Sanitizer rules tests `cv-image-to-xml/src/test/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizerRulesTest.kt`	Unit tests validate each sanitizer class and a test confirms the default composite sanitizer applies multiple rules in order on combined input.

Sequence Diagram(s)

sequenceDiagram
  participant FA as FuzzyAttributeParser
  participant OSF as OcrSanitizerFactory
  participant COS as CompositeOcrSanitizer
  participant CS as ColorSanitizer
  participant TS as TextAttributeSanitizer
  participant DS as DimensionSanitizer
  participant MPS as MarginPaddingSanitizer
  participant SS as StructureSanitizer
  FA->>OSF: createDefaultSanitizer()
  OSF->>COS: new CompositeOcrSanitizer([CS, TS, DS, MPS, SS])
  COS-->>FA: sanitizer instance
  FA->>COS: sanitize(annotation)
  COS->>CS: sanitize(annotation)
  CS-->>COS: color-normalized
  COS->>TS: sanitize(previous)
  TS-->>COS: text-style-normalized
  COS->>DS: sanitize(previous)
  DS-->>COS: dimension-normalized
  COS->>MPS: sanitize(previous)
  MPS-->>COS: margin-padding-normalized
  COS->>SS: sanitize(previous)
  SS-->>COS: structure-normalized
  COS-->>FA: fully sanitized annotation

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

appdevforall/CodeOnTheGo#1261: Related changes to MarginAnnotationParser's explicit-block assignment and side handling.
appdevforall/CodeOnTheGo#1233: Related to FuzzyAttributeParser OCR normalization/refactoring.
appdevforall/CodeOnTheGo#1047: Prior changes that modified MarginAnnotationParser's parse flow.

Suggested reviewers

avestaadfa
Daniel-ADFA

"I'm a rabbit with a tidy mind,
I hop through OCR crumbs I find.
I stitch punctuation, color, size,
and join the margins, neat and wise.
Hooray — clean tags and saner lines!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 19.05% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely summarizes the two main changes: widget tag parsing fix and centralized OCR correction rules.
Description check	✅ Passed	The description provides detailed explanation of all major changes across four files, including the specific issues fixed and the rationale for centralization.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/ADFA-3911-widget-tag-parsing-and-ocr-rules-experimental

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

Generate code and open pull requests
Plan features and break down work
Investigate incidents and troubleshoot customer tickets together
Automate recurring tasks and respond to alerts with triggers
Summarize progress and report instantly

Built for teams:

Shared memory across your entire org—no repeating context
Per-thread sandboxes to safely plan and execute work
Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (3)

cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizerFactory.kt (1)
5-15: ⚡ Quick win

Consider documenting the sanitizer ordering rationale.

The order in which sanitizers are applied may affect the final output if rules overlap. Consider adding a comment explaining why this specific sequence (Color → TextAttribute → Dimension → MarginPadding → Structure) was chosen.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizerFactory.kt`
around lines 5 - 15, Add a brief comment above createDefaultSanitizer explaining
the rationale for the sanitizer application order: why ColorSanitizer runs
before TextAttributeSanitizer, then DimensionSanitizer, MarginPaddingSanitizer,
and finally StructureSanitizer; reference CompositeOcrSanitizer and the list of
sanitizers (ColorSanitizer, TextAttributeSanitizer, DimensionSanitizer,
MarginPaddingSanitizer, StructureSanitizer) and describe any dependencies or
side-effects that justify the sequence (e.g., color normalization first so later
attribute merging/structure decisions operate on normalized values).
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetTagParser.kt (2)
8-10: ⚡ Quick win

Avoid duplicating the valid-prefix list between tagRegex and VALID_PREFIXES.

tagRegex (line 8) and VALID_PREFIXES (line 10) both encode the same prefix set (B|P|D|T|C|R|SW|S). A future addition (e.g., a new widget prefix) is easy to apply to one and forget the other, silently desynchronizing extraction validation from matches(tagRegex) checks. Consider deriving tagRegex from VALID_PREFIXES so they stay in lockstep.
♻️ Proposed derivation
-    private val tagRegex = Regex("^(?i)(B|P|D|T|C|R|SW|S)-[A-Z0-9_]+$")
-    private val tagExtractRegex = Regex("^(?i)([A-Z0-9\\s]+)([\\s\\-_.]+)([A-Z0-9_\\-]+)")
-    private val VALID_PREFIXES = setOf("B", "P", "D", "T", "C", "R", "SW", "S")
+    private val VALID_PREFIXES = setOf("B", "P", "D", "T", "C", "R", "SW", "S")
+    private val tagRegex = Regex(
+        "^(?i)(${VALID_PREFIXES.joinToString("|")})-[A-Z0-9_]+$"
+    )
+    private val tagExtractRegex = Regex("^(?i)([A-Z0-9\\s]+)([\\s\\-_.]+)([A-Z0-9_\\-]+)")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetTagParser.kt`
around lines 8 - 10, The prefix list is duplicated between VALID_PREFIXES and
the hardcoded tagRegex; update the implementation so tagRegex is constructed
from VALID_PREFIXES instead of repeating the literals: build a regex alternation
by joining VALID_PREFIXES (sort by length desc to ensure multi-char prefixes
like "SW" match before "S"), escape any regex-special chars if needed, prepend
the case-insensitive flag (?i) and then compile that into tagRegex; keep
tagExtractRegex unchanged but ensure any logic that relies on the prefix set
uses VALID_PREFIXES (or the dynamically-built regex) so the two stay in sync.
75-79: 💤 Low value

isValidTagMatch is now effectively dead-code-protected.

With the updated tagExtractRegex, group 2 is [\s\-_.]+ (one or more), so separator.isEmpty() can never be true when the regex matches. The guard therefore always returns true, and the three call sites (isTag, normalizeTagText, extractTag) perform a no-op check. Either tighten the validation to do something meaningful (e.g., reject single-character ambiguous separators that aren't - _ .), or drop the helper to reduce noise.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetTagParser.kt`
around lines 75 - 79, The current isValidTagMatch guard is useless because group
2 is always non-empty; update isValidTagMatch to reject ambiguous
single-character separators (e.g., a single whitespace or other punctuation)
while allowing "-", "_" and ".". Concretely, in isValidTagMatch(match:
MatchResult) inspect val separator = match.groupValues[2] and val rawToken =
match.groupValues[3] and return false when separator.length == 1 && separator
!in listOf("-", "_", ".") && rawToken.firstOrNull()?.isLetter() == true;
otherwise return true. This keeps the three call sites (isTag, normalizeTagText,
extractTag) meaningful by filtering out tokens that start with a letter preceded
by an ambiguous one-char separator.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt`:
- Around line 27-33: The merge is allowing the same canvas tag to be assigned by
both parseMarginGroup(distribution.leftMargin, canvasTags) and
parseMarginGroup(distribution.rightMargin, canvasTags) because splitCanvasTags
was removed and each resolveImplicitBlocks maintains independent
unresolvedTagsByPrefix; fix by reintroducing shared consumption or re-splitting:
either (A) restore splitCanvasTags and pass side-specific tag lists into
parseMarginGroup so left/right only see their half, or (B) keep the unified
canvasTags but add a shared consumedTags set that both parseMarginGroup calls
consult/mark and have resolveImplicitBlocks check against (and against
existingAnnotations across both margins) before popping a tag; update
annotationMap merging logic only if you intentionally want multi-side
annotations.

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizerRules.kt`:
- Around line 17-23: Add unit tests that exercise the OCR sanitizer
implementations (e.g., DimensionSanitizer, ColorSanitizer and any other classes
extending DictionaryRegexSanitizer) to assert that their rawRules produce the
expected normalized strings for common OCR misreads; specifically, create tests
that instantiate each sanitizer (or call the public normalization method on
DictionaryRegexSanitizer if available), feed representative malformed inputs
like "ilayout wioidth", "layout hei5ht.", "mwa tch-parent" and assert outputs
equal "layout_width:", "layout_height:", "match_parent" (and analogous expected
results for ColorSanitizer), and include negative/edge cases to ensure rules
don't overreplace. Ensure tests reference the class names DimensionSanitizer and
ColorSanitizer (or other sanitizer class names) so failures point directly to
the right implementation.

---

Nitpick comments:
In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizerFactory.kt`:
- Around line 5-15: Add a brief comment above createDefaultSanitizer explaining
the rationale for the sanitizer application order: why ColorSanitizer runs
before TextAttributeSanitizer, then DimensionSanitizer, MarginPaddingSanitizer,
and finally StructureSanitizer; reference CompositeOcrSanitizer and the list of
sanitizers (ColorSanitizer, TextAttributeSanitizer, DimensionSanitizer,
MarginPaddingSanitizer, StructureSanitizer) and describe any dependencies or
side-effects that justify the sequence (e.g., color normalization first so later
attribute merging/structure decisions operate on normalized values).

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetTagParser.kt`:
- Around line 8-10: The prefix list is duplicated between VALID_PREFIXES and the
hardcoded tagRegex; update the implementation so tagRegex is constructed from
VALID_PREFIXES instead of repeating the literals: build a regex alternation by
joining VALID_PREFIXES (sort by length desc to ensure multi-char prefixes like
"SW" match before "S"), escape any regex-special chars if needed, prepend the
case-insensitive flag (?i) and then compile that into tagRegex; keep
tagExtractRegex unchanged but ensure any logic that relies on the prefix set
uses VALID_PREFIXES (or the dynamically-built regex) so the two stay in sync.
- Around line 75-79: The current isValidTagMatch guard is useless because group
2 is always non-empty; update isValidTagMatch to reject ambiguous
single-character separators (e.g., a single whitespace or other punctuation)
while allowing "-", "_" and ".". Concretely, in isValidTagMatch(match:
MatchResult) inspect val separator = match.groupValues[2] and val rawToken =
match.groupValues[3] and return false when separator.length == 1 && separator
!in listOf("-", "_", ".") && rawToken.firstOrNull()?.isLetter() == true;
otherwise return true. This keeps the three call sites (isTag, normalizeTagText,
extractTag) meaningful by filtering out tokens that start with a letter preceded
by an ambiguous one-char separator.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c1d0639a-5974-4ad6-96df-f9c177f786ae

📥 Commits

Reviewing files that changed from the base of the PR and between 7736433 and 832593e.

📒 Files selected for processing (7)

cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetTagParser.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/FuzzyAttributeParser.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/CompositeOcrSanitizer.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizer.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizerFactory.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizerRules.kt

Addresses mixed up Button/Switch tags, improves margin parsing, and refactors OCR corrections.

…les-experimental

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

cv-image-to-xml/src/test/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizerRulesTest.kt (1)

8-67: ⚡ Quick win

Consider adding edge case tests.

The current tests verify the main sanitization paths well. Consider adding tests for:

Empty/blank strings (verify graceful handling)
Already-correct input (verify sanitizers don't break valid data)
Idempotency (verify sanitizer.sanitize(sanitizer.sanitize(x)) == sanitizer.sanitize(x))

These would increase confidence that sanitizers handle unexpected input safely and won't corrupt valid data.

📋 Example edge case tests

`@Test`
fun `sanitizers handle empty input gracefully`() {
    val sanitizers = listOf(
        ColorSanitizer(),
        TextAttributeSanitizer(),
        DimensionSanitizer()
    )
    
    sanitizers.forEach { sanitizer ->
        assertEquals("", sanitizer.sanitize(""))
    }
}

`@Test`
fun `sanitizers preserve already-correct input`() {
    val sanitizer = OcrSanitizerFactory.createDefaultSanitizer()
    val validInput = "background red | text_style: bold | layout_width: match_parent"
    
    assertEquals(validInput, sanitizer.sanitize(validInput))
}

`@Test`
fun `sanitizers are idempotent`() {
    val sanitizer = OcrSanitizerFactory.createDefaultSanitizer()
    val input = "backgroundired | text stjle"
    
    val firstPass = sanitizer.sanitize(input)
    val secondPass = sanitizer.sanitize(firstPass)
    
    assertEquals(firstPass, secondPass)
}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@cv-image-to-xml/src/test/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizerRulesTest.kt`
around lines 8 - 67, Add edge-case unit tests to the existing
OcrSanitizerRulesTest: verify empty/blank input handling by calling sanitize("")
on ColorSanitizer, TextAttributeSanitizer, DimensionSanitizer (and others) and
expecting "" (or the defined safe output); add a test that passes an
already-correct string to
OcrSanitizerFactory.createDefaultSanitizer().sanitize(...) and asserts it
remains unchanged; add an idempotency test that runs sanitizer.sanitize(input)
twice and asserts the first and second results are equal using the default
sanitizer; use the existing classes/methods (ColorSanitizer,
TextAttributeSanitizer, DimensionSanitizer, MarginPaddingSanitizer,
StructureSanitizer, OcrSanitizerFactory.createDefaultSanitizer(),
sanitizer.sanitize) so tests live alongside the current cases.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@cv-image-to-xml/src/test/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizerRulesTest.kt`:
- Line 16: Fix inconsistent indentation in OcrSanitizerRulesTest.kt by replacing
tab characters with 4 space characters for the annotated test lines;
specifically update the `@Test` annotation and surrounding lines where tabs are
used (lines containing the `@Test` marker and the subsequent test method
declarations in OcrSanitizerRulesTest) so they use spaces only—ensure
occurrences around the `@Test` annotation and the related test method
signatures/body (the blocks referenced at lines 16, 24-25, 32-33, 41, and 59 in
the diff) are converted to 4-space indentation to match the rest of the file.

---

Nitpick comments:
In
`@cv-image-to-xml/src/test/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizerRulesTest.kt`:
- Around line 8-67: Add edge-case unit tests to the existing
OcrSanitizerRulesTest: verify empty/blank input handling by calling sanitize("")
on ColorSanitizer, TextAttributeSanitizer, DimensionSanitizer (and others) and
expecting "" (or the defined safe output); add a test that passes an
already-correct string to
OcrSanitizerFactory.createDefaultSanitizer().sanitize(...) and asserts it
remains unchanged; add an idempotency test that runs sanitizer.sanitize(input)
twice and asserts the first and second results are equal using the default
sanitizer; use the existing classes/methods (ColorSanitizer,
TextAttributeSanitizer, DimensionSanitizer, MarginPaddingSanitizer,
StructureSanitizer, OcrSanitizerFactory.createDefaultSanitizer(),
sanitizer.sanitize) so tests live alongside the current cases.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e50bdc0b-c4a0-4292-a4c9-2f81aeb70a98

📥 Commits

Reviewing files that changed from the base of the PR and between 832593e and 7ee68a2.

📒 Files selected for processing (8)

cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetTagParser.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/FuzzyAttributeParser.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/CompositeOcrSanitizer.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizer.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizerFactory.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizerRules.kt
cv-image-to-xml/src/test/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizerRulesTest.kt

🚧 Files skipped from review as they are similar to previous changes (7)

cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizerFactory.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/CompositeOcrSanitizer.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizer.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetTagParser.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/FuzzyAttributeParser.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizerRules.kt

jatezzz requested review from a team, Daniel-ADFA and avestaadfa May 14, 2026 20:22

coderabbitai Bot reviewed May 14, 2026

View reviewed changes

Comment thread ...l/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt Outdated

Comment thread ...ava/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizerRules.kt

jatezzz added 2 commits May 15, 2026 09:09

fix: correct widget tag extraction and centralize OCR sanitization rules

cd27aa8

Addresses mixed up Button/Switch tags, improves margin parsing, and refactors OCR corrections.

fix: cross-margin implicit block collisions and add OCR sanitizer tests

7fc9832

jatezzz force-pushed the fix/ADFA-3911-widget-tag-parsing-and-ocr-rules-experimental branch from 62cb964 to 7fc9832 Compare May 15, 2026 14:10

Daniel-ADFA approved these changes May 15, 2026

View reviewed changes

Merge branch 'stage' into fix/ADFA-3911-widget-tag-parsing-and-ocr-ru…

7ee68a2

…les-experimental

coderabbitai Bot reviewed May 15, 2026

View reviewed changes

Comment thread ...org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizerRulesTest.kt

jatezzz merged commit 6b65621 into stage May 15, 2026
2 checks passed

jatezzz deleted the fix/ADFA-3911-widget-tag-parsing-and-ocr-rules-experimental branch May 15, 2026 20:46

coderabbitai Bot mentioned this pull request May 26, 2026

ADFA-4033 | Improve OCR sanitization, value cleaners, and widget support #1333

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

ADFA-3911 | Fix widget tag parsing and centralize OCR correction rules#1310

ADFA-3911 | Fix widget tag parsing and centralize OCR correction rules#1310
jatezzz merged 3 commits into
stagefrom
fix/ADFA-3911-widget-tag-parsing-and-ocr-rules-experimental

jatezzz commented May 14, 2026 •

edited by atlassian Bot

Loading

Uh oh!

coderabbitai Bot commented May 14, 2026 •

edited

Loading

Release Notes

Features & Improvements

Technical Changes

⚠️ Risks & Breaking Changes

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

jatezzz commented May 14, 2026 • edited by atlassian Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Details

Ticket

Observation

Uh oh!

coderabbitai Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Release Notes

Features & Improvements

Technical Changes

⚠️ Risks & Breaking Changes

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jatezzz commented May 14, 2026 •

edited by atlassian Bot

Loading

coderabbitai Bot commented May 14, 2026 •

edited

Loading