Skip to content

ADFA-3910 | Fix single placeholder dropdown array generation in strings file#1274

Merged
jatezzz merged 3 commits into
stagefrom
fix/ADFA-3910-dropdown-placeholder-string-xml-experimental
May 6, 2026
Merged

ADFA-3910 | Fix single placeholder dropdown array generation in strings file#1274
jatezzz merged 3 commits into
stagefrom
fix/ADFA-3910-dropdown-placeholder-string-xml-experimental

Conversation

@jatezzz
Copy link
Copy Markdown
Collaborator

@jatezzz jatezzz commented May 5, 2026

Description

Fixed an issue where single placeholder labels (e.g., "dropdown", "select", "choose") sketched inside a spinner/dropdown widget incorrectly generated a string-array in strings.xml. Additionally, OCR tag parsing logic has been refactored and centralized into a new WidgetTagParser object.

Details

  • Added isSinglePlaceholderEntry() check in SpinnerWidget to bypass string array creation when the only item is a known placeholder.
  • Extracted and centralized OCR normalization and tag extraction logic from MarginAnnotationParser and WidgetAnnotationMatcher into WidgetTagParser.
  • Improved regex tokenization for splitting entry candidates in AttributeValidator.
document_5174968672900351910.mp4

Ticket

ADFA-3910

Observation

Centralizing the tag parsing logic into WidgetTagParser significantly cleans up the margin and annotation matching domains, ensuring that OCR text normalization (e.g., replacing 'l' with '1', 'O' with '0') is applied consistently across the entire pipeline.

@jatezzz jatezzz requested review from a team and Daniel-ADFA May 5, 2026 18:16
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 5, 2026

📝 Walkthrough

Release Notes

Features & Improvements

  • Fixed placeholder dropdown handling: spinner/dropdown widgets with only a single placeholder label (e.g., "dropdown", "select", "choose") no longer generate unnecessary string-array entries in strings.xml.
  • Centralized OCR/tag parsing: introduced internal WidgetTagParser to consolidate OCR normalization, tag detection and ordinal extraction previously duplicated across MarginAnnotationParser and WidgetAnnotationMatcher.
  • Better OCR artifact handling: WidgetTagParser and related changes improve normalization of common OCR misreads (examples: l → 1, O → 0, 8 → B, patterns like 8W/S8 → SW).
  • Improved entry tokenization: added OCR-aware tokenization via OcrExtensions.extractOcrEntries(), used by spinner entry parsing and AttributeValidator to better split OCR-derived entry candidates.
  • Spinner parsing robustness: SpinnerWidget/AndroidWidget now filter out non-meaningful or single-placeholder entries and use OCR-driven extraction when deriving entries and resource IDs.
  • Resource name sanitization and ID resolution: AndroidWidget adds sanitizeResourceName utilities and makes resolveWidgetId protected open to allow subclass customization.

Technical Changes

  • New internal object: WidgetTagParser (centralized tag parsing and ordinal extraction).
  • New utility: OcrExtensions.extractOcrEntries() for punctuation-first OCR tokenization with whitespace fallback.
  • MarginAnnotationParser refactored to delegate tag extraction and ordinal handling to WidgetTagParser; block-clustering constants and parsing rules adjusted.
  • WidgetAnnotationMatcher refactored to use WidgetTagParser for normalization, tag matching, and ordinal extraction.
  • AndroidWidget updated to integrate OCR-based entry parsing, placeholder detection, resource-name sanitization, and visibility change for resolveWidgetId.
  • AttributeValidator/EntriesValidator switched to OCR-driven entry extraction instead of naive comma-splitting.
  • Removed duplicate tag/ordinal helpers and local TAG regexes across multiple files in favor of WidgetTagParser.

Risks & Best-Practice Concerns

  • ⚠️ Multi-file refactor risk: Changes touch core CV-to-XML parsing and widget mapping paths—regressions in annotation/tag detection or ordinal handling are possible. Recommend broad integration and regression tests across annotation scenarios.
  • ⚠️ Entry-parsing behavior change: OCR-aware tokenization and placeholder filtering may yield different dropdown entries for malformed OCR text versus prior comma-split logic; validate on representative OCR outputs.
  • ⚠️ Visibility/API surface increase: resolveWidgetId changed from private to protected open—document and review intended overrides to avoid unintended subclass behavior.
  • ⚠️ Implicit block filtering adjusted: MarginAnnotationParser relaxed some canvas-metadata filtering for implicit blocks—confirm this doesn't surface unwanted metadata as blocks.
  • ⚠️ OCR normalization completeness: Centralized mappings must be validated against the OCR engine’s error patterns to avoid incorrect normalizations.
  • ⚠️ Test coverage: No test changes referenced in the summary—add unit/integration tests for WidgetTagParser, OcrExtensions, placeholder detection, and spinner/resource-id derivation.

Migration Notes

  • Generated string resources: spinner/dropdown widgets that previously produced string-array resources for single placeholder labels will no longer do so—this is intended but may change resource outputs and downstream expectations.
  • If custom OCR/tag normalization existed, reconcile it with WidgetTagParser to ensure consistent behavior across the pipeline.

Walkthrough

This PR centralizes widget tag parsing into a new internal WidgetTagParser, updates annotation matching and margin parsing to use it, adds OCR-aware entry extraction utilities, and adjusts Android widget ID resolution and spinner entry handling. Tag/ordinal extraction, metadata filtering, and spinner entry parsing are refactored to use shared utilities.

Changes

Tag Parsing Centralization

Layer / File(s) Summary
Data Shape / New Utility
cv-image-to-xml/src/main/java/.../WidgetTagParser.kt
Adds WidgetTagParser providing isTag, normalizeTagText, extractTag(text): Pair<String,String?>?, and extractOrdinal(tag): Int? with OCR-misread normalization and regex-based validation.
Core Refactor
cv-image-to-xml/src/main/java/.../WidgetAnnotationMatcher.kt
Removes local tag regex/helpers and delegates tag detection, normalization, and ordinal extraction to WidgetTagParser (used in dedup, matching, sorting).
Integration / Parsing Flow
cv-image-to-xml/src/main/java/.../MarginAnnotationParser.kt
Replaces internal tag/ordinal helpers with WidgetTagParser.extractTag/extractOrdinal; adds GAP_MULTIPLIER and HEIGHT_FRACTION; explicit blocks now require non-blank annotationText; implicit blocks relax canvas-metadata filtering; unresolved-tag sorting uses WidgetTagParser.
Tests / Docs (implicit)
(no explicit test/docs changes in diff)
Refactoring concentrated on centralizing parsing logic—no public API signature changes reported.

OCR-Aware Entry Extraction & Widget Enhancements

Layer / File(s) Summary
New OCR Utility
cv-image-to-xml/src/main/java/.../OcrExtensions.kt
Adds internal fun String.extractOcrEntries() that splits by punctuation/newlines and falls back to whitespace tokenization with numeric-like OCR validation.
Grammar/Validation
cv-image-to-xml/src/main/java/.../grammar/AttributeValidator.kt
EntriesValidator.validate now parses entries via extractOcrEntries() instead of comma-splitting; removes a bracket-enclosure helper.
Widget ID & Spinner Handling
cv-image-to-xml/src/main/java/.../xml/AndroidWidget.kt
Makes resolveWidgetId protected open; adds sanitizeResourceName helpers; SpinnerWidget now uses extractOcrEntries(), filters placeholder-only entries, normalizes dropdown labels, and prefers explicit/requested IDs or sanitized-derived IDs before generating new ones.
Tests / Docs (implicit)
(no explicit test/docs changes in diff)
Behavioral changes in spinner parsing and ID derivation; no exported API signature regressions except increased visibility of resolveWidgetId.

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested reviewers

  • avestaadfa
  • Daniel-ADFA

Poem

🐰 I hop through tags and tidy text,

OCR fuzz I mend next.
Widgets named and gaps refined,
Spinner lists and IDs aligned.
A little rabbit fixed the kind.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 23.68% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly identifies the main fix: preventing single placeholder dropdowns from generating string arrays in the strings file, which is the primary objective of this PR.
Description check ✅ Passed The description is well-related to the changeset, explaining both the main fix for placeholder dropdown handling and the supporting refactoring of OCR tag parsing logic into WidgetTagParser.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/ADFA-3910-dropdown-placeholder-string-xml-experimental

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@jatezzz jatezzz requested a review from avestaadfa May 6, 2026 13:05
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/utils/OcrExtensions.kt (1)

12-29: 💤 Low value

Solid OCR-aware splitting; one optional simplification.

The branching (punctuation-first, then digit-only whitespace fallback) sensibly avoids over-splitting non-numeric phrases while still handling space-separated numeric OCR like "1 2 3". The empty-input case (size != 1) gracefully short-circuits to an empty list.

Minor nit: the this.trim() on line 22 is redundant given that WHITESPACE_DELIMITERS = "\\s+" plus filter { it.isNotBlank() } already discards leading/trailing whitespace tokens. Removing it would reduce one allocation per call.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/utils/OcrExtensions.kt`
around lines 12 - 29, In extractOcrEntries(), remove the redundant this.trim()
call when creating whitespaceTokens since WHITESPACE_DELIMITERS ("\\s+") plus
filter { it.isNotBlank() } already handles leading/trailing spaces; update the
whitespaceTokens assignment in the function (referencing WHITESPACE_DELIMITERS
and whitespaceTokens) to split directly on WHITESPACE_DELIMITERS and filter
blanks, leaving the rest of the function logic (punctuatedTokens,
isSpaceSeparatedOcrNumbers, and the return) unchanged.
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/xml/AndroidWidget.kt (1)

123-178: 💤 Low value

Single-placeholder bypass implemented correctly; consider unifying placeholder knowledge.

The core fix — isSinglePlaceholderEntry() gating string-array creation in processAttributes (line 158) — is precise and matches the PR objective. Nicely scoped: ID derivation/resolution still proceeds normally, only the string-array write is suppressed.

Optional consistency nit: isMeaningfulDropdownText (line 202) only excludes "dropdown", while placeholderEntries covers year/month/day/select/choose/dropdown. As a result, a box.text of "select" will still pass isMeaningfulDropdownText and feed into rawEntries, only to be filtered out later by isSinglePlaceholderEntry. Routing both checks through the same set would make the intent uniform and avoid future drift if the placeholder list grows.

♻️ Suggested alignment
     private fun String.isMeaningfulDropdownText(): Boolean {
-        val cleaned = normalizedDropdownLabel()
-        return cleaned.isNotBlank() && !cleaned.equals("dropdown", ignoreCase = true)
+        val cleaned = normalizedDropdownLabel()
+        return cleaned.isNotBlank() && cleaned.lowercase() !in placeholderEntries
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/xml/AndroidWidget.kt`
around lines 123 - 178, The placeholder list is duplicated: companion object
placeholderEntries is used by isSinglePlaceholderEntry but
isMeaningfulDropdownText uses a different hardcoded check; unify them by making
isMeaningfulDropdownText consult the same placeholderEntries set (or move
placeholderEntries to a shared util/constant) and ensure both use the same
normalization/lowercasing (call normalizedDropdownLabel().lowercase() or
equivalent) so processAttributes, isSinglePlaceholderEntry, and
isMeaningfulDropdownText all reference the same canonical placeholder set.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetTagParser.kt`:
- Around line 60-83: normalizeTagToken calls normalizeOcrDigits with an
UPPERCASED token but normalizeOcrDigits currently targets lowercase letters and
only maps O/I/! → digits, causing inconsistency with isNumericLikeOcrChar and
EntriesValidator.cleanNumberArtifacts; fix by making normalizeOcrDigits operate
on the uppercase input and include the same OCR-to-digit mappings (e.g., map 'I'
and '!' and 'L' → '1', 'O' → '0', and add 'Z' → '2', 'S' → '5', 'B' → '6') so
normalizeOcrDigits, isNumericLikeOcrChar, and extractOrdinal behavior are
consistent (update the normalizeOcrDigits implementation referenced by
normalizeTagToken and keep isNumericLikeOcrChar as-is).

---

Nitpick comments:
In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/xml/AndroidWidget.kt`:
- Around line 123-178: The placeholder list is duplicated: companion object
placeholderEntries is used by isSinglePlaceholderEntry but
isMeaningfulDropdownText uses a different hardcoded check; unify them by making
isMeaningfulDropdownText consult the same placeholderEntries set (or move
placeholderEntries to a shared util/constant) and ensure both use the same
normalization/lowercasing (call normalizedDropdownLabel().lowercase() or
equivalent) so processAttributes, isSinglePlaceholderEntry, and
isMeaningfulDropdownText all reference the same canonical placeholder set.

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/utils/OcrExtensions.kt`:
- Around line 12-29: In extractOcrEntries(), remove the redundant this.trim()
call when creating whitespaceTokens since WHITESPACE_DELIMITERS ("\\s+") plus
filter { it.isNotBlank() } already handles leading/trailing spaces; update the
whitespaceTokens assignment in the function (referencing WHITESPACE_DELIMITERS
and whitespaceTokens) to split directly on WHITESPACE_DELIMITERS and filter
blanks, leaving the rest of the function logic (punctuatedTokens,
isSpaceSeparatedOcrNumbers, and the return) unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3c853016-4505-4a8b-827a-8ca2c2356296

📥 Commits

Reviewing files that changed from the base of the PR and between d087123 and 19d42ff.

📒 Files selected for processing (6)
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetAnnotationMatcher.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetTagParser.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/grammar/AttributeValidator.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/xml/AndroidWidget.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/utils/OcrExtensions.kt

@jatezzz jatezzz force-pushed the fix/ADFA-3910-dropdown-placeholder-string-xml-experimental branch from 9f74584 to fda1b16 Compare May 6, 2026 15:29
@jatezzz jatezzz force-pushed the fix/ADFA-3910-dropdown-placeholder-string-xml-experimental branch from bba5231 to b346c82 Compare May 6, 2026 19:54
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (4)
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/grammar/AttributeValidator.kt (1)

85-118: 💤 Low value

Optional: share the OCR character class with OcrExtensions.

isEntireArrayLikelyNumeric uses [0-9oOlIzZsSbB\s]+ and cleanNumberArtifacts uses the same alphabet split into per-letter replaces. The exact same character set (sans \s) lives in OCR_NUMERIC_PATTERN in OcrExtensions.kt. Worth extracting a single constant (e.g. OCR_DIGIT_LIKE_CHARS) so the two stay in sync if the alphabet ever expands (e.g. adding gG9 or tT7).

Not blocking — purely a maintainability improvement.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/grammar/AttributeValidator.kt`
around lines 85 - 118, Extract the shared OCR digit-like character set into a
single constant (e.g. OCR_DIGIT_LIKE_CHARS) in OcrExtensions and replace the
inline class in isEntireArrayLikelyNumeric and the per-letter replacements in
cleanNumberArtifacts to reference that constant (use the constant without the \s
when embedded in the Regex and add \s where needed in the regex only); update
isEntireArrayLikelyNumeric to build its Regex from
OcrExtensions.OCR_DIGIT_LIKE_CHARS and have cleanNumberArtifacts use the same
constant for all character mapping/replacement logic so both functions stay in
sync when the alphabet changes.
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/xml/AndroidWidget.kt (3)

123-178: ⚡ Quick win

Placeholder filtering is exact-match — consider extending to common phrasings.

isSinglePlaceholderEntry only fires when the single normalized entry exactly equals one of {year, month, day, select, choose, dropdown}. Real-world dropdowns frequently render placeholders like "Select an option", "Choose one", "Tap to select", "-- Select --", which won't be filtered and will still produce a one-element string-array.

Minimally safe options (pick one):

♻️ Match by prefix / token containment
     private fun List<String>.isSinglePlaceholderEntry(): Boolean {
         if (size != 1) return false
-        return first().normalizedDropdownLabel().lowercase() in placeholderEntries
+        val normalized = first().normalizedDropdownLabel().lowercase().trim('-', ' ', '.')
+        if (normalized in placeholderEntries) return true
+        // Catch phrasings like "select an option", "choose one", "-- select --"
+        return placeholderEntries.any { keyword ->
+            normalized == keyword || normalized.startsWith("$keyword ")
+        }
     }

Out of scope if the ticket specifically targets only the listed labels — feel free to defer.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/xml/AndroidWidget.kt`
around lines 123 - 178, isSinglePlaceholderEntry currently only matches exact
normalized tokens in placeholderEntries, so common real-world placeholders like
"Select an option", "-- Select --", or "Tap to select" slip through; update
isSinglePlaceholderEntry (and/or placeholderEntries) to normalize the single
entry by trimming punctuation/extra dashes, lowercasing, and then check for
broader matches such as startsWith or contains any placeholder token (e.g.,
"select", "choose", "tap", "option") or match a small regex that accepts
leading/trailing non-alphanumerics and phrases like "select( an?| one)?", so
that fallback array creation is skipped for these common placeholder phrasings
referenced by isSinglePlaceholderEntry and normalizedDropdownLabel.

109-116: 💤 Low value

sanitizeResourceName ordering is correct.

Lowercasing before applying [^a-z0-9_] is the right order (otherwise uppercase letters would have been wiped first). The leading-digit edge case (e.g., "12px") is implicitly handled by the dd_ prefix in SpinnerWidget.fallbackIdLabel, so the produced ID is always a valid Android resource identifier.

If you ever reuse sanitizeResourceName from a context that doesn't prepend a guaranteed letter prefix, prepend a fallback letter inside this helper to keep it safe by construction.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/xml/AndroidWidget.kt`
around lines 109 - 116, sanitizeResourceName currently lowercases and strips
non-alphanumerics via nonAlphanumericRegex and multipleUnderscoresRegex but
relies on callers to ensure the first character is a letter; make the helper
safe-by-construction by ensuring the returned string starts with a letter: after
performing the existing transformations (lowercase, replace
nonAlphanumericRegex, collapse multipleUnderscoresRegex, trim('_')), if the
result is empty or its first char is not in 'a'..'z', prepend a fallback letter
(e.g., 'a') so the final value is always a valid Android resource id; keep
sanitizeResourceName, nonAlphanumericRegex and multipleUnderscoresRegex usage
but add this post-processing step.

190-194: 💤 Low value

removeLeadingDropdownHint may over-trim a legitimate one-letter token.

^[vV]\s+ will strip any leading single "v"/"V" followed by whitespace. That's fine for the typical OCR "v" mistaken for "▼", but it would also clip legitimate inputs like "V 8", "V Country", or "v 1.0". Given this is OCR heuristics on dropdown text the false-positive surface is small, but consider tightening to only strip when the remainder still looks like dropdown content (e.g., starts with an uppercase word) or restricting to lowercase v only:

♻️ Slightly tighter pattern (optional)
-    private fun String.removeLeadingDropdownHint(): String {
-        return trim()
-            .replace(Regex("^[vV]\\s+"), "")
-            .trim()
-    }
+    private fun String.removeLeadingDropdownHint(): String {
+        // Only strip a standalone leading 'v'/'V' when followed by another token —
+        // narrower than `^[vV]\s+` which also clips e.g. "V 8" / "v 1.0".
+        return trim()
+            .replace(Regex("^[vV]\\s+(?=\\D)"), "")
+            .trim()
+    }

Non-blocking; flagging since OCR heuristics tend to accumulate edge cases.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/xml/AndroidWidget.kt`
around lines 190 - 194, The current removeLeadingDropdownHint() uses
Regex("^[vV]\\s+") which will strip a leading uppercase "V" token that may be
legitimate; update the function to be more conservative — e.g., change the regex
to only match lowercase v (Regex("^[v]\\s+")) or, if you prefer to still allow
uppercase removal only for likely dropdown hints, use a lookahead that requires
the remainder to start like dropdown text (Regex("^[vV]\\s+(?=[A-Z])")); modify
the String.removeLeadingDropdownHint implementation accordingly so it only
strips when the tighter condition matches.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt`:
- Around line 94-98: The implicitBlocks filter currently accepts any untagged
block with annotationText.length >= 5 and therefore lets OCR-only canvas
metadata through; update the parsedBlocks.filter that defines implicitBlocks to
also exclude canvas metadata by calling the existing isCanvasMetadata(...)
predicate (i.e., change the predicate to it.tag == null &&
it.annotationText.length >= 5 && !isCanvasMetadata(it) or the equivalent
signature used elsewhere), so those canvas labels never reach the implicit
annotation assignment code that assigns to the nearest unresolved widget.

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetTagParser.kt`:
- Around line 19-21: normalizeTagText currently only trims trailing punctuation
so leading OCR characters like '|', ':', ';', '.', ',', '_' still prevent
tagExtractRegex from matching; update normalizeTagText to strip those leading
characters (e.g., trimStart for the same set '|:;.,_') before applying
tagExtractRegex and returning uppercase, and then remove any redundant manual
leading-trim logic in callers that expected pre-cleaned input (e.g., places that
previously pre-trimmed in parseBlock / MarginAnnotationParser.parse) so the
shared parser is behaviorally consistent.

---

Nitpick comments:
In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/grammar/AttributeValidator.kt`:
- Around line 85-118: Extract the shared OCR digit-like character set into a
single constant (e.g. OCR_DIGIT_LIKE_CHARS) in OcrExtensions and replace the
inline class in isEntireArrayLikelyNumeric and the per-letter replacements in
cleanNumberArtifacts to reference that constant (use the constant without the \s
when embedded in the Regex and add \s where needed in the regex only); update
isEntireArrayLikelyNumeric to build its Regex from
OcrExtensions.OCR_DIGIT_LIKE_CHARS and have cleanNumberArtifacts use the same
constant for all character mapping/replacement logic so both functions stay in
sync when the alphabet changes.

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/xml/AndroidWidget.kt`:
- Around line 123-178: isSinglePlaceholderEntry currently only matches exact
normalized tokens in placeholderEntries, so common real-world placeholders like
"Select an option", "-- Select --", or "Tap to select" slip through; update
isSinglePlaceholderEntry (and/or placeholderEntries) to normalize the single
entry by trimming punctuation/extra dashes, lowercasing, and then check for
broader matches such as startsWith or contains any placeholder token (e.g.,
"select", "choose", "tap", "option") or match a small regex that accepts
leading/trailing non-alphanumerics and phrases like "select( an?| one)?", so
that fallback array creation is skipped for these common placeholder phrasings
referenced by isSinglePlaceholderEntry and normalizedDropdownLabel.
- Around line 109-116: sanitizeResourceName currently lowercases and strips
non-alphanumerics via nonAlphanumericRegex and multipleUnderscoresRegex but
relies on callers to ensure the first character is a letter; make the helper
safe-by-construction by ensuring the returned string starts with a letter: after
performing the existing transformations (lowercase, replace
nonAlphanumericRegex, collapse multipleUnderscoresRegex, trim('_')), if the
result is empty or its first char is not in 'a'..'z', prepend a fallback letter
(e.g., 'a') so the final value is always a valid Android resource id; keep
sanitizeResourceName, nonAlphanumericRegex and multipleUnderscoresRegex usage
but add this post-processing step.
- Around line 190-194: The current removeLeadingDropdownHint() uses
Regex("^[vV]\\s+") which will strip a leading uppercase "V" token that may be
legitimate; update the function to be more conservative — e.g., change the regex
to only match lowercase v (Regex("^[v]\\s+")) or, if you prefer to still allow
uppercase removal only for likely dropdown hints, use a lookahead that requires
the remainder to start like dropdown text (Regex("^[vV]\\s+(?=[A-Z])")); modify
the String.removeLeadingDropdownHint implementation accordingly so it only
strips when the tighter condition matches.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c916df64-102c-44cf-aa86-0ce310fd32a8

📥 Commits

Reviewing files that changed from the base of the PR and between bba5231 and b346c82.

📒 Files selected for processing (6)
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetAnnotationMatcher.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetTagParser.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/grammar/AttributeValidator.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/xml/AndroidWidget.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/utils/OcrExtensions.kt

@jatezzz jatezzz merged commit 69e03bf into stage May 6, 2026
2 checks passed
@jatezzz jatezzz deleted the fix/ADFA-3910-dropdown-placeholder-string-xml-experimental branch May 6, 2026 21:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants