ADFA-3910 | Fix single placeholder dropdown array generation in strings file#1274
Conversation
📝 WalkthroughRelease NotesFeatures & Improvements
Technical Changes
Risks & Best-Practice Concerns
Migration Notes
WalkthroughThis PR centralizes widget tag parsing into a new internal ChangesTag Parsing Centralization
OCR-Aware Entry Extraction & Widget Enhancements
Sequence Diagram(s)(omitted) Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/utils/OcrExtensions.kt (1)
12-29: 💤 Low valueSolid OCR-aware splitting; one optional simplification.
The branching (punctuation-first, then digit-only whitespace fallback) sensibly avoids over-splitting non-numeric phrases while still handling space-separated numeric OCR like
"1 2 3". The empty-input case (size != 1) gracefully short-circuits to an empty list.Minor nit: the
this.trim()on line 22 is redundant given thatWHITESPACE_DELIMITERS = "\\s+"plusfilter { it.isNotBlank() }already discards leading/trailing whitespace tokens. Removing it would reduce one allocation per call.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/utils/OcrExtensions.kt` around lines 12 - 29, In extractOcrEntries(), remove the redundant this.trim() call when creating whitespaceTokens since WHITESPACE_DELIMITERS ("\\s+") plus filter { it.isNotBlank() } already handles leading/trailing spaces; update the whitespaceTokens assignment in the function (referencing WHITESPACE_DELIMITERS and whitespaceTokens) to split directly on WHITESPACE_DELIMITERS and filter blanks, leaving the rest of the function logic (punctuatedTokens, isSpaceSeparatedOcrNumbers, and the return) unchanged.cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/xml/AndroidWidget.kt (1)
123-178: 💤 Low valueSingle-placeholder bypass implemented correctly; consider unifying placeholder knowledge.
The core fix —
isSinglePlaceholderEntry()gating string-array creation inprocessAttributes(line 158) — is precise and matches the PR objective. Nicely scoped: ID derivation/resolution still proceeds normally, only thestring-arraywrite is suppressed.Optional consistency nit:
isMeaningfulDropdownText(line 202) only excludes"dropdown", whileplaceholderEntriescoversyear/month/day/select/choose/dropdown. As a result, abox.textof"select"will still passisMeaningfulDropdownTextand feed intorawEntries, only to be filtered out later byisSinglePlaceholderEntry. Routing both checks through the same set would make the intent uniform and avoid future drift if the placeholder list grows.♻️ Suggested alignment
private fun String.isMeaningfulDropdownText(): Boolean { - val cleaned = normalizedDropdownLabel() - return cleaned.isNotBlank() && !cleaned.equals("dropdown", ignoreCase = true) + val cleaned = normalizedDropdownLabel() + return cleaned.isNotBlank() && cleaned.lowercase() !in placeholderEntries }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/xml/AndroidWidget.kt` around lines 123 - 178, The placeholder list is duplicated: companion object placeholderEntries is used by isSinglePlaceholderEntry but isMeaningfulDropdownText uses a different hardcoded check; unify them by making isMeaningfulDropdownText consult the same placeholderEntries set (or move placeholderEntries to a shared util/constant) and ensure both use the same normalization/lowercasing (call normalizedDropdownLabel().lowercase() or equivalent) so processAttributes, isSinglePlaceholderEntry, and isMeaningfulDropdownText all reference the same canonical placeholder set.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetTagParser.kt`:
- Around line 60-83: normalizeTagToken calls normalizeOcrDigits with an
UPPERCASED token but normalizeOcrDigits currently targets lowercase letters and
only maps O/I/! → digits, causing inconsistency with isNumericLikeOcrChar and
EntriesValidator.cleanNumberArtifacts; fix by making normalizeOcrDigits operate
on the uppercase input and include the same OCR-to-digit mappings (e.g., map 'I'
and '!' and 'L' → '1', 'O' → '0', and add 'Z' → '2', 'S' → '5', 'B' → '6') so
normalizeOcrDigits, isNumericLikeOcrChar, and extractOrdinal behavior are
consistent (update the normalizeOcrDigits implementation referenced by
normalizeTagToken and keep isNumericLikeOcrChar as-is).
---
Nitpick comments:
In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/xml/AndroidWidget.kt`:
- Around line 123-178: The placeholder list is duplicated: companion object
placeholderEntries is used by isSinglePlaceholderEntry but
isMeaningfulDropdownText uses a different hardcoded check; unify them by making
isMeaningfulDropdownText consult the same placeholderEntries set (or move
placeholderEntries to a shared util/constant) and ensure both use the same
normalization/lowercasing (call normalizedDropdownLabel().lowercase() or
equivalent) so processAttributes, isSinglePlaceholderEntry, and
isMeaningfulDropdownText all reference the same canonical placeholder set.
In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/utils/OcrExtensions.kt`:
- Around line 12-29: In extractOcrEntries(), remove the redundant this.trim()
call when creating whitespaceTokens since WHITESPACE_DELIMITERS ("\\s+") plus
filter { it.isNotBlank() } already handles leading/trailing spaces; update the
whitespaceTokens assignment in the function (referencing WHITESPACE_DELIMITERS
and whitespaceTokens) to split directly on WHITESPACE_DELIMITERS and filter
blanks, leaving the rest of the function logic (punctuatedTokens,
isSpaceSeparatedOcrNumbers, and the return) unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3c853016-4505-4a8b-827a-8ca2c2356296
📒 Files selected for processing (6)
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.ktcv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetAnnotationMatcher.ktcv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetTagParser.ktcv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/grammar/AttributeValidator.ktcv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/xml/AndroidWidget.ktcv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/utils/OcrExtensions.kt
9f74584 to
fda1b16
Compare
Extract WidgetTagParser and ignore isolated placeholder entries like `select` or `dropdown` from strings.xml
bba5231 to
b346c82
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (4)
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/grammar/AttributeValidator.kt (1)
85-118: 💤 Low valueOptional: share the OCR character class with
OcrExtensions.
isEntireArrayLikelyNumericuses[0-9oOlIzZsSbB\s]+andcleanNumberArtifactsuses the same alphabet split into per-letterreplaces. The exact same character set (sans\s) lives inOCR_NUMERIC_PATTERNinOcrExtensions.kt. Worth extracting a single constant (e.g.OCR_DIGIT_LIKE_CHARS) so the two stay in sync if the alphabet ever expands (e.g. addinggG→9ortT→7).Not blocking — purely a maintainability improvement.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/grammar/AttributeValidator.kt` around lines 85 - 118, Extract the shared OCR digit-like character set into a single constant (e.g. OCR_DIGIT_LIKE_CHARS) in OcrExtensions and replace the inline class in isEntireArrayLikelyNumeric and the per-letter replacements in cleanNumberArtifacts to reference that constant (use the constant without the \s when embedded in the Regex and add \s where needed in the regex only); update isEntireArrayLikelyNumeric to build its Regex from OcrExtensions.OCR_DIGIT_LIKE_CHARS and have cleanNumberArtifacts use the same constant for all character mapping/replacement logic so both functions stay in sync when the alphabet changes.cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/xml/AndroidWidget.kt (3)
123-178: ⚡ Quick winPlaceholder filtering is exact-match — consider extending to common phrasings.
isSinglePlaceholderEntryonly fires when the single normalized entry exactly equals one of{year, month, day, select, choose, dropdown}. Real-world dropdowns frequently render placeholders like "Select an option", "Choose one", "Tap to select", "-- Select --", which won't be filtered and will still produce a one-element string-array.Minimally safe options (pick one):
♻️ Match by prefix / token containment
private fun List<String>.isSinglePlaceholderEntry(): Boolean { if (size != 1) return false - return first().normalizedDropdownLabel().lowercase() in placeholderEntries + val normalized = first().normalizedDropdownLabel().lowercase().trim('-', ' ', '.') + if (normalized in placeholderEntries) return true + // Catch phrasings like "select an option", "choose one", "-- select --" + return placeholderEntries.any { keyword -> + normalized == keyword || normalized.startsWith("$keyword ") + } }Out of scope if the ticket specifically targets only the listed labels — feel free to defer.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/xml/AndroidWidget.kt` around lines 123 - 178, isSinglePlaceholderEntry currently only matches exact normalized tokens in placeholderEntries, so common real-world placeholders like "Select an option", "-- Select --", or "Tap to select" slip through; update isSinglePlaceholderEntry (and/or placeholderEntries) to normalize the single entry by trimming punctuation/extra dashes, lowercasing, and then check for broader matches such as startsWith or contains any placeholder token (e.g., "select", "choose", "tap", "option") or match a small regex that accepts leading/trailing non-alphanumerics and phrases like "select( an?| one)?", so that fallback array creation is skipped for these common placeholder phrasings referenced by isSinglePlaceholderEntry and normalizedDropdownLabel.
109-116: 💤 Low value
sanitizeResourceNameordering is correct.Lowercasing before applying
[^a-z0-9_]is the right order (otherwise uppercase letters would have been wiped first). The leading-digit edge case (e.g., "12px") is implicitly handled by thedd_prefix inSpinnerWidget.fallbackIdLabel, so the produced ID is always a valid Android resource identifier.If you ever reuse
sanitizeResourceNamefrom a context that doesn't prepend a guaranteed letter prefix, prepend a fallback letter inside this helper to keep it safe by construction.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/xml/AndroidWidget.kt` around lines 109 - 116, sanitizeResourceName currently lowercases and strips non-alphanumerics via nonAlphanumericRegex and multipleUnderscoresRegex but relies on callers to ensure the first character is a letter; make the helper safe-by-construction by ensuring the returned string starts with a letter: after performing the existing transformations (lowercase, replace nonAlphanumericRegex, collapse multipleUnderscoresRegex, trim('_')), if the result is empty or its first char is not in 'a'..'z', prepend a fallback letter (e.g., 'a') so the final value is always a valid Android resource id; keep sanitizeResourceName, nonAlphanumericRegex and multipleUnderscoresRegex usage but add this post-processing step.
190-194: 💤 Low value
removeLeadingDropdownHintmay over-trim a legitimate one-letter token.
^[vV]\s+will strip any leading single "v"/"V" followed by whitespace. That's fine for the typical OCR "v" mistaken for "▼", but it would also clip legitimate inputs like"V 8","V Country", or"v 1.0". Given this is OCR heuristics on dropdown text the false-positive surface is small, but consider tightening to only strip when the remainder still looks like dropdown content (e.g., starts with an uppercase word) or restricting to lowercasevonly:♻️ Slightly tighter pattern (optional)
- private fun String.removeLeadingDropdownHint(): String { - return trim() - .replace(Regex("^[vV]\\s+"), "") - .trim() - } + private fun String.removeLeadingDropdownHint(): String { + // Only strip a standalone leading 'v'/'V' when followed by another token — + // narrower than `^[vV]\s+` which also clips e.g. "V 8" / "v 1.0". + return trim() + .replace(Regex("^[vV]\\s+(?=\\D)"), "") + .trim() + }Non-blocking; flagging since OCR heuristics tend to accumulate edge cases.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/xml/AndroidWidget.kt` around lines 190 - 194, The current removeLeadingDropdownHint() uses Regex("^[vV]\\s+") which will strip a leading uppercase "V" token that may be legitimate; update the function to be more conservative — e.g., change the regex to only match lowercase v (Regex("^[v]\\s+")) or, if you prefer to still allow uppercase removal only for likely dropdown hints, use a lookahead that requires the remainder to start like dropdown text (Regex("^[vV]\\s+(?=[A-Z])")); modify the String.removeLeadingDropdownHint implementation accordingly so it only strips when the tighter condition matches.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt`:
- Around line 94-98: The implicitBlocks filter currently accepts any untagged
block with annotationText.length >= 5 and therefore lets OCR-only canvas
metadata through; update the parsedBlocks.filter that defines implicitBlocks to
also exclude canvas metadata by calling the existing isCanvasMetadata(...)
predicate (i.e., change the predicate to it.tag == null &&
it.annotationText.length >= 5 && !isCanvasMetadata(it) or the equivalent
signature used elsewhere), so those canvas labels never reach the implicit
annotation assignment code that assigns to the nearest unresolved widget.
In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetTagParser.kt`:
- Around line 19-21: normalizeTagText currently only trims trailing punctuation
so leading OCR characters like '|', ':', ';', '.', ',', '_' still prevent
tagExtractRegex from matching; update normalizeTagText to strip those leading
characters (e.g., trimStart for the same set '|:;.,_') before applying
tagExtractRegex and returning uppercase, and then remove any redundant manual
leading-trim logic in callers that expected pre-cleaned input (e.g., places that
previously pre-trimmed in parseBlock / MarginAnnotationParser.parse) so the
shared parser is behaviorally consistent.
---
Nitpick comments:
In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/grammar/AttributeValidator.kt`:
- Around line 85-118: Extract the shared OCR digit-like character set into a
single constant (e.g. OCR_DIGIT_LIKE_CHARS) in OcrExtensions and replace the
inline class in isEntireArrayLikelyNumeric and the per-letter replacements in
cleanNumberArtifacts to reference that constant (use the constant without the \s
when embedded in the Regex and add \s where needed in the regex only); update
isEntireArrayLikelyNumeric to build its Regex from
OcrExtensions.OCR_DIGIT_LIKE_CHARS and have cleanNumberArtifacts use the same
constant for all character mapping/replacement logic so both functions stay in
sync when the alphabet changes.
In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/xml/AndroidWidget.kt`:
- Around line 123-178: isSinglePlaceholderEntry currently only matches exact
normalized tokens in placeholderEntries, so common real-world placeholders like
"Select an option", "-- Select --", or "Tap to select" slip through; update
isSinglePlaceholderEntry (and/or placeholderEntries) to normalize the single
entry by trimming punctuation/extra dashes, lowercasing, and then check for
broader matches such as startsWith or contains any placeholder token (e.g.,
"select", "choose", "tap", "option") or match a small regex that accepts
leading/trailing non-alphanumerics and phrases like "select( an?| one)?", so
that fallback array creation is skipped for these common placeholder phrasings
referenced by isSinglePlaceholderEntry and normalizedDropdownLabel.
- Around line 109-116: sanitizeResourceName currently lowercases and strips
non-alphanumerics via nonAlphanumericRegex and multipleUnderscoresRegex but
relies on callers to ensure the first character is a letter; make the helper
safe-by-construction by ensuring the returned string starts with a letter: after
performing the existing transformations (lowercase, replace
nonAlphanumericRegex, collapse multipleUnderscoresRegex, trim('_')), if the
result is empty or its first char is not in 'a'..'z', prepend a fallback letter
(e.g., 'a') so the final value is always a valid Android resource id; keep
sanitizeResourceName, nonAlphanumericRegex and multipleUnderscoresRegex usage
but add this post-processing step.
- Around line 190-194: The current removeLeadingDropdownHint() uses
Regex("^[vV]\\s+") which will strip a leading uppercase "V" token that may be
legitimate; update the function to be more conservative — e.g., change the regex
to only match lowercase v (Regex("^[v]\\s+")) or, if you prefer to still allow
uppercase removal only for likely dropdown hints, use a lookahead that requires
the remainder to start like dropdown text (Regex("^[vV]\\s+(?=[A-Z])")); modify
the String.removeLeadingDropdownHint implementation accordingly so it only
strips when the tighter condition matches.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c916df64-102c-44cf-aa86-0ce310fd32a8
📒 Files selected for processing (6)
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.ktcv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetAnnotationMatcher.ktcv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetTagParser.ktcv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/grammar/AttributeValidator.ktcv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/xml/AndroidWidget.ktcv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/utils/OcrExtensions.kt
Description
Fixed an issue where single placeholder labels (e.g., "dropdown", "select", "choose") sketched inside a spinner/dropdown widget incorrectly generated a
string-arrayinstrings.xml. Additionally, OCR tag parsing logic has been refactored and centralized into a newWidgetTagParserobject.Details
isSinglePlaceholderEntry()check inSpinnerWidgetto bypass string array creation when the only item is a known placeholder.MarginAnnotationParserandWidgetAnnotationMatcherintoWidgetTagParser.AttributeValidator.document_5174968672900351910.mp4
Ticket
ADFA-3910
Observation
Centralizing the tag parsing logic into
WidgetTagParsersignificantly cleans up the margin and annotation matching domains, ensuring that OCR text normalization (e.g., replacing 'l' with '1', 'O' with '0') is applied consistently across the entire pipeline.