fix: repair embedded quotes before digit-prefixed tokens (e.g. "985/211") #166
Open
HANYIIK wants to merge 1 commit into
Previously, the heuristic `isDigit(text[i])` treated any potential closing quote followed by a digit as a genuine end-of-string. This caused strings like `"includes "985/211" items"` to be cut short at the `"` before `985`, because `9` is a digit, leading to a downstream "Colon expected" error when `985` was misinterpreted as an object key and `/` was found instead of `:`.

Fix by replacing the bare `isDigit` check with `isFollowedByNumber`, which scans past the full number token (integer, optional decimal, optional exponent) and only treats the quote as a real end if the number is immediately followed by a structural JSON character (`,`, `}`, `]`), whitespace, or EOF. If the digit sequence bleeds into non-structural content (e.g. `/` in `985/211`, or letters in `2fa`), the quote is treated as embedded and escaped instead.

The fix is applied to both the regular and streaming parsers. Existing repair of `["a" 42]` → `["a", 42]` (standalone number after string, missing comma) is preserved.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Problem
When a JSON string value contains an embedded unescaped double quote immediately followed by a digit, `jsonrepair` incorrectly treats that quote as the end of the string. This happens because the end-quote detection heuristic uses a bare `isDigit(text[i])` check: if the character after a potential closing quote is a digit, the quote is assumed to be genuine.

This causes inputs like the following to throw `Colon expected at position N` instead of being repaired (the embedded quotes are shown escaped here; in the failing input they are unescaped):

`{"usage": "includes \"985/211\" items"}`
`{"label": "requires \"2fa\" enabled"}`
`{"version": "version \"3.0.1\" released"}`

In each case the digit-prefixed token (`985`, `2`, `3`) is a fragment of a longer value, not a standalone JSON number.

Root cause
`parseString` (`regular/jsonrepair.ts:518`, `streaming/core.ts:709`):

A `"` followed by `9` in `"985/211"` satisfies `isDigit`, so the string is cut short at that quote. The remainder (`985/211"…`) is then parsed at object level, `985` becomes an unquoted key, and `/` is found where `:` is expected.

Fix
Replace the bare `isDigit` check with `isFollowedByNumber`, which scans past the full number token (integer part, optional decimal, optional exponent) and only accepts the quote as a real end-of-string when the number is immediately followed by a structural JSON character (`,`, `}`, `]`), whitespace, or EOF.

`isFollowedByNumber` is exported from `stringUtils.ts` for the regular parser. The streaming parser uses an equivalent inline closure (`isInputFollowedByNumber`) because `InputBuffer.charCodeAt` throws on out-of-bounds access rather than returning `NaN`.

Tests
Three new cases that previously threw are now repaired correctly, and the existing `["a" 2]` / `["a" 2.5]` / `["a" 2e10]` regressions are verified to still pass.

All 158 existing tests continue to pass.
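For reviewers who want to reason about the heuristic without opening the diff, the check described under Fix can be sketched roughly as below. This is an illustrative reconstruction from the description only, not the code in this PR: the name `looksLikeStandaloneNumber` and its `(text, start)` signature are hypothetical, and it assumes `start` already points at the first character after the candidate closing quote (with any whitespace skipped).

```typescript
// Hypothetical sketch, NOT the PR's isFollowedByNumber implementation.
// Returns true only when the text at `start` is a complete number token
// (integer, optional decimal, optional exponent) that is immediately
// followed by a structural character, whitespace, or end of input.
function looksLikeStandaloneNumber(text: string, start: number): boolean {
  // charAt returns '' when out of range, so this is safe at end of input.
  const isDigitAt = (pos: number): boolean => {
    const ch = text.charAt(pos)
    return ch >= '0' && ch <= '9'
  }

  let i = start

  // Integer part: at least one digit is required.
  if (!isDigitAt(i)) return false
  while (isDigitAt(i)) i++

  // Optional decimal part: '.' must be followed by at least one digit.
  if (text.charAt(i) === '.') {
    i++
    if (!isDigitAt(i)) return false
    while (isDigitAt(i)) i++
  }

  // Optional exponent: 'e' or 'E', optional sign, at least one digit.
  if (text.charAt(i) === 'e' || text.charAt(i) === 'E') {
    i++
    if (text.charAt(i) === '+' || text.charAt(i) === '-') i++
    if (!isDigitAt(i)) return false
    while (isDigitAt(i)) i++
  }

  // The token only counts as a standalone number when it ends at EOF,
  // whitespace, or one of the structural characters , } ]
  if (i >= text.length) return true
  return /[\s,}\]]/.test(text.charAt(i))
}

// Fragments of a longer value: the preceding quote would be treated as embedded.
console.log(looksLikeStandaloneNumber('985/211" items"', 0)) // false
console.log(looksLikeStandaloneNumber('2fa" enabled"', 0)) // false
console.log(looksLikeStandaloneNumber('3.0.1" released"', 0)) // false

// Standalone numbers: the preceding quote would be treated as a real end of string.
console.log(looksLikeStandaloneNumber('["a" 2.5]', 5)) // true
console.log(looksLikeStandaloneNumber('["a" 2e10]', 5)) // true
```

Under these assumptions, `3.0.1` is rejected because the token `3.0` is followed by a second `.`, which is neither structural nor whitespace, while `2.5]` and `2e10]` are accepted because the complete number ends at `]`.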