fix: repair embedded quotes before digit-prefixed tokens (e.g. "985/211") #166

Open
HANYIIK wants to merge 1 commit into josdejong:main from HANYIIK:fix/embedded-quotes-before-digit-prefixed-tokens

HANYIIK commented May 7, 2026

Problem

When a JSON string value contains an embedded unescaped double-quote immediately followed by a digit, jsonrepair incorrectly treats that quote as the end of the string. This happens because the heuristic at the end-quote detection site uses a bare isDigit(text[i]) check: if the character after a potential closing quote is a digit, the quote is assumed to be genuine.

This causes inputs like the following (shown raw, with the inner quotes unescaped) to throw "Colon expected at position N" instead of being repaired:

{"usage": "includes "985/211" items"}
{"label": "requires "2fa" enabled"}
{"version": "version "3.0.1" released"}

In each case the leading digits (985, 2, 3) are only the start of a longer token (985/211, 2fa, 3.0.1), not a standalone JSON number.

Root cause

parseString (regular/jsonrepair.ts:518, streaming/core.ts:709):

// before
isDigit(text[i])

A " followed by 9 in "985/211" satisfies isDigit, so the string is cut short at that quote. The remainder (985/211"…) is then parsed at object level, 985 becomes an unquoted key, and / is found where : is expected.
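To make the misfire concrete, here is a simplified TypeScript model of the old check. It is a sketch, not the code from jsonrepair.ts: the name oldLooksLikeEndQuote is hypothetical, and skipping whitespace before the digit test is an assumption made so that the `["a" 42]` case can be modeled at all.

```typescript
// Simplified, hypothetical model of the old heuristic (not the library source).
function isDigit(char: string | undefined): boolean {
  return char !== undefined && char >= '0' && char <= '9'
}

// Returns true when the quote at quoteIndex would be accepted as an end quote
// purely because the next non-whitespace character is a digit.
function oldLooksLikeEndQuote(text: string, quoteIndex: number): boolean {
  let i = quoteIndex + 1
  // Assumption: whitespace after the candidate quote is skipped first
  while (i < text.length && /\s/.test(text[i])) {
    i++
  }
  return isDigit(text[i])
}

const standalone = '["a" 42]'
const embedded = '{"usage": "includes "985/211" items"}'

// Quote that really ends "a" (a comma is missing before 42)
const standaloneAccepted = oldLooksLikeEndQuote(standalone, standalone.indexOf('" 42'))

// Quote embedded in the string value, directly before 985
const embeddedAccepted = oldLooksLikeEndQuote(embedded, embedded.indexOf('"985'))

console.log(standaloneAccepted, embeddedAccepted) // true true
```

Both calls return true, so a digit-only check cannot tell the missing-comma case apart from the embedded-quote case.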

Fix

Replace the bare isDigit check with isFollowedByNumber, which scans past the full number token (integer part, optional decimal, optional exponent) and only accepts the quote as a real end-of-string when the number is immediately followed by a structural JSON character (,, }, ]), whitespace, or EOF.

"985/211"  →  digit scan: 985 → next char: /  →  not structural → embedded quote → escape ✓
["a" 42]   →  digit scan: 42  → next char: ]  →  structural      → real end quote → split ✓
["a" 2.5]  →  digit scan: 2.5 → next char: ]  →  structural      → real end quote → split ✓
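The scan described above can be sketched roughly as follows. This is an illustrative reconstruction from the description, not the function added in stringUtils.ts: the signature (text, start), the helper isDigit, and the exact whitespace test are assumptions.

```typescript
// Hypothetical sketch of the lookahead; the actual implementation may differ.
function isDigit(char: string | undefined): boolean {
  return char !== undefined && char >= '0' && char <= '9'
}

// start is assumed to point at the first digit after the candidate end quote.
function isFollowedByNumber(text: string, start: number): boolean {
  let i = start

  // Integer part: at least one digit is required
  if (!isDigit(text[i])) return false
  while (isDigit(text[i])) i++

  // Optional fraction: a dot followed by at least one digit
  if (text[i] === '.' && isDigit(text[i + 1])) {
    i++
    while (isDigit(text[i])) i++
  }

  // Optional exponent: e or E, optional sign, at least one digit
  if (text[i] === 'e' || text[i] === 'E') {
    let j = i + 1
    if (text[j] === '+' || text[j] === '-') j++
    if (isDigit(text[j])) {
      while (isDigit(text[j])) j++
      i = j
    }
  }

  // The number must end at a structural character, whitespace, or end of input
  const next = text[i]
  return (
    next === undefined ||
    next === ',' ||
    next === '}' ||
    next === ']' ||
    /\s/.test(next)
  )
}

console.log(isFollowedByNumber('985/211" items"}', 0)) // false
console.log(isFollowedByNumber('42]', 0)) // true
console.log(isFollowedByNumber('2e10]', 0)) // true
```

An incomplete exponent such as the `e` in `2enabled` is deliberately not consumed, so the terminator test sees a letter and the quote is treated as embedded.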

isFollowedByNumber is exported from stringUtils.ts for the regular parser. The streaming parser uses an equivalent inline closure (isInputFollowedByNumber) because InputBuffer.charCodeAt throws on out-of-bounds access rather than returning NaN.
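A closure of this kind might look like the sketch below. The ThrowingBuffer interface and createBuffer helper are hypothetical stand-ins written for this example, not the library's InputBuffer; the only property modeled is that charCodeAt throws when the index is out of range. The sketch also ignores chunk boundaries, which a real streaming parser has to account for.

```typescript
// Hypothetical stand-in for a buffer whose charCodeAt throws when out of range.
interface ThrowingBuffer {
  charCodeAt(index: number): number
  isEnd(index: number): boolean
}

function createBuffer(text: string): ThrowingBuffer {
  return {
    charCodeAt(index: number): number {
      if (index < 0 || index >= text.length) {
        throw new RangeError(`Index out of range: ${index}`)
      }
      return text.charCodeAt(index)
    },
    isEnd: (index: number): boolean => index >= text.length
  }
}

function createNumberLookahead(input: ThrowingBuffer): (start: number) => boolean {
  // Guarded accessor: yields -1 at end of input instead of throwing
  const codeAt = (i: number): number => (input.isEnd(i) ? -1 : input.charCodeAt(i))
  const isDigitCode = (code: number): boolean => code >= 0x30 && code <= 0x39

  return function isInputFollowedByNumber(start: number): boolean {
    let i = start
    if (!isDigitCode(codeAt(i))) return false
    while (isDigitCode(codeAt(i))) i++

    // Optional fraction ('.' is 0x2e)
    if (codeAt(i) === 0x2e && isDigitCode(codeAt(i + 1))) {
      i++
      while (isDigitCode(codeAt(i))) i++
    }

    // Optional exponent ('e' is 0x65, 'E' is 0x45, '+' is 0x2b, '-' is 0x2d)
    const marker = codeAt(i)
    if (marker === 0x65 || marker === 0x45) {
      let j = i + 1
      const sign = codeAt(j)
      if (sign === 0x2b || sign === 0x2d) j++
      if (isDigitCode(codeAt(j))) {
        while (isDigitCode(codeAt(j))) j++
        i = j
      }
    }

    // End of input, ',', '}', ']', or whitespace (space, tab, LF, CR)
    const next = codeAt(i)
    return [-1, 0x2c, 0x7d, 0x5d, 0x20, 0x09, 0x0a, 0x0d].includes(next)
  }
}

const atEnd = createNumberLookahead(createBuffer('42'))(0)
const fragment = createNumberLookahead(createBuffer('985/211'))(0)
console.log(atEnd, fragment) // true false
```

Routing every read through the guarded accessor keeps the end-of-input case on the same code path as the structural-character case, so no try/catch is needed around the scan.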

Tests

Three new cases that previously threw are now repaired correctly, and the existing ["a" 2] / ["a" 2.5] / ["a" 2e10] regressions are verified to still pass.

✓ {"v": "包含"985/211"的候选人"}  →  {"v": "包含\"985/211\"的候选人"}
✓ {"v": "requires "2fa" enabled"}  →  {"v": "requires \"2fa\" enabled"}
✓ {"v": "version "3.0.1" released"}  →  {"v": "version \"3.0.1\" released"}
✓ ["a" 2]    →  ["a", 2]     (regression)
✓ ["a" 2.5]  →  ["a", 2.5]  (regression)
✓ ["a" 2e10] →  ["a", 2e10] (regression)

All 158 existing tests continue to pass.

Previously, the heuristic `isDigit(text[i])` treated any potential closing
quote followed by a digit as a genuine end-of-string. This caused strings
like `"includes "985/211" items"` to be cut short at the `"` before `985`,
because `9` is a digit, leading to a downstream "Colon expected" error when
`985` was misinterpreted as an object key and `/` was found instead of `:`.

Fix by replacing the bare `isDigit` check with `isFollowedByNumber`, which
scans past the full number token (integer, optional decimal, optional
exponent) and only treats the quote as a real end if the number is
immediately followed by a structural JSON character (`,`, `}`, `]`),
whitespace, or EOF. If the digit sequence bleeds into non-structural content
(e.g. `/` in `985/211`, or letters in `2fa`), the quote is treated as
embedded and escaped instead.

The fix is applied to both the regular and streaming parsers. Existing
repair of `["a" 42]` → `["a", 42]` (standalone number after string,
missing comma) is preserved.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>