Fix lexer infinite loop / abort on invalid UTF-8 byte by ksss · Pull Request #2973 · ruby/rbs

ksss · 2026-05-25T06:43:21Z

Summary

rbs_next_char left byte_len = 0 when the active encoding's char_width() rejected a byte. Depending on where the byte appeared, the lexer either:

looped forever inside a comment (GVL held, SIGINT ineffective), or
tripped RBS_ASSERT(current_character_bytes > 0, ...) in rbs_skip at top level and called exit(1).

This PR makes rbs_next_char advance one byte in that case so the lexer always makes progress; the invalid byte then flows into the usual parser error path.

Reproducer

buf = RBS::Buffer.new(content: "# \xC2".force_encoding("UTF-8"), name: "x.rbs")
RBS::Parser._parse_signature(buf, 0, buf.content.bytesize)
# hangs forever; only `kill -9` stops the process

Fix

src/lexstate.c (rbs_next_char): when encoding->char_width() returns 0, treat the byte as a 1-byte garbage character and advance one byte. The lexer's invariant "cursor advances by at least one byte per step" is now preserved on every code path. Valid UTF-8 input is unaffected because char_width() never returns 0 for valid sequences.
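
The fallback described above can be illustrated with a minimal, standalone sketch. This is not the code from src/lexstate.c: every `sketch_` name is hypothetical, and `sketch_char_width` is a simplified stand-in for the encoding's `char_width()` that only checks the lead byte, the remaining length, and the continuation bytes (it does not reject overlong forms or surrogates). It assumes the fix amounts to "if the width comes back as 0, advance one byte instead".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the encoding's char_width(): returns the byte
 * length of the UTF-8 sequence starting at s, or 0 when the sequence is
 * invalid or truncated. Simplified for illustration. */
static size_t sketch_char_width(const unsigned char *s, size_t remaining) {
    size_t need;
    size_t i;

    if (remaining == 0) return 0;

    if (s[0] < 0x80) need = 1;
    else if ((s[0] & 0xE0) == 0xC0) need = 2;
    else if ((s[0] & 0xF0) == 0xE0) need = 3;
    else if ((s[0] & 0xF8) == 0xF0) need = 4;
    else return 0; /* stray continuation byte or invalid lead byte */

    if (need > remaining) return 0; /* truncated sequence, e.g. a lone 0xC2 */

    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0; /* not a continuation byte */
    }
    return need;
}

/* Width the lexer should advance by: a rejected byte is treated as a
 * 1-byte garbage character so the cursor always moves forward. */
static size_t sketch_advance_width(const unsigned char *s, size_t remaining) {
    size_t width = sketch_char_width(s, remaining);
    return width == 0 ? 1 : width;
}

/* Walks the whole buffer and returns the number of lexer steps taken.
 * The assert mirrors the "at least one byte per step" invariant. */
static size_t sketch_count_steps(const unsigned char *buf, size_t len) {
    size_t pos = 0;
    size_t steps = 0;

    while (pos < len) {
        size_t width = sketch_advance_width(buf + pos, len - pos);
        assert(width > 0);
        pos += width;
        steps++;
    }
    return steps;
}
```

With the three reproducer bytes `'#'`, `' '`, `0xC2`, the raw width of the last byte is 0, which is the condition that previously stalled the loop. With the fallback in place, the walk terminates after three steps.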

This PR description was written by Claude Code.

When the active encoding's `char_width` returned 0 for a byte, `rbs_next_char` left `byte_len = 0`. The lexer then either looped forever (when the byte was inside a comment) or tripped `RBS_ASSERT(current_character_bytes > 0, ...)` in `rbs_skip` at top level.

Treat such a byte as a 1-byte garbage character so the lexer always advances at least one byte. The invalid byte then surfaces as a regular parsing error through the existing error path.

Minimal reproducer that used to hang the host process indefinitely with the GVL held:

    RBS::Parser._parse_signature(
      RBS::Buffer.new(content: "# \xC2".force_encoding("UTF-8"), name: "x.rbs"),
      0, 3
    )

Found by fuzzing the parser entry points with random byte mutations of the existing seed RBS files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ksss added this to the RBS 4.1 milestone May 25, 2026

ksss force-pushed the ksss/fix-lexer-invalid-utf8 branch from 4d292ab to e9612b4 Compare May 25, 2026 06:55

ksss wants to merge 1 commit into ruby:master from ksss:ksss/fix-lexer-invalid-utf8
