Summary
find -printf panics on a format string where a \ octal escape is immediately followed by a multibyte UTF-8 character (e.g. '\0€\n'). The escape parser peeks ahead 3 bytes for octal digits but checks only the byte length of the remaining string, not whether byte offset 3 falls on a char boundary, so it slices through the middle of the following multibyte char. This aborts with a Rust panic (exit code 101). GNU find accepts the same input, emitting the escaped byte and then the literal character.
Steps to reproduce
$ mkdir -p /tmp/ftest
$ ./target/release/find /tmp/ftest -printf '\0€\n'
thread 'main' panicked at src/find/matchers/printf.rs:146:24:
byte index 3 is not a char boundary; it is inside '€' (bytes 1..4) of `0€`
$ echo $?
101
Other inputs that trigger the same panic: '\1😀', '\00é'.
Expected behavior
Match GNU: process the octal escape and print the following character literally, exiting 0.
$ /usr/bin/find /tmp/ftest -printf '\0€\n'
$ echo $?
0
Root cause
parse_escape_sequence calls peek(OCTAL_LEN) (3 bytes) when an octal digit follows \, and peek slices the string after checking only its byte length:
// src/find/matchers/printf.rs:141
fn peek(&self, count: usize) -> Result<&str, Box<dyn Error>> {
    if self.string.len() < count {
        return Err("Unexpected EOF".into());
    }
    Ok(&self.string[0..count]) // line 146: panics if `count` is not a char boundary
}
The guard self.string.len() < count only ensures there are enough bytes; it does not ensure that byte index count falls on a UTF-8 char boundary. When a multibyte character occupies bytes that span index 3 (e.g. € at bytes 1..4 of 0€), &self.string[0..3] slices inside the char and panics.
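One possible direction for a fix, sketched under assumptions: rather than peeking a fixed 3 bytes, take only the leading run of ASCII octal digits, capped at 3. The helper name octal_prefix below is hypothetical and is not part of the findutils source; it only illustrates that an end index derived from counting ASCII bytes always lands on a char boundary, so the slice cannot panic.

```rust
/// Hypothetical helper (not in uutils/findutils): returns the longest prefix
/// of `s` that consists of at most `max_len` ASCII octal digits.
fn octal_prefix(s: &str, max_len: usize) -> &str {
    let end = s
        .bytes()
        .take(max_len)
        .take_while(|b| (b'0'..=b'7').contains(b))
        .count();
    // Every counted byte is ASCII, so `end` is always a char boundary
    // and this slice cannot panic.
    &s[..end]
}
```

With this approach, octal_prefix("0€\n", 3) yields "0" and leaves € to be handled as a literal character, which is the behavior described under "Expected behavior". If a fixed-width peek has to be kept, the checked form self.string.get(0..count) returns None instead of panicking when count is not a char boundary, though the caller would then need to decide how to handle that case.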
Found by source-audit of uutils/findutils.