ptx: use char count instead of byte index to handle utf-8 characters #7897

Merged
cakebaker merged 1 commit into uutils:main from aaron-ang:ptx-panic
May 7, 2025
@aaron-ang aaron-ang commented May 7, 2025

close #2049

As discussed in the issue, ptx panics on input containing UTF-8 characters that span multiple bytes. We need to use character positions instead of byte positions to index into chars_line, and .chars().count() gives us those positions. Since format_roff_line and format_tex_line use the same logic, I consolidated the change in a helper function. I've also added some comments and renamed variables for readability.
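
For illustration, here is a minimal standalone sketch of the byte-versus-character mismatch described above. It is not the code from this PR: the helper name `byte_to_char_index` and the way the byte offset is obtained (`str::find`) are assumptions made for the example. Only the `.chars().count()` technique and the `chars_line` name come from the description.

```rust
/// Hypothetical helper (not the one added in this PR): converts a byte
/// offset into `s` to a character offset by counting the chars before it.
/// Panics if `byte_idx` does not fall on a UTF-8 character boundary.
fn byte_to_char_index(s: &str, byte_idx: usize) -> usize {
    s[..byte_idx].chars().count()
}

fn main() {
    let line = "it’s disabled";

    // '’' (U+2019) occupies 3 bytes in UTF-8, so the two lengths differ.
    assert_eq!(line.len(), 15); // length in bytes
    assert_eq!(line.chars().count(), 13); // length in characters

    // A char-indexed view of the line, analogous to chars_line.
    let chars_line: Vec<char> = line.chars().collect();

    // `find` returns a byte offset; using it directly on `chars_line`
    // would point two positions too far to the right.
    let byte_start = line.find("disabled").unwrap();
    let char_start = byte_to_char_index(line, byte_start);
    assert_eq!(byte_start, 7);
    assert_eq!(char_start, 5);

    // Slicing with the converted index yields the intended word.
    let word: String = chars_line[char_start..].iter().collect();
    assert_eq!(word, "disabled");
    println!("{word}");
}
```

The 15-byte versus 13-character difference is consistent with the "slice index starts at 15 but ends at 13" panic in the reproduction: a byte-based position can exceed the length of a char-indexed vector.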

@aaron-ang

printf "it’s disabled\n" > foo.txt

## in main
cargo run -q ptx -G foo.txt
# thread 'main' panicked at src/uu/ptx/src/ptx.rs:610:32:
# slice index starts at 15 but ends at 13
# note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
cargo run -q ptx -G -T foo.txt
# thread 'main' panicked at src/uu/ptx/src/ptx.rs:566:32:
# slice index starts at 15 but ends at 13
# note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

## in ptx-panic
cargo run -q ptx -G foo.txt
# .xx "" "it’s" "disabled" ""
# .xx "" "" "it’s disabled" ""
cargo run -q ptx -G -T foo.txt
# \xx {}{it’s}{disabled}{}{}
# \xx {}{}{it’s}{ disabled}{}

## system
ptx -G foo.txt
# .xx "" "it’s" "disabled" ""
# .xx "" "" "it’s disabled" ""
ptx -G -T foo.txt
# \xx {}{it’s}{disabled}{}{}
# \xx {}{}{it’s}{ disabled}{}

github-actions bot commented May 7, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

@cakebaker cakebaker merged commit bcc02e9 into uutils:main May 7, 2025
69 of 70 checks passed
@cakebaker

Thanks!

Development

Successfully merging this pull request may close these issues.

ptx: special char breaks it with "thread 'main' panicked at 'assertion failed: end <= s.len()', src/uu/ptx/src/ptx.rs:291:5"