feat(cli): add word-level LRC output with UTF-8 fix #3619
+106
−1
Summary
Add a new `-olrcw`/`--output-lrc-word` option for word-level LRC output with inline timestamps per token, and fix UTF-8 character handling issues.
Changes
- Add `output_lrc_word` parameter and CLI option `-olrcw`
- Add `output_lrc_word()` function with per-token timestamps
- Enable `token_timestamps` when `output_lrc_word` is set
UTF-8 Fix (addresses #1798)
CJK characters (3 bytes in UTF-8) were being split across tokens with timestamps inserted between bytes:
Before (broken):
After (fixed):
The fix detects UTF-8 continuation bytes (`10xxxxxx`) and merges them with the previous token.
Output Format
Test Plan
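As an illustration of the continuation-byte merge described under the UTF-8 fix, here is a minimal sketch that could serve as a starting point for a check. It is not the PR's actual code. It assumes tokens arrive as raw byte strings with a start time; `Token`, `is_utf8_continuation`, and `merge_split_utf8` are hypothetical names introduced here, and the centisecond unit is an assumption.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical token type for illustration: raw UTF-8 bytes plus a start time.
struct Token {
    std::string text;   // raw bytes as emitted by the tokenizer
    int64_t     t0_cs;  // start time in centiseconds (assumed unit)
};

// A UTF-8 continuation byte has the bit pattern 10xxxxxx.
static bool is_utf8_continuation(unsigned char b) {
    return (b & 0xC0) == 0x80;
}

// Merge any token that begins with a continuation byte into the previous
// token, so that no timestamp lands in the middle of a multi-byte character.
static std::vector<Token> merge_split_utf8(const std::vector<Token> & tokens) {
    std::vector<Token> merged;
    for (const Token & tok : tokens) {
        const bool starts_mid_char = !tok.text.empty() &&
            is_utf8_continuation(static_cast<unsigned char>(tok.text[0]));
        if (starts_mid_char && !merged.empty()) {
            merged.back().text += tok.text;  // keep the earlier timestamp
        } else {
            merged.push_back(tok);
        }
    }
    return merged;
}
```

With a 3-byte character split as two bytes plus one byte, the second token is folded into the first and the first token's timestamp is kept. A token that starts with a continuation byte but has no predecessor is passed through unchanged in this sketch.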