fold: fix GNU test fold-zero-width.sh #9274
…scii_line Implement logic to increment column count in WidthMode::Characters, emitting output when width is reached. This ensures accurate line folding for multi-byte characters, enhancing Unicode support.
GNU testsuite comparison:
- Added conditional check in fold_file function to call emit_output when col_count >= width
- Ensures lines are properly wrapped based on byte or character width before final output flush
- Improves handling of incomplete lines that need early breaking to respect the specified width
CodSpeed Performance Report: Merging this PR will improve performance by 50.06%.
In character width mode, emit output immediately after segments are added if column count exceeds width, preventing redundant flushes. Simplify the file folding logic by removing unnecessary conditional checks at the end, ensuring clean output writing. This fixes potential issues with extra line breaks or incorrect folding behavior.
…ability Refactor code in fold.rs to break lengthy if-condition statements across multiple lines in push_ascii_segment, process_utf8_line, and process_non_utf8_line functions. This improves code readability without changing functionality.
…ory usage Introduce a STREAMING_FLUSH_THRESHOLD constant and helper functions (maybe_flush_unbroken_output, push_byte, push_bytes) to periodically flush the output buffer when it exceeds 8KB and no spaces are being tracked, preventing excessive memory consumption when processing large files. This refactor replaces direct buffer pushes with checks for threshold-based flushing.
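The threshold-based flushing described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in fold.rs: the `FoldBuffer` struct, its fields, and the method bodies are assumptions made for the example, and only the constant and helper names come from the commit message.

```rust
use std::io::{self, Write};

/// Assumed value: the commit message says the buffer is flushed once it exceeds 8KB.
const STREAMING_FLUSH_THRESHOLD: usize = 8 * 1024;

/// Hypothetical stand-in for the state fold keeps while building an output line.
struct FoldBuffer<W: Write> {
    writer: W,
    output: Vec<u8>,
    /// Position of the last space in `output`, if one is being tracked
    /// as a possible break point.
    last_space: Option<usize>,
}

impl<W: Write> FoldBuffer<W> {
    fn new(writer: W) -> Self {
        Self {
            writer,
            output: Vec::new(),
            last_space: None,
        }
    }

    /// Write out the buffer early when it has grown past the threshold and
    /// no space is being tracked, so no later break can refer back into it.
    fn maybe_flush_unbroken_output(&mut self) -> io::Result<()> {
        if self.last_space.is_none() && self.output.len() > STREAMING_FLUSH_THRESHOLD {
            self.writer.write_all(&self.output)?;
            self.output.clear();
        }
        Ok(())
    }

    /// Append one byte, then check the threshold.
    fn push_byte(&mut self, byte: u8) -> io::Result<()> {
        self.output.push(byte);
        self.maybe_flush_unbroken_output()
    }

    /// Append a slice, then check the threshold.
    fn push_bytes(&mut self, bytes: &[u8]) -> io::Result<()> {
        self.output.extend_from_slice(bytes);
        self.maybe_flush_unbroken_output()
    }
}
```

Routing every append through `push_byte` or `push_bytes` keeps the threshold check in one place, so a very long line without spaces cannot grow the buffer without bound.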
Could you please add tests?
and please fix this regression:
…d tests Remove conditional checks that incorrectly emitted output when column count reached width in character mode, ensuring proper folding of wide characters and handling of edge cases. Add comprehensive tests for wide characters, invalid UTF-8, zero-width spaces, and buffer boundaries to verify correct behavior. This prevents issues with multi-byte character folding where output was prematurely flushed, improving accuracy for Unicode input.
- Remove trailing empty lines in fold.rs
- Compact multiline variable assignments in test_fold.rs for readability
…racters Add unicode-width crate to handle zero-width Unicode characters in fold utility. Introduced new test 'test_zero_width_data_line_counts' to verify correct wrapping in --characters mode for zero-width bytes and spaces, ensuring fold behaves consistently with character counts rather than visual width.
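To illustrate the distinction this commit draws between character counts and visual width, here is a small std-only sketch. It assumes, as the commit message suggests, that in --characters mode every Unicode scalar value occupies one column, including zero-width ones such as U+200B. The function `fold_by_chars` is hypothetical and is not the fold.rs implementation; it also does not use the unicode-width crate.

```rust
/// Hypothetical helper: split `line` into pieces of at most `width`
/// characters, counting each Unicode scalar value as one column
/// regardless of how wide it is when displayed.
fn fold_by_chars(line: &str, width: usize) -> Vec<String> {
    assert!(width > 0, "width must be positive");
    let mut pieces = Vec::new();
    let mut current = String::new();
    let mut count = 0;
    for ch in line.chars() {
        if count == width {
            // The current piece is full: start a new one.
            pieces.push(std::mem::take(&mut current));
            count = 0;
        }
        current.push(ch);
        count += 1;
    }
    if !current.is_empty() {
        pieces.push(current);
    }
    pieces
}
```

Under this assumption, the five-character input `"a\u{200B}b\u{200B}c"` folded at width 2 yields three pieces, even though only three of its characters are visible on screen.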
- Add bytecount dependency to Cargo.toml and Cargo.lock
- Refactor newline_count function in test_fold.rs to use bytecount::count instead of manual iteration for better performance
Modify the fold implementation to process input in buffered chunks rather than line-by-line reading, ensuring correct handling of multi-byte characters split across buffer boundaries. Add process_pending_chunk function and new streaming logic to fold_file for better performance on large files. Update tests accordingly.
Replace loop with early empty check by a while loop conditional on !pending.is_empty() for clarity. Restructure invalid UTF-8 error handling to first check if valid_up_to == 0, then process the valid prefix, improving code readability and flow without changing behavior.
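The chunked processing described in these commits depends on carrying an incomplete multi-byte sequence over to the next read. The following is a rough sketch of that idea using only `std::str::from_utf8` and `Utf8Error`. The `Piece` enum and the `drain_pending` function are hypothetical names for this example and are not taken from fold.rs; the control flow (loop while the pending buffer is non-empty, check `valid_up_to == 0` first, otherwise emit the valid prefix) follows the commit message.

```rust
use std::str;

/// Hypothetical output type: decoded text or raw bytes that are not valid UTF-8.
#[derive(Debug, PartialEq)]
enum Piece {
    Text(String),
    Invalid(Vec<u8>),
}

/// Decode as much of `pending` as possible into `out`. An incomplete
/// trailing sequence stays in `pending` until more input arrives,
/// unless `at_eof` is set, in which case it is emitted as invalid bytes.
fn drain_pending(pending: &mut Vec<u8>, at_eof: bool, out: &mut Vec<Piece>) {
    while !pending.is_empty() {
        match str::from_utf8(pending.as_slice()) {
            Ok(valid) => {
                out.push(Piece::Text(valid.to_owned()));
                pending.clear();
            }
            Err(err) => {
                let valid_up_to = err.valid_up_to();
                if valid_up_to == 0 {
                    match err.error_len() {
                        // A definitely invalid sequence of `len` bytes.
                        Some(len) => {
                            let bad: Vec<u8> = pending.drain(..len).collect();
                            out.push(Piece::Invalid(bad));
                        }
                        // Truncated sequence at end of input: nothing more will come.
                        None if at_eof => {
                            let bad: Vec<u8> = pending.drain(..).collect();
                            out.push(Piece::Invalid(bad));
                        }
                        // Truncated sequence mid-stream: wait for the next chunk.
                        None => break,
                    }
                } else {
                    // Emit the valid prefix, then loop to look at what follows it.
                    let valid = str::from_utf8(&pending[..valid_up_to])
                        .expect("prefix was validated by from_utf8");
                    out.push(Piece::Text(valid.to_owned()));
                    pending.drain(..valid_up_to);
                }
            }
        }
    }
}
```

A caller would append each chunk it reads to `pending`, call `drain_pending` with `at_eof` set to `false`, and make one final call with `at_eof` set to `true` once the input is exhausted.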
Consolidate the assignment of the `valid` variable from multiple lines to a single line for improved code readability and adherence to style guidelines favoring concise declarations.
done
#9328: I just saw that this was succeeding. With both of these together, all of the fold tests will pass.
Are you sure fold-zero-width.sh is fixed for real? :)
Previously, the push_byte function only appended the byte to the buffer and returned Ok(()), potentially leaving output unflushed. This change adds a call to maybe_flush_unbroken_output to ensure proper flushing after each byte push, improving output reliability in the fold utility.
- Updated multiple crates including aho-corasick (1.1.3 -> 1.1.4), anstream (0.6.19 -> 0.6.21), and toml (0.22.27 -> 0.23.10+spec-1.0.0)
- Added toml_parser dependency as required by updated toml crate
- Ensures compatibility, security patches, and performance improvements across the project
- Updated chrono from 0.4.42 to 0.4.43
- Updated clap_lex from 0.7.6 to 0.7.7
- Updated getrandom from 0.2.16 to 0.2.17
- Updated flate2 from 1.1.5 to 1.1.8 with dependency changes
- Updated icu_locale_data from 2.1.1 to 2.1.2
- Updated js-sys from 0.3.83 to 0.3.85
- Updated wasm-bindgen from 0.2.106 to 0.2.108
- Removed unused arbitrary and derive_arbitrary packages
- Ensured compatibility and security fixes in dependencies
--------- Co-authored-by: Sylvestre Ledru <sylvestre@debian.org>
Implement logic to increment column count in WidthMode::Characters, emitting output when width is reached. This ensures accurate line folding for multi-byte characters, enhancing Unicode support.
Related: #9127