more: constant memory initialization overhead by aaron-ang · Pull Request #7765 · uutils/coreutils

aaron-ang · 2025-04-16T04:53:31Z

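use std::{fs::File, io::{BufReader, Stdin}};

/// Input source for the pager: a buffered file or standard input.
enum InputType {
	File(BufReader<File>),
	Stdin(Stdin),
}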

This PR improves uu_more by achieving constant memory overhead during initialization. Previously (#7680), we read the entire file once and stored byte positions. We now do so incrementally using the following techniques:

We handle files and pipes together using the InputType enum shown above. Stdin is already buffered internally and provides read_line directly (BufRead itself is implemented by StdinLock), while for files we keep the underlying File type so that we can get the file size for displaying the status.
When initializing the Pager object, we scan only up to the start line (options.from_line). We store the lines read in lines, and the byte positions in cumulative_line_sizes to display the file navigation status.
If pattern matching is required, we scan from the initialized start line to find the first line containing the pattern. Note that from_line takes precedence over pattern, which is in line with more. This means that if the specified pattern only exists before from_line, it will not be found.
The file is read on demand without seeking, and the newly read lines are appended to self.lines. This means we scan the entire file at most once.
When drawing lines, we check that the lines up to self.upper_mark + self.content_rows have been read into memory. Then, we iterate over the relevant indexes in self.lines to print.
Moved some tests from test_more.rs to more.rs since we cannot easily extract stdout from crossterm's AlternateScreen.
Additional UI changes to match less.
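
The incremental reading described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the PR: the `LazyLines` struct, the `Memory` variant, and the exact body of `read_until_line` are hypothetical, and only the names `InputType`, `lines`, `cumulative_line_sizes`, `eof_reached`, and `read_until_line` come from the conversation.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, Cursor, Stdin};

/// Input source, mirroring the `InputType` enum from the description.
/// `Memory` is a hypothetical extra variant so the sketch runs without a file.
enum InputType {
    File(BufReader<File>),
    Stdin(Stdin),
    Memory(Cursor<Vec<u8>>),
}

impl InputType {
    /// Reads one line (newline included) into `buf`; returns 0 at end of input.
    fn read_line(&mut self, buf: &mut String) -> io::Result<usize> {
        match self {
            InputType::File(reader) => reader.read_line(buf),
            InputType::Stdin(stdin) => stdin.read_line(buf),
            InputType::Memory(cursor) => cursor.read_line(buf),
        }
    }
}

/// Hypothetical, stripped-down pager state.
struct LazyLines {
    input: InputType,
    lines: Vec<String>,
    cumulative_line_sizes: Vec<u64>,
    eof_reached: bool,
}

impl LazyLines {
    fn new(input: InputType) -> Self {
        Self {
            input,
            lines: Vec::new(),
            cumulative_line_sizes: Vec::new(),
            eof_reached: false,
        }
    }

    /// Reads forward until `target` lines are buffered or the input ends.
    /// Lines that are already buffered are never read again.
    fn read_until_line(&mut self, target: usize) -> io::Result<()> {
        let mut buf = String::new();
        while self.lines.len() < target && !self.eof_reached {
            buf.clear();
            let bytes = self.input.read_line(&mut buf)? as u64;
            if bytes == 0 {
                self.eof_reached = true;
                break;
            }
            // Running byte offset of the end of this line, used for the status display.
            let previous = self.cumulative_line_sizes.last().copied().unwrap_or(0);
            self.cumulative_line_sizes.push(previous + bytes);
            self.lines
                .push(buf.trim_end_matches(&['\r', '\n'][..]).to_string());
        }
        Ok(())
    }
}
```

Because only the lines requested so far are held in memory, the cost of opening a large input depends on the start line rather than on the input size.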

aaron-ang · 2025-04-16T05:15:54Z

waiting for #7680 to be merged

github-actions · 2025-04-16T05:37:31Z

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)

github-actions · 2025-04-16T09:21:04Z

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)

github-actions · 2025-04-17T05:59:01Z

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

sylvestre · 2025-04-17T08:26:45Z

src/uu/more/src/more.rs

 const USAGE: &str = help_usage!("more.md");
-const BELL: &str = "\x07";
+const BELL: char = '\x07';
+const MULTI_FILE_TOP_PROMPT: &str = "\r::::::::::::::\n\r{}\n\r::::::::::::::\n";


This is a bit cryptic, could you please add a comment for it? Thanks

Will do @sylvestre. Please note that this PR is still WIP and not ready for full review. I'm awaiting my previous PR to be merged. Once that is done, I can better compare changes to main.

I will also do a writeup on the changes made, benchmarking, and add comments in the code.

github-actions · 2025-04-17T09:00:10Z

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)

aaron-ang · 2025-04-20T20:43:46Z

@tertsdiepraam @sylvestre here is how I plan to refactor more in broad strokes

Create a new terminal screen (similar to less). On exit, restore previous screen. This can be easily done with crossterm::terminal::{EnterAlternateScreen, LeaveAlternateScreen}.
Fixing options: I realized some are labelled wrongly.
Instead of storing line byte offsets, we incrementally read and store file data into lines. This means reverting to the original implementation. We replace the line_count attribute with the file byte size and an eof flag in the paging logic.

Would be great to hear your feedback!

github-actions · 2025-04-20T21:36:44Z

GNU testsuite comparison:

GNU test failed: tests/misc/tee. tests/misc/tee is passing on 'main'. Maybe you have to rebase?

aaron-ang · 2025-04-21T08:53:33Z

Benchmark

< /dev/urandom base64 | fold -w 120 | head -c 100M > bigfile
< /dev/urandom base64 | fold -w 120 | head -c 1M > smallfile
cargo build --no-default-features --features more --release
# file
/usr/bin/time ./target/release/coreutils more bigfile 
/usr/bin/time ./target/release/coreutils more smallfile
# pipe
/usr/bin/time bash -c 'cat bigfile | ./target/release/coreutils more'
/usr/bin/time bash -c 'cat smallfile | ./target/release/coreutils more'

# in `main`
# bigfile: 0.37user 0.11system 0:01.81elapsed 26%CPU (0avgtext+0avgdata 126796maxresident)k 0inputs+0outputs (0major+31065minor)pagefaults 0swaps
# smallfile: 0.00user 0.00system 0:01.21elapsed 0%CPU (0avgtext+0avgdata 4540maxresident)k 0inputs+0outputs (0major+452minor)pagefaults 0swaps
# bigfile(pipe): 0.37user 0.15system 0:01.97elapsed 26%CPU (0avgtext+0avgdata 126832maxresident)k 0inputs+0outputs (0major+31430minor)pagefaults 0swaps
# smallfile(pipe): 0.00user 0.01system 0:01.63elapsed 1%CPU (0avgtext+0avgdata 4344maxresident)k 0inputs+0outputs (0major+812minor)pagefaults 0swaps

# in `more-mem-constant`
# bigfile: 0.00user 0.00system 0:01.04elapsed 0%CPU (0avgtext+0avgdata 3084maxresident)k 0inputs+0outputs (0major+131minor)pagefaults 0swaps
# smallfile: 0.00user 0.00system 0:01.05elapsed 0%CPU (0avgtext+0avgdata 3224maxresident)k 0inputs+0outputs (0major+131minor)pagefaults 0swaps
# bigfile(pipe): 0.00user 0.00system 0:01.15elapsed 0%CPU (0avgtext+0avgdata 3528maxresident)k 0inputs+0outputs (0major+488minor)pagefaults 0swaps
# smallfile(pipe): 0.00user 0.00system 0:01.13elapsed 0%CPU (0avgtext+0avgdata 3592maxresident)k 0inputs+0outputs (0major+491minor)pagefaults 0swaps

# system
# bigfile: 0.00user 0.00system 0:00.95elapsed 0%CPU (0avgtext+0avgdata 2508maxresident)k 0inputs+0outputs (0major+136minor)pagefaults 0swaps
# smallfile: 0.00user 0.00system 0:01.15elapsed 0%CPU (0avgtext+0avgdata 2508maxresident)k 0inputs+0outputs (0major+136minor)pagefaults 0swaps
# bigfile(pipe): 0.00user 0.00system 0:01.11elapsed 0%CPU (0avgtext+0avgdata 3584maxresident)k 0inputs+0outputs (0major+494minor)pagefaults 0swaps
# smallfile(pipe): 0.00user 0.00system 0:01.10elapsed 0%CPU (0avgtext+0avgdata 3524maxresident)k 0inputs+0outputs (0major+492minor)pagefaults 0swaps

Greatly reduced time to first paint and memory usage (41x) for large files. Verified that the overhead is constant with respect to input size.
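
For reference, the 41x figure follows from the maximum resident set sizes reported by /usr/bin/time above (in KB). The helper below is only an illustrative calculation; `reduction_factor` is a hypothetical name.

```rust
/// Ratio of peak memory before and after the change (both in KB).
fn reduction_factor(before_kb: u64, after_kb: u64) -> f64 {
    before_kb as f64 / after_kb as f64
}

/// bigfile read directly: 126796 KB on `main` vs 3084 KB on `more-mem-constant`.
fn bigfile_factor() -> f64 {
    reduction_factor(126_796, 3_084) // roughly 41.1
}

/// bigfile read through a pipe: 126832 KB on `main` vs 3528 KB on the branch.
fn bigfile_pipe_factor() -> f64 {
    reduction_factor(126_832, 3_528) // roughly 36
}
```

The pipe case comes out slightly lower, at about 36x, since the baseline for the piped run on the branch is a little higher.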

github-actions · 2025-04-21T09:20:41Z

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)

github-actions · 2025-04-21T09:57:54Z

GNU testsuite comparison:

GNU test failed: tests/misc/tee. tests/misc/tee is passing on 'main'. Maybe you have to rebase?

github-actions · 2025-04-21T20:30:59Z

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)

github-actions · 2025-04-22T01:13:40Z

GNU testsuite comparison:

Skip an intermittent issue tests/misc/usage_vs_getopt (fails in this run but passes in the 'main' branch)

aaron-ang · 2025-04-22T01:27:04Z

src/uu/more/src/more.rs

 pub fn uumain(args: impl uucore::Args) -> UResult<()> {
-    let _guard = TerminalGuard;
-
-    // Disable raw mode before exiting if a panic occurs
    set_hook(Box::new(|panic_info| {
-        terminal::disable_raw_mode().unwrap();
        print!("\r");
        println!("{panic_info}");
    }));


The logic for modifying the terminal has been moved from uumain to more.

github-actions · 2025-04-22T02:02:40Z

GNU testsuite comparison:

GNU test failed: tests/misc/tee. tests/misc/tee is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

aaron-ang · 2025-04-24T22:06:33Z

@tertsdiepraam could you please review this PR?

src/uu/more/Cargo.toml

github-actions · 2025-05-03T07:15:34Z

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)

RenjiSann · 2025-05-03T12:57:21Z

Also, can you please squash the multiple commits into just one or two implementation steps?

aaron-ang · 2025-05-03T18:12:19Z

Also, can you please squash the multiple commits into just one or two implementation steps?

Could the PR be squashed into one commit during merge?

github-actions · 2025-05-03T20:53:36Z

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)

src/uu/more/Cargo.toml

github-actions · 2025-05-04T09:46:50Z

GNU testsuite comparison:

Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

github-actions · 2025-05-04T20:01:54Z

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

aaron-ang · 2025-05-04T21:38:33Z

I am facing some issues with running tests locally even though they pass in the test runner. Please give me some more time to fix it.

github-actions · 2025-05-05T00:44:38Z

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

aaron-ang · 2025-05-06T00:49:11Z

I have "fixed" the tests by migrating them to more.rs since we cannot easily extract stdout from crossterm's AlternateScreen. I also reran the benchmark with additional flags in the build: --no-default-features --features more, and we observe an astonishing 41x improvement in memory usage for the 100MB file 😂.

aaron-ang · 2025-05-06T01:03:36Z

src/uu/more/src/more.rs

+enum OutputType {
+    Tty(Stdout),
+    Pipe(Box<dyn Write>),
+    #[cfg(test)]
+    Test(Vec<u8>),
+}


This enum is used to distinguish between stdout types and provide a way for tests to verify output.
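
One way such an enum could be wired up is to implement Write for it and delegate to the active variant. The sketch below is an assumption about the shape, not the PR's implementation: the `Test` variant is left unconditional here (the diff gates it behind `#[cfg(test)]`) so the example can run anywhere, and the `captured` accessor is a hypothetical helper.

```rust
use std::io::{self, Stdout, Write};

/// Output sink, following the shape of the `OutputType` enum in the diff.
enum OutputType {
    Tty(Stdout),
    Pipe(Box<dyn Write>),
    /// In-memory buffer so tests can inspect what was written.
    Test(Vec<u8>),
}

impl Write for OutputType {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        match self {
            OutputType::Tty(out) => out.write(buf),
            OutputType::Pipe(out) => out.write(buf),
            OutputType::Test(captured) => captured.write(buf),
        }
    }

    fn flush(&mut self) -> io::Result<()> {
        match self {
            OutputType::Tty(out) => out.flush(),
            OutputType::Pipe(out) => out.flush(),
            OutputType::Test(captured) => captured.flush(),
        }
    }
}

impl OutputType {
    /// Hypothetical explicit accessor: the captured bytes, if this is a test sink.
    fn captured(&self) -> Option<&[u8]> {
        match self {
            OutputType::Test(bytes) => Some(bytes.as_slice()),
            _ => None,
        }
    }
}
```

With this shape, drawing code can take any `impl Write`, and a test can pass `OutputType::Test(Vec::new())` and compare the captured bytes afterwards.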

github-actions · 2025-05-06T01:14:49Z

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/usage_vs_getopt (passes in this run but fails in the 'main' branch)

hz2 · 2025-05-06T20:08:30Z

I also reran the benchmark with additional flags in the build: --no-default-features --features more, and we observe an astonishing 41x improvement in memory usage for the 100MB file 😂.

Do you have the results of re-running the benchmark with your latest changes?

aaron-ang · 2025-05-06T20:11:32Z

I also reran the benchmark with additional flags in the build: --no-default-features --features more, and we observe an astonishing 41x improvement in memory usage for the 100MB file 😂.

Do you have the results of re-running the benchmark with your latest changes?

Yes, I've updated it in the previous comment here: #7765 (comment)

RenjiSann · 2025-05-09T14:17:40Z

I've just finished reviewing. I did not see any code smell, it's very readable imo and

However, this is a big refactor, and I have a pretty low confidence level in my own competence to review this.
@sylvestre can you take a look at it (again)?

tertsdiepraam

Most of this looks really good. Thanks! You've crammed a lot into this PR (and basically 1 commit), which makes it hard to review. I've got some comments, but we can probably merge this after those have been fixed. Next time, please split it up over multiple PRs.

tertsdiepraam · 2025-05-09T14:23:56Z

src/uu/more/src/more.rs

+            (Some(n), _) | (None, Some(n)) if n > 0 => Some(n + 1),
+            _ => None, // Use terminal height


Next time, please separate cleanup from actual changes. That'll make it much easier to review.

tertsdiepraam · 2025-05-09T14:27:17Z

src/uu/more/src/more.rs

-    print!("\r");
-    stdout.flush().unwrap();
+#[cfg(test)]
+impl Deref for OutputType {


This seems too involved for Deref; I'd rather have an unwrap_test method or something more explicit like that. Only implementing this for cfg(test) also makes me a bit worried that the behaviour might change inside and outside of tests.

I've moved this into mod tests so it should not change the original implementation. Besides, the unreachable macro should catch errors if not used appropriately.

tertsdiepraam · 2025-05-09T14:28:40Z

src/uu/more/src/more.rs

+    }
 }

-#[cfg(target_os = "fuchsia")]


Did you remove this intentionally? Shouldn't we keep it?

You're right, I've added it back.

tertsdiepraam · 2025-05-09T14:29:35Z

src/uu/more/src/more.rs

+    lines: Vec<String>,              // Lines read from the input
+    cumulative_line_sizes: Vec<u64>, // Cumulative byte sizes of lines
+    upper_mark: usize,               // Current line at the top of the screen
+    content_rows: usize,             // Number of rows that fit on the screen
+    eof_reached: bool,
+    lines_squeezed: usize, // Number of lines squeezed out in the current view


These comments should probably be docstrings.
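
A sketch of what the suggestion might look like, with the trailing comments from the diff turned into doc comments. The struct name `PagerState` and the derives are hypothetical, and the wording for `eof_reached` is an assumption because the diff gives that field no comment.

```rust
/// Hypothetical excerpt of the pager state with doc comments on each field.
#[derive(Debug, Default)]
struct PagerState {
    /// Lines read from the input
    lines: Vec<String>,
    /// Cumulative byte sizes of lines
    cumulative_line_sizes: Vec<u64>,
    /// Current line at the top of the screen
    upper_mark: usize,
    /// Number of rows that fit on the screen
    content_rows: usize,
    /// Whether the end of the input has been reached (assumed wording)
    eof_reached: bool,
    /// Number of lines squeezed out in the current view
    lines_squeezed: usize,
}
```

Unlike trailing `//` comments, `///` comments are picked up by rustdoc and by editor tooling.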

tertsdiepraam · 2025-05-09T14:32:39Z

src/uu/more/src/more.rs

+                    reset_term()?;
+                    std::process::exit(0);


This is for another PR, but it might be nice to never exit explicitly, so that we always reset via the drop guard.
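
The idea can be illustrated with a small drop guard. std::process::exit terminates the process without running destructors, whereas returning from the event loop lets Drop run. Everything in this sketch is hypothetical (`RestoreGuard`, `run_pager`, and the flag that stands in for the real terminal reset); it only shows the pattern.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical guard that runs a restore action when it goes out of scope.
struct RestoreGuard<F: FnMut()> {
    restore: F,
}

impl<F: FnMut()> Drop for RestoreGuard<F> {
    fn drop(&mut self) {
        (self.restore)();
    }
}

/// Stand-in for the pager loop: on 'q' it returns instead of exiting the
/// process, so the guard's Drop implementation still runs.
/// Returns the number of keys handled before quitting.
fn run_pager(keys: &[char], restored: Rc<Cell<bool>>) -> usize {
    // In a real pager the closure would reset the terminal; here it sets a flag.
    let _guard = RestoreGuard {
        restore: move || restored.set(true),
    };
    let mut handled = 0;
    for &key in keys {
        if key == 'q' {
            return handled; // early return: `_guard` is dropped here
        }
        handled += 1;
    }
    handled
}
```

The caller can then turn the return value into an exit code at the very end of main, after all guards have been dropped.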

tertsdiepraam · 2025-05-09T14:32:57Z

src/uu/more/src/more.rs

+    /// Process user input events until exit
+    fn process_events(&mut self, options: &Options) -> UResult<()> {
+        loop {
+            if !event::poll(Duration::from_millis(100))? {


Why did you change the number of millis?

The original 10ms polling rate seems excessive for manual navigation. Wdyt?

tertsdiepraam · 2025-05-09T14:38:59Z

Cargo.toml

 time = { version = "0.3.36" }
 unicode-segmentation = "1.11.0"
 unicode-width = "0.2.0"
-utf-8 = "0.7.6"


Was this already unused? I can't find where you removed the usage of it.

When I added the tempfile crate into uumore's Cargo file using cargo add, it automatically removed the utf-8 crate (along with selinux-sys) from the main Cargo file. I assumed it was cleaning up unused crates, but I can add it back if needed.

tertsdiepraam · 2025-05-09T14:40:11Z

src/uu/more/src/more.rs

+        // Ensure we have enough lines loaded for display
+        self.read_until_line(self.upper_mark + self.content_rows)?;


Does this take the squeezed lines into account?

No, it does not. That is a good point. I will move this into the loop

tertsdiepraam · 2025-05-09T14:40:58Z

src/uu/more/src/more.rs

+        // Display lines until we've filled the screen
+        let mut lines_printed = 0;
+        let mut index = self.upper_mark;
+        while lines_printed < self.content_rows && index < self.lines.len() {


This looks like it could be a for loop over the index.

The main loop condition is lines_printed < self.content_rows. We may not always read until self.lines.len(). The index variable is used after the loop to check for EOF.

I've refactored the loop; hope it is clearer now.
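
To make the squeezing concern concrete: when repeated blank lines are squeezed, filling the screen can consume more source lines than there are rows, so the "enough lines loaded" check belongs inside the loop rather than before it. The function below is a simplified sketch under that assumption. `visible_lines` is a hypothetical helper that works on an already loaded slice, and the real pager would call something like read_until_line at the marked point.

```rust
/// Hypothetical helper: selects the lines shown on one screen.
/// Returns the visible lines and the index of the first line not consumed.
fn visible_lines(
    lines: &[&str],
    upper_mark: usize,
    content_rows: usize,
    squeeze: bool,
) -> (Vec<String>, usize) {
    let mut shown = Vec::new();
    let mut index = upper_mark;
    let mut previous_blank = false;
    // The main condition is the number of rows printed, not the source index.
    while shown.len() < content_rows && index < lines.len() {
        // A lazily loading pager would make sure `index` is loaded here.
        let line = lines[index];
        index += 1;
        let blank = line.is_empty();
        if squeeze && blank && previous_blank {
            continue; // squeezed lines do not take up a row
        }
        previous_blank = blank;
        shown.push(line.to_string());
    }
    (shown, index)
}
```

After the loop, comparing the returned index with the number of available lines tells the caller whether the end of the input was reached.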

github-actions · 2025-05-18T19:54:58Z

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

aaron-ang force-pushed the more-mem-constant branch from 442ddb7 to 7649214 Compare April 16, 2025 05:01

aaron-ang force-pushed the more-mem-constant branch from 4f212a6 to a981d4c Compare April 17, 2025 05:03

sylvestre reviewed Apr 17, 2025

aaron-ang force-pushed the more-mem-constant branch from 61ddf17 to 148d6a2 Compare April 20, 2025 21:02

jtracey mentioned this pull request Apr 20, 2025

tee: GNU test failing in CI #7805

Closed

aaron-ang force-pushed the more-mem-constant branch from 148d6a2 to 310d807 Compare April 21, 2025 08:46

aaron-ang force-pushed the more-mem-constant branch from 310d807 to 686f4b3 Compare April 21, 2025 09:21

aaron-ang marked this pull request as ready for review April 22, 2025 00:40

aaron-ang force-pushed the more-mem-constant branch from 347c086 to 022ed0f Compare April 22, 2025 01:13

aaron-ang force-pushed the more-mem-constant branch from 022ed0f to f138ac5 Compare April 22, 2025 01:14

aaron-ang commented Apr 22, 2025

aaron-ang requested a review from sylvestre April 22, 2025 03:44

RenjiSann reviewed May 2, 2025

aaron-ang force-pushed the more-mem-constant branch from dd0da28 to 81ca89a Compare May 3, 2025 20:19

sylvestre reviewed May 4, 2025

aaron-ang force-pushed the more-mem-constant branch from 81ca89a to 3c91ece Compare May 4, 2025 09:12

aaron-ang force-pushed the more-mem-constant branch from 3c91ece to 37a94e1 Compare May 4, 2025 19:27

aaron-ang force-pushed the more-mem-constant branch from 37a94e1 to bfcfb04 Compare May 5, 2025 00:11

aaron-ang force-pushed the more-mem-constant branch 3 times, most recently from a9d4a8f to d4bc6b3 Compare May 6, 2025 00:40

aaron-ang commented May 6, 2025

aaron-ang requested a review from RenjiSann May 8, 2025 19:40

tertsdiepraam reviewed May 9, 2025

aaron-ang force-pushed the more-mem-constant branch from d4bc6b3 to 7f5e2b9 Compare May 13, 2025 00:09

aaron-ang added 2 commits May 18, 2025 21:09

more: constant mem initialization for files and pipes

7692b93

test_more: use at_and_ucmd helper macro

c8c4f52

sylvestre force-pushed the more-mem-constant branch from 7f5e2b9 to c8c4f52 Compare May 18, 2025 19:09

sylvestre requested a review from tertsdiepraam May 18, 2025 19:35

sylvestre merged commit 9fbb6ac into uutils:main May 19, 2025
65 of 70 checks passed

BrewTestBot mentioned this pull request May 24, 2025

uutils-coreutils 0.1.0 Homebrew/homebrew-core#224645

Merged

moonfruit mentioned this pull request May 26, 2025

uutils-selected 0.1.0 moonfruit/homebrew-tap#243

Closed
