I/O: .lines() iterator is slower than a manual loop over .read_line() due to allocations #68

@Shnatsel

BufRead::lines() is a convenient way to read input line by line. But because it is an Iterator that yields owned Strings (which is what makes .collect()ing the lines possible), it has to allocate a fresh String on the heap for every line.

A manual loop over BufRead::read_line() is significantly faster because you can reuse a single buffer for all lines, although it is more verbose. Note that, unlike .lines(), read_line() leaves the trailing newline in the buffer.

Code using .lines():

    for line in reader.lines() {
        let line = line?; // each item is an io::Result<String>
        // TODO: process the line
    }

Manual read_line loop reusing the same buffer:

    let mut line = String::new();
    while reader.read_line(&mut line)? != 0 { // 0 bytes read is how the OS indicates that we reached end of file
        // TODO: process the line
        line.clear(); // clear the buffer after each line, or we'll end up with the whole file in memory!
    }
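For anyone who wants to try both variants side by side, here is a minimal self-contained sketch. It is not the benchmark code linked below; the function names `total_len_lines` and `total_len_read_line` and the in-memory `Cursor` input are made up for illustration. It sums the length of every line both ways and strips the line terminator in the read_line version so that the two results match.

```rust
use std::io::{self, BufRead, Cursor};

/// Sums the lengths of all lines using `.lines()`.
/// Hypothetical helper: every iteration yields a newly allocated String.
fn total_len_lines<R: BufRead>(reader: R) -> io::Result<usize> {
    let mut total = 0;
    for line in reader.lines() {
        let line = line?; // items are io::Result<String>
        total += line.len();
    }
    Ok(total)
}

/// Sums the lengths of all lines using `read_line()` and one reused buffer.
/// Hypothetical helper: the buffer's capacity is kept between iterations.
fn total_len_read_line<R: BufRead>(mut reader: R) -> io::Result<usize> {
    let mut total = 0;
    let mut line = String::new();
    while reader.read_line(&mut line)? != 0 {
        // read_line() keeps the terminator, so strip "\n" or "\r\n"
        // to get the same content that .lines() would yield.
        let content = match line.strip_suffix('\n') {
            Some(rest) => rest.strip_suffix('\r').unwrap_or(rest),
            None => line.as_str(),
        };
        total += content.len();
        line.clear(); // read_line() appends, so reset before the next line
    }
    Ok(total)
}

fn main() -> io::Result<()> {
    // In-memory stand-in for a file; assumed sample input.
    let text = "first line\nsecond line\r\nlast line without newline";
    let a = total_len_lines(Cursor::new(text))?;
    let b = total_len_read_line(Cursor::new(text))?;
    assert_eq!(a, b);
    println!("{} bytes of line content either way", a);
    Ok(())
}
```

With a real file you would wrap it in a `BufReader` and pass that instead of the `Cursor`; both helpers accept any `BufRead`.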

Benchmark results on a public domain book in plain text format, repeated 50 times:

Benchmark 1: target/release/lines ~/repeated_book
  Time (mean ± σ):     132.0 ms ±  10.8 ms    [User: 122.9 ms, System: 9.0 ms]
  Range (min … max):   118.6 ms … 161.0 ms    100 runs
 
Benchmark 2: target/release/read_line ~/repeated_book
  Time (mean ± σ):      97.2 ms ±  10.4 ms    [User: 87.9 ms, System: 9.2 ms]
  Range (min … max):    90.5 ms … 131.8 ms    100 runs
 
Summary
  'target/release/read_line ~/repeated_book' ran
    1.36 ± 0.18 times faster than 'target/release/lines ~/repeated_book'

Exact code used for benchmarks:
https://github.com/Shnatsel/fast-io-cookbook/blob/82ae9bd106002c4d7ffeb6f263b9ae8c05ffe95e/src/bin/lines.rs
https://github.com/Shnatsel/fast-io-cookbook/blob/82ae9bd106002c4d7ffeb6f263b9ae8c05ffe95e/src/bin/read_line.rs
