feat(server): aligned buffer memory pool #2921
Conversation
Codecov Report ❌
@@ Coverage Diff @@
## master #2921 +/- ##
============================================
- Coverage 72.09% 72.04% -0.06%
Complexity 930 930
============================================
Files 1124 1124
Lines 93832 93856 +24
Branches 71181 71213 +32
============================================
- Hits 67649 67616 -33
- Misses 23612 23646 +34
- Partials 2571 2594 +23
can you elaborate?
Correct me if I'm wrong: to make the make_mutable function in the HTTP path zero-copy, we would need to change … Other than that, make_mutable is not in the hot path. If a user is using the HTTP API for high-throughput message sending, they already have bigger problems than this one copy.
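The "one copy" being discussed can be sketched with stdlib types. This is a hypothetical stand-in for make_mutable, not the actual server code: turning a shared, immutable buffer into an owned mutable one is zero-copy only when we hold the sole reference, and costs exactly one full copy otherwise (the same shape as bytes::Bytes::try_into_mut).

```rust
use std::rc::Rc;

// Hypothetical stand-in for make_mutable (illustration only, not the iggy
// implementation). Rc<Vec<u8>> plays the role of a shared immutable buffer.
fn make_mutable(buf: Rc<Vec<u8>>) -> Vec<u8> {
    match Rc::try_unwrap(buf) {
        // Sole owner: reuse the existing allocation, no copy.
        Ok(owned) => owned,
        // Buffer is still shared elsewhere: one full copy of the bytes.
        Err(shared) => shared.to_vec(),
    }
}

fn main() {
    // Unique ownership: zero-copy path.
    let unique = Rc::new(vec![1u8, 2, 3]);
    assert_eq!(make_mutable(unique), vec![1, 2, 3]);

    // Shared ownership: the copy path runs, the other handle stays valid.
    let shared = Rc::new(vec![4u8, 5]);
    let other_ref = Rc::clone(&shared);
    assert_eq!(make_mutable(shared), vec![4, 5]);
    assert_eq!(*other_ref, vec![4, 5]);
}
```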
@tungtose did you check the performance? is there any difference?
@hubcio here is the bench result.

running on main:
2026-03-18T13:40:40.123993Z INFO bench_report::prints: Producers Results: Total throughput: 3854.07 MB/s, 3854076 messages/s, average throughput per Producer: 481.76 MB/s, p50 latency: 1.30 ms, p90 latency: 2.75 ms, p95 latency: 7.14 ms, p99 latency: 18.61 ms, p999 latency: 44.92 ms, p9999 latency: 91.50 ms, average latency: 2.12 ms, median latency: 1.30 ms, min: 0.51 ms, max: 209.82 ms, std dev: 2.34 ms, total time: 2.50 s
Running iggy-bench pinned-consumer tcp...

on this branch:
Send results:
Running iggy-bench pinned-consumer tcp...
could you please run the benchmarks with a rate limit: 4 producers / consumers, total rate limit equal to 500MB? that way we'll see the p50. run it for like 30s or so
@hubcio here is the bench result with this bench cmd:

on this branch:
Benchmark: Pinned Producer And Consumer, 4 producers, 4 consumers, 4 streams, 1 topic per stream, 1 partitions per topic, 20000000 messages, 1000 messages per batch, 20000 message batches, 1000 bytes per message, 20GB of data processed

on master branch:
Benchmark: Pinned Producer And Consumer, 4 producers, 4 consumers, 4 streams, 1 topic per stream, 1 partitions per topic, 20000000 messages, 1000 messages per batch, 20000 message batches, 1000 bytes per message, 20GB of data processed
2026-03-20T09:50:53.541140Z INFO bench_report::prints: Producers Results: Total throughput: 249.96 MB/s, 249962 messages/s, average throughput per Producer: 62.49 MB/s, p50 latency: 0.91 ms, p90 latency: 1.53 ms, p95 latency: 1.85 ms, p99 latency: 2.89 ms, p999 latency: 11.57 ms, p9999 latency: 12.51 ms, average latency: 1.06 ms, median latency: 0.91 ms, min: 0.42 ms, max: 13.61 ms, std dev: 0.25 ms, total time: 39.98 s
2026-03-20T09:50:53.541151Z INFO bench_report::prints: Consumers Results: Total throughput: 252.47 MB/s, 252466 messages/s, average throughput per Consumer: 63.12 MB/s, p50 latency: 1.31 ms, p90 latency: 152.23 ms, p95 latency: 247.42 ms, p99 latency: 340.48 ms, p999 latency: 362.19 ms, p9999 latency: 364.22 ms, average latency: 34.25 ms, median latency: 1.31 ms, min: 0.70 ms, max: 526.35 ms, std dev: 64.63 ms, total time: 39.82 s
2026-03-20T09:50:53.541162Z INFO bench_report::prints: Aggregate Results: Total throughput: 502.43 MB/s, 502428 messages/s, average throughput per Actor: 62.80 MB/s, p50 latency: 1.11 ms, p90 latency: 76.88 ms, p95 latency: 124.63 ms, p99 latency: 171.68 ms, p999 latency: 186.88 ms, p9999 latency: 188.36 ms, average latency: 17.66 ms, median latency: 1.11 ms, min: 0.42 ms, max: 526.35 ms, std dev: 38.38 ms, total time: 39.98 s
so the results for producers:
consumers:
p50 regressed ~25-37%, any clue why? on the other hand, tail latencies (p90+) improved 57-88%, especially on the consumer side.
@hubcio Here is a benchmark update using the command line below. I believe the slow performance comes from the freeze() function (converting from AVec back to Bytes); the current implementation is temporary. It will be improved in an upcoming PR that integrates DirectIOFile and a proper implementation of freeze().

master:
bench_report::prints: Producers Results: Total throughput: 249.96 MB/s, 249964 messages/s, average throughput per Producer: 62.49 MB/s, p50 latency: 1.02 ms, p90 latency: 1.81 ms, p95 latency: 2.02 ms, p99 latency: 2.68 ms, p999 latency: 8.90 ms, p9999 latency: 11.91 ms, average latency: 1.17 ms, median latency: 1.02 ms, min: 0.46 ms, max: 13.07 ms, std dev: 0.13 ms, total time: 39.98 s
2026-03-20T12:14:11.068760Z INFO bench_report::prints: Consumers Results: Total throughput: 249.97 MB/s, 249971 messages/s, average throughput per Consumer: 62.49 MB/s, p50 latency: 1.47 ms, p90 latency: 2.31 ms, p95 latency: 2.58 ms, p99 latency: 3.34 ms, p999 latency: 10.22 ms, p9999 latency: 13.16 ms, average latency: 1.62 ms, median latency: 1.47 ms, min: 0.67 ms, max: 14.89 ms, std dev: 0.43 ms, total time: 40.08 s

PR:
Producers Results: Total throughput: 249.97 MB/s, 249968 messages/s, average throughput per Producer: 62.49 MB/s, p50 latency: 0.90 ms, p90 latency: 1.39 ms, p95 latency: 1.74 ms, p99 latency: 2.39 ms, p999 latency: 9.49 ms, p9999 latency: 16.87 ms, average latency: 1.02 ms, median latency: 0.90 ms, min: 0.43 ms, max: 11.01 ms, std dev: 0.13 ms, total time: 39.98 s
2026-03-20T13:15:34.355418Z INFO bench_report::prints: Consumers Results: Total throughput: 249.92 MB/s, 249916 messages/s, average throughput per Consumer: 62.48 MB/s, p50 latency: 1.27 ms, p90 latency: 1.97 ms, p95 latency: 2.34 ms, p99 latency: 3.16 ms, p999 latency: 10.42 ms, p9999 latency: 17.44 ms, average latency: 1.42 ms, median latency: 1.27 ms, min: 0.65 ms, max: 20.51 ms, std dev: 0.30 ms, total time: 40.04 s
2026-03-20T13:15:34.355422Z INFO bench_report::prints: Aggregate Results: Total throughput: 499.89 MB/s, 499885 messages/s, average throughput per Actor: 62.49 MB/s, p50 latency: 1.08 ms, p90 latency: 1.68 ms, p95 latency: 2.04 ms, p99 latency: 2.77 ms, p999 latency: 9.95 ms, p9999 latency: 17.16 ms, average latency: 1.22 ms, median latency: 1.08 ms, min: 0.43 ms, max: 20.51 ms, std dev: 0.14 ms, total time: 40.04 s

Producers
Consumers
Prepare the memory pool and buffer infrastructure for O_DIRECT I/O. Direct I/O requires buffers to be aligned to the underlying block size (commonly 4096 bytes). This allows the kernel to bypass the page cache, reducing double buffering and giving more predictable I/O latency.
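The alignment requirement above can be sketched in a few lines. This is illustrative only, not the pool implementation from this PR; BLOCK_SIZE and alloc_aligned are hypothetical names. An O_DIRECT-ready buffer must start at an address that is a multiple of the block size, with its length rounded up to a block multiple.

```rust
use std::alloc::{alloc, dealloc, Layout};

// Hypothetical block size for illustration; O_DIRECT typically requires the
// buffer address, transfer length, and file offset to all be multiples of
// the underlying block size (commonly 4096 bytes).
const BLOCK_SIZE: usize = 4096;

// Allocate a buffer whose address is block-aligned and whose size is the
// requested length rounded up to the next block multiple.
fn alloc_aligned(len: usize) -> (*mut u8, Layout) {
    let size = (len + BLOCK_SIZE - 1) / BLOCK_SIZE * BLOCK_SIZE;
    let layout = Layout::from_size_align(size, BLOCK_SIZE).expect("invalid layout");
    let ptr = unsafe { alloc(layout) };
    assert!(!ptr.is_null(), "allocation failed");
    (ptr, layout)
}

fn main() {
    let (ptr, layout) = alloc_aligned(1000);
    assert_eq!(ptr as usize % BLOCK_SIZE, 0); // address is block-aligned
    assert_eq!(layout.size(), 4096);          // 1000 bytes rounds up to one block
    unsafe { dealloc(ptr, layout) };
}
```

A pool built on top of this would hand out and recycle such buffers instead of allocating per write, which is what makes the page-cache bypass cheap to use on the hot path.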
Known Trade-offs: