Skip to content

Use segment_position.bytesize in consumer deliver loop#1786

Open
walro wants to merge 1 commit intomainfrom
use-segment-position-bytesize
Open

Use segment_position.bytesize in consumer deliver loop#1786
walro wants to merge 1 commit intomainfrom
use-segment-position-bytesize

Conversation

@walro
Copy link
Contributor

@walro walro commented Mar 3, 2026

WHAT is this pull request doing?

Use the pre-computed bytesize stored on SegmentPosition instead of calling BytesMessage#bytesize which recomputes the value each time by summing timestamp, exchange_name, routing_key, properties, and bodysize.

SegmentPosition.bytesize  is set once at creation via SegmentPosition.make(segment, position, msg.bytesize.to_u32) so the value is identical.

(Noted while doing #1783)

HOW can this pull request be tested?

Tests. Doesn't seem to make much of a difference in benchmarks though, at least not with 1 producer and 1 consumer, guessing we are "producer-bound" or this simple is a drop in the ocean?

Use the pre-computed bytesize stored on SegmentPosition instead of
calling BytesMessage#bytesize which recomputes the value each time by
summing timestamp, exchange_name, routing_key, properties, and bodysize.

SegmentPosition.bytesize is set once at creation via
SegmentPosition.make(segment, position, msg.bytesize.to_u32) so the
value is identical.
@claude
Copy link

claude bot commented Mar 3, 2026

No issues found.

The change replaces env.message.bytesize (a method that computes size by summing multiple fields) with env.segment_position.bytesize (a pre-computed UInt32 getter on a struct). This is a valid hot-path optimization — SegmentPosition is always constructed with msg.bytesize.to_u32, so the values are equivalent, and the struct field access avoids recomputation and potential cache misses from accessing message fields.

@walro walro marked this pull request as ready for review March 3, 2026 16:24
@walro walro requested a review from a team as a code owner March 3, 2026 16:24
@walro
Copy link
Contributor Author

walro commented Mar 3, 2026

I was a bit skeptical about the change, so I asked ChatGPT 5.3 to take a deep look at it, and then challenged Claude:

On line 781, env.message.properties.headers = headers mutates the message's properties by adding an x-delivery-count header. This happens in get (line 734) before the envelope is yielded to consume_get, which means by the time we reach our line 84, env.message.bytesize can be larger than env.segment_position.bytesize due to the added header.

That said, the delivered_bytes counter is only used to decide when to Fiber.yield (lines 87-90) — it's a rough throughput throttle, not an accounting metric. So slight undercounting is harmless in practice; it just means the fiber yields slightly less often for messages with delivery-count headers.

I think I agree with the conclusion here, but let me know what you think. But should the segment's bytesize be updated? I am too unfamiliar with the code to tell. (Ok 🤖 says it should not matter as we don't write them back)

Also BCC header is removed, so that's yet another change. But I still argue it could be "worth it"?

@viktorerlingsson
Copy link
Member

It's only used for deciding when to yield, so I think it's up to us to decide what's correct!

But I still argue it could be "worth it"

Did a quick benchmark and it saves like 0.6ns per run in my test. So it only needs to run like 1,6 million times to save a whole ms! 😅 With that said, it's still an optimization so I think we should use it. "Many small creeks"!

@walro
Copy link
Contributor Author

walro commented Mar 4, 2026

hehe yeah, saw no meaningful difference in benchmarks either, who knew computers were fast at adding some integers together!

@snichme
Copy link
Member

snichme commented Mar 10, 2026

What I was thinking with this change and based on the comment " the delivered_bytes counter is only used to decide when to Fiber.yield" that delivered_bytes might be used for more things in the future, like exposed to prometheus metrics or in the UI when listing consumers. The risk is that one will just use the value since it's there and not look into what the number actually means and it will be wrong. A consumer can live for a long time and bytes can add up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants