perf: reduce allocations through memoization #3219
Conversation
The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Claude finished @tac0turtle's task in 2m 45s. View job

Code Review: perf: reduce allocations through memoization

Overall the PR is a solid performance improvement. The memoization strategy is well-reasoned and the invalidation paths (Clone, FromProto, ApplyLegacyDefaults) are correctly identified and patched. One critical concern remains around concurrent access to cachedHash (see Critical: Data Race below).
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration: defaults | Review profile: CHILL | Plan: Pro | Run ID:
📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)

📝 Walkthrough
Adds memoized header hashing (cachedHash) with MemoizeHash/InvalidateHash, updates call sites to memoize before caching or comparisons, ensures clones and deserialization clear the cached hash, and introduces a height→hash fast path in the internal cache to avoid redundant overwrites.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (1 warning, 1 inconclusive)
Codecov Report
❌ Patch coverage is

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #3219      +/-   ##
==========================================
+ Coverage   61.41%   61.43%   +0.02%
==========================================
  Files         120      120
  Lines       12474    12504      +30
==========================================
+ Hits         7661     7682      +21
- Misses       3953     3960       +7
- Partials      860      862       +2
```

Flags with carried forward coverage won't be shown.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
types/hashing.go (1)
75-88: ⚠️ Potential issue | 🟠 Major

Legacy headers are hashed twice on the cold path.

Line 76 computes HashSlim() unconditionally, and lines 81-85 may then compute and return the legacy hash instead. For any header with populated Legacy, the first call now does two marshals and two SHA-256s before memoization even helps, which cuts directly against the allocation win this PR is chasing.

♻️ Suggested change:

```diff
 func (h *Header) computeHash() Hash {
-	slimHash, err := h.HashSlim()
-	if err != nil {
-		return nil
-	}
-
 	if h.Legacy != nil && !h.Legacy.IsZero() {
 		legacyHash, err := h.HashLegacy()
 		if err == nil {
 			return legacyHash
 		}
 	}
+	slimHash, err := h.HashSlim()
+	if err != nil {
+		return nil
+	}
 	return slimHash
 }
```

As per coding guidelines, types/**/*.go should "Keep types lightweight and avoid unnecessary allocations for performance optimization".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@types/hashing.go` around lines 75-88, Header.computeHash currently always calls HashSlim(), causing double marshals/SHA-256s for legacy headers; change the control flow to check h.Legacy (h.Legacy != nil && !h.Legacy.IsZero()) first and call HashLegacy() only in that case (returning legacyHash on success), and only call HashSlim() if Legacy is nil/zero or HashLegacy() errors; update computeHash to avoid precomputing slimHash so a header with Legacy populated only computes one hash, while preserving existing error handling and memoization behavior.
🧹 Nitpick comments (2)

types/serialization_test.go (1)

359-378: Prefer table-driven structure for this new serialization-path test.

The assertions are good, but this should follow the types/**/*_test.go table-driven test convention for consistency with the suite. As per coding guidelines, types/**/*_test.go: "Use table-driven tests for all validation paths, serialization round-trips, and edge cases in test files".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@types/serialization_test.go` around lines 359-378, refactor TestHeaderFromProtoClearsCachedHash into a table-driven test: define a slice of test cases (with fields like name, initial *Header, protoFrom *Header, expectedHeight, expectedDataHash, expectCachedNil), then loop over the cases with t.Run. Keep the existing checks (MemoizeHash, FromProto, cachedHash nil, Height, DataHash) and reference the existing symbols Header, BaseHeader, MemoizeHash, FromProto, ToProto, Height, and DataHash when implementing the cases.

types/header.go (1)

86-88: The cache now lives inside a mutable value type.

Line 88 makes cachedHash part of every plain Header copy, and line 270 already has to manually scrub it back out in Clone(). Plain assignment / pass-by-value copies will still carry the old hash unless each mutation path remembers to call InvalidateHash(). Please verify the remaining copy+mutate sites, or move memoization to the store/cache boundary instead of coupling it to Header value semantics. As per coding guidelines, types/**/*.go should "Use value semantics for small types and deep copy when modification is needed to maintain immutability".

Also applies to: 262-270

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@types/header.go` around lines 86-88, the cachedHash field embeds mutable memoization into the Header value type, causing copies to carry stale hashes; remove cachedHash from the Header struct and move memoization to the store/cache boundary instead: delete the field, have MemoizeHash()/InvalidateHash() consult a store-scoped cache keyed by the header's canonical immutable fields or serialized bytes, update Clone() to stop scrubbing cachedHash, and update all callers that reference MemoizeHash, InvalidateHash, Clone, or cachedHash accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@block/internal/syncing/p2p_handler.go`:
- Around line 103-106: The call currently uses the embedded field selector
p2pHeader.SignedHeader.MemoizeHash(); change it to call the promoted method
directly as p2pHeader.MemoizeHash() to follow idiomatic Go and satisfy
staticcheck QF1008—update the invocation in the
block/internal/syncing/p2p_handler.go code where MemoizeHash is called on the
P2PSignedHeader instance (p2pHeader) so downstream callers still get the
memoized hash.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2e40f39f-abdc-4e81-be57-d85369501b1f
📒 Files selected for processing (13)
- block/internal/cache/generic_cache.go
- block/internal/executing/executor.go
- block/internal/syncing/da_retriever.go
- block/internal/syncing/p2p_handler.go
- block/internal/syncing/syncer.go
- pkg/store/cached_store.go
- types/hashing.go
- types/hashing_test.go
- types/header.go
- types/serialization.go
- types/serialization_test.go
- types/utils.go
- types/utils_test.go
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@types/hashing.go`:
- Around line 44-73: Header.Hash and Header.MemoizeHash read and write the
shared field cachedHash without synchronization, causing a data race when the
same *Header is handed out to multiple goroutines; fix this by adding a
concurrency guard (e.g., a sync.RWMutex) to the Header type and use it when
accessing cachedHash: use a read lock (RLock/RUnlock) in Hash() and computeHash
readers, and a write lock (Lock/Unlock) in MemoizeHash() when assigning
h.cachedHash (and in any invalidate method that clears cachedHash);
alternatively, stop returning the same *Header from caches and return copies,
but if keeping shared headers use the mutex and update all places that touch
cachedHash accordingly.
Reviewed excerpt from types/hashing.go (collapsed ranges marked with `// …`):

```go
// Hash returns the header hash. It reuses a memoized value if one has already
// been prepared via MemoizeHash, but it does not write to the header itself.
func (h *Header) Hash() Hash {
	if h == nil {
		return nil
	}
	if h.cachedHash != nil {
		return h.cachedHash
	}
	// …
	slimHash, err := h.HashSlim()
	if err != nil {
		return h.computeHash()
	}
	// …
}

// MemoizeHash computes the header hash and stores it on the header for future
// Hash() calls. Call this before publishing the header to shared goroutines or
// caches.
func (h *Header) MemoizeHash() Hash {
	if h == nil {
		return nil
	}
	if h.cachedHash != nil {
		return h.cachedHash
	}
	// …
	hash := h.computeHash()
	if hash != nil {
		h.cachedHash = hash
	}
	return hash
}
```
Synchronize cachedHash before sharing cached headers.
Lines 50-51, 64-70, and 91-93 access cachedHash without synchronization. pkg/store/cached_store.go:87-106 and pkg/store/cached_store.go:108-130 return the same *types.SignedHeader instance to multiple callers, so one goroutine can memoize or invalidate while another reads Hash(). That is a Go data race on shared state; please guard cachedHash or stop returning shared mutable header pointers from the caches. As per coding guidelines "Be careful with concurrent access to shared state in Go".
Also applies to: 89-95
Overview
This PR reuses hashes and memoizes the header hash in order to reduce pressure on the garbage collector.
Summary by CodeRabbit
Performance
Reliability
Tests