Store ng leader term with ng_leader_cache. #129
Walkthrough
Adds leader term handling across the system: a term field is added to the NotifyNewLeaderStartRequest proto; Sharder gains a term-aware UpdateLeader method and a per-node-group leader-term cache; callers (CcNode and cc_node_service) pass the term through; Sharder implements atomic, term-ordered updates and initialization for the new cache.
Sequence Diagram(s)
sequenceDiagram
autonumber
participant C as Client/Peer
participant Svc as CCNodeService
participant Sh as Sharder
participant Caches as Leader Caches
C->>Svc: NotifyNewLeaderStart(ng_id, node_id, term)
Svc->>Sh: UpdateLeader(ng_id, node_id, term)
alt term provided (!= -1)
Sh->>Caches: read cached_term[ng_id]
opt outdated check
Note over Sh,Caches: If term <= cached_term and cached != -1, skip
end
Sh->>Caches: CAS loop: cached_term -> term
else no term
Note over Sh: Skip term update
end
Sh->>Caches: publish leader_id[ng_id] = node_id
Svc-->>C: OK/Status
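The flow in the diagram can be sketched as a small self-contained C++ type. This is a minimal illustration, not the PR's code: LeaderCaches, kMaxNg, kNoTerm, and the bool return value are hypothetical names introduced here (the Sharder::UpdateLeader quoted in the review returns void), and the cache is sized to 8 entries instead of 1000.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the per-node-group caches described above.
constexpr uint32_t kMaxNg = 8;   // assumption: small size for the sketch
constexpr int64_t kNoTerm = -1;  // "no term provided" sentinel, as in the diagram

struct LeaderCaches
{
    std::atomic<uint32_t> leader_id[kMaxNg];
    std::atomic<int64_t> leader_term[kMaxNg];

    LeaderCaches()
    {
        for (uint32_t i = 0; i < kMaxNg; ++i)
        {
            leader_id[i].store(UINT32_MAX, std::memory_order_relaxed);
            leader_term[i].store(kNoTerm, std::memory_order_relaxed);
        }
    }

    // Returns true if the leader was published, false if the update was
    // skipped because its term is not newer than the cached one.
    bool UpdateLeader(uint32_t ng_id, uint32_t node_id, int64_t term = kNoTerm)
    {
        if (term != kNoTerm)
        {
            int64_t cached = leader_term[ng_id].load(std::memory_order_relaxed);
            do
            {
                // Outdated check: also re-evaluated after every failed CAS,
                // because a failed CAS reloads `cached`.
                if (cached != kNoTerm && term <= cached)
                {
                    return false;
                }
            } while (!leader_term[ng_id].compare_exchange_weak(cached, term));
        }
        // Publish the leader id after the term has been accepted (or when
        // no term was supplied).
        leader_id[ng_id].store(node_id, std::memory_order_release);
        return true;
    }
};
```

In this sketch, a call with term 5 followed by a call with term 4 leaves the first leader in place, while a call without a term always overwrites the leader id and leaves the cached term untouched.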
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 0
🧹 Nitpick comments (2)
include/sharder.h (1)
765-766: Document the unreliability of the cache more clearly.
The comment states "this is not reliable" but doesn't explain why or what the implications are. Consider adding more context about when and how this cache might be unreliable, and what callers should be aware of.
     // Ng leader cache. We preallocate it to the max cluster size so that we
     // don't need to modify the size of it.
     std::atomic<uint32_t> ng_leader_cache_[1000];
-    // Ng leader term cache. Note this is not reliable.
+    // Ng leader term cache. Note this is not reliable as it may contain stale
+    // term values during network partitions or when updates from remote nodes
+    // are delayed. The cache is best-effort and should not be used for
+    // critical leader verification - use CheckLeaderTerm() instead.
     std::atomic<int64_t> ng_leader_term_cache_[1000];
src/sharder.cpp (1)
684-709: Consider potential ABA problem in the CAS loop.
The current implementation has a theoretical ABA problem where the cached_leader_term could change from A to B and back to A between the load and CAS, potentially allowing an outdated update. While unlikely in practice, consider whether this edge case matters for your consistency requirements.
Additionally, the double-check pattern inside the loop is redundant since cached_leader_term is updated by the CAS failure.
 void Sharder::UpdateLeader(uint32_t ng_id, uint32_t node_id, int64_t term)
 {
     DLOG(INFO) << "ccnode group ng" << ng_id
                << " updates leader to node_id:" << node_id;
     auto cached_leader_term =
         ng_leader_term_cache_[ng_id].load(std::memory_order_relaxed);
     // If the term is passed and out-dated, skip it.
     if (term != -1 && cached_leader_term != -1 && term <= cached_leader_term)
     {
         DLOG(INFO) << "skip out-dated leader update, term: " << term
                    << ", cached term: " << cached_leader_term;
         return;
     }
-    while (!ng_leader_term_cache_[ng_id].compare_exchange_weak(
-        cached_leader_term, term))
-    {
-        if (term != -1 && cached_leader_term != -1 &&
-            term <= cached_leader_term)
-        {
-            DLOG(INFO) << "skip out-dated leader update, term: " << term
-                       << ", cached term: " << cached_leader_term;
-            return;
-        }
-    }
+    // Update term cache if we have a newer term
+    if (term != -1) {
+        while (!ng_leader_term_cache_[ng_id].compare_exchange_weak(
+            cached_leader_term, term))
+        {
+            // cached_leader_term is updated by CAS failure
+            if (cached_leader_term != -1 && term <= cached_leader_term)
+            {
+                DLOG(INFO) << "skip out-dated leader update, term: " << term
+                           << ", cached term: " << cached_leader_term;
+                return;
+            }
+        }
+    }
     ng_leader_cache_[ng_id].store(node_id, std::memory_order_release);
 }
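One way to reason about the ABA concern is to isolate the term update as a "raise to maximum" operation. The sketch below is an illustration under stated assumptions, not code from the PR: AdvanceTerm and ApplyAll are hypothetical helpers, real terms are assumed to be non-negative so that -1 sorts below all of them, and every writer is assumed to go through this helper. Under those assumptions the cached value never decreases, so it cannot move from A to B and back to A.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: raise `cache` to `term` only if `term` is newer.
// Returns true if this call installed `term`.
inline bool AdvanceTerm(std::atomic<int64_t> &cache, int64_t term)
{
    int64_t seen = cache.load(std::memory_order_relaxed);
    while (seen < term)
    {
        if (cache.compare_exchange_weak(seen, term, std::memory_order_relaxed))
        {
            return true;
        }
        // A failed CAS reloads `seen`, so the loop condition re-checks
        // whether `term` is still newer than the current value.
    }
    return false;
}

// Applies terms in arrival order (possibly out of order) and returns the
// final cached term. Single-threaded on purpose, to keep the sketch simple.
inline int64_t ApplyAll(const std::vector<int64_t> &terms)
{
    std::atomic<int64_t> cache{-1};  // -1 means "no term cached yet"
    for (int64_t t : terms)
    {
        AdvanceTerm(cache, t);
    }
    return cache.load(std::memory_order_relaxed);
}
```

Because the helper only ever writes a value strictly greater than the one it observed, late or duplicated notifications are dropped and the cache converges to the highest term seen, regardless of arrival order.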
📒 Files selected for processing (5)
include/proto/cc_request.proto (1 hunks)
include/sharder.h (2 hunks)
src/fault/cc_node.cpp (1 hunks)
src/remote/cc_node_service.cpp (1 hunks)
src/sharder.cpp (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
include/sharder.h (1)
src/sharder.cpp (4)
UpdateLeader (663-682)
UpdateLeader (663-663)
UpdateLeader (684-710)
UpdateLeader (684-684)
🔇 Additional comments (6)
include/proto/cc_request.proto (1)
44-44: LGTM!
The addition of the term field to NotifyNewLeaderStartRequest is appropriate for tracking leader terms.
include/sharder.h (1)
356-358: LGTM!
The updated API signature with the optional term parameter is well-designed. The default value of -1 provides backward compatibility.
src/remote/cc_node_service.cpp (1)
219-224: LGTM!
The integration correctly reads the term from the request and passes it to UpdateLeader.
src/fault/cc_node.cpp (1)
838-838: LGTM!
The call site correctly passes the primary_term to UpdateLeader, maintaining consistency with the term-aware updates.
src/sharder.cpp (2)
172-172: LGTM!
Proper initialization of the ng_leader_term_cache_ to -1 for all potential node groups.
698-708: No code path depends on ng_leader_term_cache_ and ng_leader_cache_ being strictly consistent.
Repo search shows only initialization and the update in src/sharder.cpp; LeaderNodeId (include/sharder.h) loads only ng_leader_cache_ and ng_leader_term_cache_ is documented "not reliable." No action required.
Check term before UpdateLeader.
Related PR:
https://github.com/eloqdata/raft_host_manager/pull/3