tx_service: guard standby term promotion with candidate check #469
Walkthrough
Adds conditional standby-term promotion: new private
Sequence Diagram
sequenceDiagram
participant Task as RequestSyncSnapshot Task
participant Sharder as Sharder
participant CcNode as CcNode
participant KV as KVStore
Task->>Sharder: PromoteStandbyTermIfCandidate(ng_id, standby_term)
activate Sharder
Sharder->>Sharder: acquire shared cluster_cnf_mux_
Sharder->>CcNode: PromoteStandbyTermIfCandidate(standby_term)
activate CcNode
CcNode->>CcNode: acquire is_processing_ latch (spin)
CcNode->>CcNode: read CandidateStandbyNodeTerm()
alt candidate_term == standby_term
CcNode->>KV: SetStandbyNodeTerm(standby_term)
CcNode->>KV: SetCandidateStandbyNodeTerm(-1)
CcNode-->>Sharder: return true
else
CcNode-->>Sharder: return false (log mismatch)
end
CcNode->>CcNode: release latch
deactivate CcNode
Sharder-->>Task: promoted (bool)
deactivate Sharder
alt promoted == true
Task->>KV: RestoreTxCache(...)
Task->>Task: log "RequestSyncSnapshot successfully"
else
Task->>Task: log "stale term is discarded"
end
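The compare-and-promote step in the diagram can be sketched as below. This is a minimal illustration, not the PR's code: `CcNodeSketch` and its members are hypothetical stand-ins, the spin latch is assumed to model `is_processing_`, and the two terms are kept in plain fields rather than behind the `SetStandbyNodeTerm` / `SetCandidateStandbyNodeTerm` calls shown in the diagram.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for CcNode. Names mirror the sequence diagram,
// not the real class in tx_service.
class CcNodeSketch
{
public:
    explicit CcNodeSketch(int64_t candidate_term)
        : candidate_standby_term_(candidate_term)
    {
    }

    // Promotes the candidate to the active standby term only if it still
    // equals the term the finishing snapshot task was started for.
    bool PromoteStandbyTermIfCandidate(int64_t standby_term)
    {
        // Spin until this thread owns the latch (assumed to model
        // is_processing_).
        while (is_processing_.exchange(true, std::memory_order_acquire))
        {
        }

        bool promoted = false;
        if (candidate_standby_term_ == standby_term)
        {
            standby_term_ = standby_term;
            candidate_standby_term_ = -1;  // no pending candidate anymore
            promoted = true;
        }

        is_processing_.store(false, std::memory_order_release);
        return promoted;
    }

    int64_t StandbyTerm() const { return standby_term_; }
    int64_t CandidateStandbyTerm() const { return candidate_standby_term_; }

private:
    std::atomic<bool> is_processing_{false};
    int64_t standby_term_{-1};
    int64_t candidate_standby_term_{-1};
};
```

A caller would run `RestoreTxCache(...)` only when this returns `true`, and otherwise log that the stale term was discarded, matching the two branches at the end of the diagram.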
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tx_service/src/remote/cc_node_service.cpp`:
- Around line 2087-2094: The call to store_hd->RequestSyncSnapshot(ng_id, term,
snapshot_ts) can start stale work because it uses only
PrimaryTermFromStandbyTerm; either guard that call with the full standby_term or
thread subscribe_id through the API so the store can ignore stale requests:
modify the logic around PrimaryTermFromStandbyTerm/RequestSyncSnapshot and
PromoteStandbyTermIfCandidate so you first validate the standby_term (or pass
the subscribe_id) before invoking RequestSyncSnapshot, or change
RequestSyncSnapshot signature to accept subscribe_id and ensure the store's
ReloadData filters by subscribe_id as well; update calls and callers of
RequestSyncSnapshot and any store_hd implementations accordingly (also ensure
RestoreTxCache usage remains gated by PromoteStandbyTermIfCandidate).
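The first option in the comment, validating the full standby term before touching the store, might look roughly like this. It is a sketch under stated assumptions: the bit layout (primary term in the high 32 bits, `subscribe_id` in the low 32 bits) is assumed for illustration and not confirmed by this PR, and `MakeStandbyTerm`, `SubscribeIdFromStandbyTerm`, and `RequestSyncSnapshotIfCurrent` are hypothetical helpers. The store call is passed in as a callback so the example stays self-contained.

```cpp
#include <cstdint>
#include <functional>

// ASSUMPTION: a standby term packs the primary term into the high 32 bits
// and the subscribe id into the low 32 bits. The real encoding may differ.
inline int64_t MakeStandbyTerm(int64_t primary_term, uint32_t subscribe_id)
{
    return static_cast<int64_t>(
        (static_cast<uint64_t>(primary_term) << 32) | subscribe_id);
}

inline int64_t PrimaryTermFromStandbyTerm(int64_t standby_term)
{
    return standby_term >> 32;
}

inline uint32_t SubscribeIdFromStandbyTerm(int64_t standby_term)
{
    return static_cast<uint32_t>(standby_term & 0xFFFFFFFF);
}

// Hypothetical caller-side gate: start store work only when the request's
// full standby term (primary term plus subscribe id) is still the candidate.
// Returns true if the store callback was invoked.
bool RequestSyncSnapshotIfCurrent(
    int64_t candidate_standby_term,
    int64_t standby_term,
    const std::function<void(int64_t)> &start_store_sync)
{
    if (candidate_standby_term != standby_term)
    {
        // Same primary term but an older subscribe id lands here.
        return false;
    }
    start_store_sync(PrimaryTermFromStandbyTerm(standby_term));
    return true;
}
```

Comparing only `PrimaryTermFromStandbyTerm(...)` would treat two subscriptions on the same primary term as equal, which is the stale-work window the comment describes; comparing the full value closes it on the caller side without changing the store API.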
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: eee38712-3c8a-4dfe-938e-ad7ea20727f6
📒 Files selected for processing (5)
- tx_service/include/fault/cc_node.h
- tx_service/include/sharder.h
- tx_service/src/fault/cc_node.cpp
- tx_service/src/remote/cc_node_service.cpp
- tx_service/src/sharder.cpp
♻️ Duplicate comments (1)
tx_service/src/remote/cc_node_service.cpp (1)
2090-2092: ⚠️ Potential issue | 🟠 Major
The new timestamp gate still lets stale same-term subscriptions start store work.
The `snapshot_ts` discard helps, but Line 2091 still calls the store with only `term`; the full `standby_term` is checked only afterwards in `PromoteStandbyTermIfCandidate()`. Two follow sessions on the same primary term can still kick off store-side sync for the older `subscribe_id` before it is rejected. Please gate the store call on the full standby term, or thread `standby_term`/`subscribe_id` through the store API.
Verify whether the downstream snapshot path ever sees the full standby term or subscribe ID. If every implementation still only takes `(ng_id, term, snapshot_ts)`, this stale-work window is still open.

    #!/bin/bash
    set -euo pipefail
    echo "== RequestSyncSnapshot signatures and implementations =="
    rg -n -C3 '\bRequestSyncSnapshot\s*\(' --type=cpp --type=h
    echo
    echo "== Store-side snapshot filtering inputs =="
    rg -n -C3 'ReloadData\(|subscribe_id|standby_term|standby_node_term|PrimaryTermFromStandbyTerm' --type=cpp --type=h
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@tx_service/src/remote/cc_node_service.cpp`:
- Around line 2090-2092: The call to store_hd->RequestSyncSnapshot(ng_id, term,
snapshot_ts) can start store work for a stale same-term subscriber; change the
logic in the caller (around RequestSyncSnapshot and
PromoteStandbyTermIfCandidate) to gate the store invocation on the full standby
term/subscribe_id (e.g., compare standby_term/subscribe_id before calling
RequestSyncSnapshot) or modify the store API to accept standby_term and/or
subscribe_id and have the store reject/ignore stale requests; update references
to RequestSyncSnapshot, PromoteStandbyTermIfCandidate, store_hd, standby_term,
subscribe_id, snapshot_ts, ng_id and term accordingly and run the supplied
ripgrep checks to verify all RequestSyncSnapshot implementations and downstream
snapshot paths accept and enforce the full standby term/subscribe_id.
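The alternative named in the prompt, having the store itself reject stale requests, could be sketched as follows. Everything here is hypothetical: `StoreSketch` is not the real `store_hd` interface, the three-argument signature taking the full standby term is an assumed API change, and the filter assumes that a newer subscription always carries a numerically larger standby term.

```cpp
#include <cstdint>
#include <mutex>
#include <unordered_map>

// Hypothetical store-side filter. Not the real data store handler.
class StoreSketch
{
public:
    // ASSUMPTION: standby_term is the full term (primary term plus
    // subscribe id) and grows monotonically with newer subscriptions.
    // Returns true if sync work was started, false if the request is stale.
    bool RequestSyncSnapshot(uint32_t ng_id,
                             int64_t standby_term,
                             uint64_t snapshot_ts)
    {
        (void) snapshot_ts;  // timestamp handling is outside this sketch

        std::lock_guard<std::mutex> lk(mux_);
        auto it = latest_term_.find(ng_id);
        if (it == latest_term_.end())
        {
            latest_term_.emplace(ng_id, standby_term);
        }
        else if (standby_term < it->second)
        {
            // An older subscription on this node group: ignore it.
            return false;
        }
        else
        {
            it->second = standby_term;
        }

        ++started_;
        return true;
    }

    int Started() const { return started_; }

private:
    std::mutex mux_;
    std::unordered_map<uint32_t, int64_t> latest_term_;
    int started_{0};
};
```

Filtering in the store keeps the check next to the work it protects, at the cost of changing every `RequestSyncSnapshot` implementation; the caller-side comparison avoids the API change but leaves the store unaware of subscription identity.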
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2e3e053f-d4e1-4302-ab71-d1633a525fd4
📒 Files selected for processing (1)
tx_service/src/remote/cc_node_service.cpp
Fixes https://github.com/eloqdata/project_tracker/issues/243.
Summary by CodeRabbit