Range read block on rw conflict#283
Conversation
|
Warning Rate limit exceeded@githubzilla has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 10 minutes and 33 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (1)
WalkthroughRange read conflict handling now blocks on lock-acquire failures (retaining bucket read locks) and invokes a throttled deadlock checker instead of aborting; several pre-abort checks were removed and fault-injection/diagnostics added in commit-acquire-all flows. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Req as ReadCc Request
participant LM as LockManager/BucketLock
participant DLD as DeadLockCheck
participant Remote as RemoteNode
Req->>LM: attempt acquire bucket/range lock
alt acquire succeeds
LM-->>Req: lock acquired
Req->>Req: proceed with read
else acquire fails
LM-->>Req: lock busy / contention
Req->>Req: mark blocked (retain bucket read lock)
Req->>DLD: try request deadlock check (throttled)
note right of DLD: uses atomic requested_check_ and last_throttled_check_time_
alt remote-origin
Req->>Remote: send ack of block
end
DLD-->>Req: (possibly) schedule deadlock analysis / no-op if throttled
LM-->>Req: on lock release -> retry queued request
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tx_service/src/tx_operation.cpp (1)
317-327: Do not blindly retryDATA_STORE_ERRinReadLocalOperation::ForwardIn
ReadLocalOperation::Forward(Lines 317–327), all errors (bothACQUIRE_KEY_LOCK_FAILED_FOR_RW_CONFLICTandDATA_STORE_ERR) now cause the handler result to be reset and the operation to be re‑processed:
- This makes sense for
ACQUIRE_KEY_LOCK_FAILED_FOR_RW_CONFLICTunder the new “blocking range read + deadlock detection” model.- For
DATA_STORE_ERR, this turns a non‑transient local data‑store failure into an unbounded immediate retry loop with no backoff and no explicit timeout handling in this op. The transaction never sees the terminal error and may just spin until some external timeout/abort kicks in.You likely want to preserve the old “propagate DATA_STORE_ERR and finish” behavior and only apply the retry logic to the RW‑conflict case.
A minimal, more robust structure would be:
- else if (hd_result_->IsError()) - { - assert(hd_result_->ErrorCode() == - CcErrorCode::ACQUIRE_KEY_LOCK_FAILED_FOR_RW_CONFLICT || - hd_result_->ErrorCode() == CcErrorCode::DATA_STORE_ERR); - // With blocking range reads and deadlock detection, we no longer - // need to preemptively abort. The request will block and deadlock - // detection will handle actual deadlocks. Retry the operation. - hd_result_->Value().Reset(); - hd_result_->Reset(); - execute_immediately_ = false; - txm->Process(*this); - return; - } + else if (hd_result_->IsError()) + { + const auto err = hd_result_->ErrorCode(); + if (err == + CcErrorCode::ACQUIRE_KEY_LOCK_FAILED_FOR_RW_CONFLICT) + { + // With blocking range reads and deadlock detection, we no + // longer need to preemptively abort. The request will block and + // deadlock detection will handle actual deadlocks. Retry the + // operation. + hd_result_->Value().Reset(); + hd_result_->Reset(); + execute_immediately_ = false; + txm->Process(*this); + return; + } + + // Non‑retryable local error: bubble it up instead of spinning. + assert(err == CcErrorCode::DATA_STORE_ERR); + txm->PostProcess(*this); + return; + }
🧹 Nitpick comments (1)
tx_service/src/tx_operation.cpp (1)
612-622: Consider bounding or backing off retries inLockWriteRangeBucketsOp::ForwardFor
LockWriteRangeBucketsOp::Forward(Lines 612–622), onACQUIRE_KEY_LOCK_FAILED_FOR_RW_CONFLICTyou now:
- Reset
lock_result_- Set
is_running_ = false; execute_immediately_ = false;- Immediately call
txm->Process(*this);This implements the new “blocking range read” semantics by retrying instead of aborting, which is consistent with the PR’s intent. However, if for any reason the CC layer keeps returning the conflict code (e.g. a bug or a permanently blocked writer), this path will repeatedly re‑issue the same lock request with no retry bound and no backoff in this op.
It might be safer to either:
- Consume
retry_num_and go throughReRunOp(txm)(with the existing exponential sleep), or- Add a small retry counter/backoff specific to this op.
Not strictly a blocker, but worth tightening before this path becomes hot.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
tx_service/include/cc/range_cc_map.h(3 hunks)tx_service/include/dead_lock_check.h(2 hunks)tx_service/src/dead_lock_check.cpp(1 hunks)tx_service/src/tx_operation.cpp(7 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-10-09T03:56:58.811Z
Learnt from: thweetkomputer
Repo: eloqdata/tx_service PR: 150
File: include/cc/local_cc_shards.h:626-631
Timestamp: 2025-10-09T03:56:58.811Z
Learning: For the LocalCcShards class in include/cc/local_cc_shards.h: Writer locks (unique_lock) should continue using the original meta_data_mux_ (std::shared_mutex) rather than fast_meta_data_mux_ (FastMetaDataMutex) at this stage. Only reader locks may use the FastMetaDataMutex wrapper.
Applied to files:
tx_service/src/tx_operation.cpptx_service/include/cc/range_cc_map.h
📚 Learning: 2025-10-20T04:30:07.884Z
Learnt from: liunyl
Repo: eloqdata/tx_service PR: 149
File: include/cc/cc_request.h:1876-1927
Timestamp: 2025-10-20T04:30:07.884Z
Learning: ScanNextBatchCc in include/cc/cc_request.h is used only for hash-partition scans; range-partition scans are handled by ScanSliceCc.
Applied to files:
tx_service/include/cc/range_cc_map.h
📚 Learning: 2025-11-11T07:10:40.346Z
Learnt from: lzxddz
Repo: eloqdata/tx_service PR: 199
File: include/cc/local_cc_shards.h:233-234
Timestamp: 2025-11-11T07:10:40.346Z
Learning: In the LocalCcShards class in include/cc/local_cc_shards.h, the EnqueueCcRequest methods use `shard_code & 0x3FF` followed by `% cc_shards_.size()` to distribute work across processor cores for load balancing. This is intentional and separate from partition ID calculation. The 0x3FF mask creates a consistent distribution range (0-1023) before modulo by actual core count.
Applied to files:
tx_service/include/cc/range_cc_map.h
📚 Learning: 2025-10-21T06:46:53.700Z
Learnt from: lokax
Repo: eloqdata/tx_service PR: 149
File: src/remote/cc_stream_receiver.cpp:1066-1075
Timestamp: 2025-10-21T06:46:53.700Z
Learning: In src/remote/cc_stream_receiver.cpp, for ScanNextRequest handling, BucketIds() on RemoteScanNextBatch should never be empty—this is an expected invariant of the scan protocol.
Applied to files:
tx_service/include/cc/range_cc_map.h
🧬 Code graph analysis (2)
tx_service/include/dead_lock_check.h (1)
tx_service/src/dead_lock_check.cpp (2)
RequestCheckWithThrottle(56-79)RequestCheckWithThrottle(56-56)
tx_service/include/cc/range_cc_map.h (1)
tx_service/src/dead_lock_check.cpp (2)
RequestCheckWithThrottle(56-79)RequestCheckWithThrottle(56-56)
🔇 Additional comments (7)
tx_service/src/tx_operation.cpp (1)
49-49: Include order change looks fineMoving
#include "tx_operation.h"(Line 49) below other headers is safe and aligns with including the corresponding header last. No functional impact.tx_service/include/cc/range_cc_map.h (4)
32-32: LGTM!The include directive is necessary for the new
DeadLockCheck::RequestCheckWithThrottlecall added below.
233-235: LGTM!The updated comments clearly explain the new blocking behavior for range reads, which is consistent with the implementation changes below.
338-343: The remote request acknowledgement pattern is correct and safe. RemoteRead properly inherits from ReadCc (confirmed in remote_cc_request.h:180), making the static_cast toremote::RemoteRead&safe when guarded by the!req.IsLocal()check. TheAcknowledge()method is properly defined and correctly notifies the remote node by setting transaction number and handler address in the output message. This pattern is consistently used throughout the codebase for distinguishing remote vs. local request handling.
328-346: Bucket lock handling is consistent with the design and safely managed.The code correctly holds the bucket read lock while the request is blocked on range lock acquisition, matching the pattern used for regular data reads. This lock is intentionally retained to avoid re-acquiring it when the request resumes.
Lock lifecycle:
- During blocking: Bucket lock is held (not released) along with BlockByLock status set
- On resumption: Request re-enters PostReadCc Execute, which checks if BlockedBy == BlockByLock and calls LockHandleForResumedRequest to retry lock acquisition
- Lock release guarantee: PostReadCc Execute ensures the bucket lock is always released via ReleaseCceLock() regardless of the outcome
The design is sound: holding a read lock during blocking has minimal deadlock risk since read locks don't block other readers, and the lock is guaranteed to be released in PostReadCc. This matches how regular template-based reads handle the same scenario (template_cc_map.h lines 1489-1503).
tx_service/include/dead_lock_check.h (2)
202-205: LGTM with a note.The method declaration and documentation are clear. However, this is related to the earlier comment about the unused
tx_idparameter in the implementation.
251-254: LGTM!The member variable declarations and throttle interval constant are well-defined. The initialization of
last_throttled_check_time_to 0 means the first deadlock check request will always be allowed, which is reasonable behavior.
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
tx_service/include/cc/range_cc_map.h(3 hunks)tx_service/include/dead_lock_check.h(2 hunks)tx_service/src/dead_lock_check.cpp(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tx_service/src/dead_lock_check.cpp
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-10-09T03:56:58.811Z
Learnt from: thweetkomputer
Repo: eloqdata/tx_service PR: 150
File: include/cc/local_cc_shards.h:626-631
Timestamp: 2025-10-09T03:56:58.811Z
Learning: For the LocalCcShards class in include/cc/local_cc_shards.h: Writer locks (unique_lock) should continue using the original meta_data_mux_ (std::shared_mutex) rather than fast_meta_data_mux_ (FastMetaDataMutex) at this stage. Only reader locks may use the FastMetaDataMutex wrapper.
Applied to files:
tx_service/include/cc/range_cc_map.h
📚 Learning: 2025-10-20T04:30:07.884Z
Learnt from: liunyl
Repo: eloqdata/tx_service PR: 149
File: include/cc/cc_request.h:1876-1927
Timestamp: 2025-10-20T04:30:07.884Z
Learning: ScanNextBatchCc in include/cc/cc_request.h is used only for hash-partition scans; range-partition scans are handled by ScanSliceCc.
Applied to files:
tx_service/include/cc/range_cc_map.h
📚 Learning: 2025-11-11T07:10:40.346Z
Learnt from: lzxddz
Repo: eloqdata/tx_service PR: 199
File: include/cc/local_cc_shards.h:233-234
Timestamp: 2025-11-11T07:10:40.346Z
Learning: In the LocalCcShards class in include/cc/local_cc_shards.h, the EnqueueCcRequest methods use `shard_code & 0x3FF` followed by `% cc_shards_.size()` to distribute work across processor cores for load balancing. This is intentional and separate from partition ID calculation. The 0x3FF mask creates a consistent distribution range (0-1023) before modulo by actual core count.
Applied to files:
tx_service/include/cc/range_cc_map.h
📚 Learning: 2025-10-21T06:46:53.700Z
Learnt from: lokax
Repo: eloqdata/tx_service PR: 149
File: src/remote/cc_stream_receiver.cpp:1066-1075
Timestamp: 2025-10-21T06:46:53.700Z
Learning: In src/remote/cc_stream_receiver.cpp, for ScanNextRequest handling, BucketIds() on RemoteScanNextBatch should never be empty—this is an expected invariant of the scan protocol.
Applied to files:
tx_service/include/cc/range_cc_map.h
🧬 Code graph analysis (2)
tx_service/include/cc/range_cc_map.h (1)
tx_service/src/dead_lock_check.cpp (2)
RequestCheckWithThrottle(56-79)RequestCheckWithThrottle(56-56)
tx_service/include/dead_lock_check.h (1)
tx_service/src/dead_lock_check.cpp (2)
RequestCheckWithThrottle(56-79)RequestCheckWithThrottle(56-56)
🔇 Additional comments (5)
tx_service/include/dead_lock_check.h (2)
202-205: LGTM!The throttled deadlock check method is well-documented and provides a clear mechanism to limit the frequency of deadlock checks.
251-254: Verify throttling behavior when deadlock check is skipped.The throttle implementation correctly limits checks to once per second. However, in
range_cc_map.hline 329, the return value ofRequestCheckWithThrottle()is not checked. When the throttle prevents a deadlock check (returns false), the request still blocks and waits. This means if multiple transactions hit lock contention within the same second, only the first triggers a deadlock check while others block indefinitely until the next scheduled check or lock release.Verify this is the intended behavior by checking if the periodic deadlock checker will eventually detect these blocked transactions, or if there are other mechanisms to wake them:
#!/bin/bash # Search for periodic deadlock check mechanism and blocked request wake-up logic rg -nP -C3 'time_interval_|requested_check_|con_var_\.notify' --type=cpptx_service/include/cc/range_cc_map.h (3)
32-32: LGTM!The include is necessary for the
RequestCheckWithThrottle()call added in the Execute method.
233-235: LGTM!The updated comment accurately reflects the behavioral change from aborting to blocking when lock acquisition fails, enabling deadlock detection.
328-346: Verify resumed request retry path is properly implemented in RangeCcMap.The design approach is sound—retaining the bucket read lock and retrying range lock acquisition on resumption is consistent with regular data reads and allows deadlock detection. However, RangeCcMap's Execute(ReadCc) override (lines 202–350) lacks the resume handling pattern found in the parent TemplateCcMap class. The parent class checks
req.CcePtr()to detect resumed requests and callsLockHandleForResumedRequest()when blocked by lock (template_cc_map.h:1489–1503), but RangeCcMap's implementation has no such check. This means resumed requests will attempt to re-acquire the bucket lock without verifying prior acquisition state. Confirm whether RangeCcMap intentionally bypasses this pattern or if the resume handling needs to be added to properly support resumed range read requests with retained bucket locks.
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tx_service/include/cc/range_cc_map.h (1)
217-217:assert(req.IsLocal())conflicts with new remote‑acknowledgement branchAt line 217 you assert
req.IsLocal(), but at lines 338‑343 you added aif (!req.IsLocal())branch that casts toremote::RemoteReadand callsAcknowledge().In debug/test builds this makes the remote path effectively unreachable: any non‑local
ReadCcreaching this method would trip the assert before hitting the new acknowledgement logic. Please either:
- Relax/remove the
assert(req.IsLocal())if remoteReadCcrequests are expected here, or- Remove the remote‑specific branch and keep the invariant that
RangeCcMap::Execute(ReadCc)only handles local requests.Also applies to: 338-343
♻️ Duplicate comments (1)
tx_service/src/tx_operation.cpp (1)
4217-4255: Fault-injectedsplit_flush_commit_acquire_all_deadlockcan still stall the state machineEven after this change, the
WITH_FAULT_INJECTpath forsplit_flush_commit_acquire_all_deadlockrisks leavingcommit_acquire_all_write_op_(anAcquireAllOp) stuck and never reaching theop_ == &commit_acquire_all_write_op_branch inSplitFlushRangeOp::Forward, for two reasons:
Sub‑op never reaches a “logically finished” state:
commit_acquire_all_write_op_.Reset(all_node_groups->size())sets:
upload_cnt_ = node_cnt * keys_.size()(for range split this is#node_groups * 1).finish_cnt_ = fail_cnt_ = remote_ack_cnt_ = 0.- You then mark only
hd_results_[0]asDEAD_LOCK_ABORTviaSetError+ForceError.- Assuming
ForceErrortriggers thepost_lambda_, you getfinish_cnt_ = 1andfail_cnt_ = 1, but all otherhd_results_[i>0]remain unfinished andremote_ack_cnt_is still 0.- In
AcquireAllOp::Forward, this means:
remote_ack_cnt_ == 0, so the first timeout branch is skipped.finish_cnt_ != upload_cnt_, so the “all done” branch is never taken.- The final timeout branch iterates unfinished
hd_results_, but only acts whenac_res.node_term_ > 0 && !blocked_remote_cce_addr_.empty(), which is not true for the synthetic entries created byReset. Those entries are neverForceError’d, sofinish_cnt_never reachesupload_cnt_andtxm->PostProcess(*this)is never called.You still don’t run the sub‑op’s
Forwardwith real work:
- In the fault branch you do:
op_ = &commit_acquire_all_write_op_;txm->PushOperation(&commit_acquire_all_write_op_);commit_acquire_all_write_op_.is_running_ = true;- Unlike the normal path, you do not call
ForwardToSubOperation(txm, &commit_acquire_all_write_op_), so there is no guaranteedAcquireAllOp::Forward(txm)invocation tied to this transition.- Even if
Forwardis eventually called because the op is now on top of the stack,is_running_ == truepreventsAcquireAllOpfrom callingtxm->Process(*this)and sending any RPCs, and (per point 1) the synthetic state still never reaches the “all requests finished” branch.Net effect: with the
split_flush_commit_acquire_all_deadlockfault enabled, the composite split‑flush operation can hang rather than driving the deadlock downgrade path you’re trying to test — essentially the same failure mode highlighted in the earlier review.A safer pattern is to fully synthesize a “completed, deadlock’ed AcquireAllOp” before forwarding to it, so
AcquireAllOp::Forwardimmediately takes thefinish_cnt_ == upload_cnt_branch andPostProcesses without issuing RPCs. For example:-#ifdef WITH_FAULT_INJECT - // Check if fault injector is enabled - FaultEntry *fault_entry = - FaultInject::Entry("split_flush_commit_acquire_all_deadlock"); - if (fault_entry != nullptr) - { - // Only assign to op_ and push to txm, do not call Process - op_ = &commit_acquire_all_write_op_; - txm->PushOperation(&commit_acquire_all_write_op_); - auto all_node_groups = Sharder::Instance().AllNodeGroups(); - commit_acquire_all_write_op_.Reset(all_node_groups->size()); - commit_acquire_all_write_op_.is_running_ = true; - - // Ensure at least one result slot exists. - assert(commit_acquire_all_write_op_.hd_results_.size() >= 1); - // Set error on the first hd_result with DEAD_LOCK_ABORT - commit_acquire_all_write_op_.hd_results_[0].SetError( - CcErrorCode::DEAD_LOCK_ABORT); - commit_acquire_all_write_op_.hd_results_[0].ForceError(); - - DLOG(INFO) - << "FaultInject split_flush_commit_acquire_all_deadlock, " - << " commit_acquire_all_write_op_.IsDeadlock(): " - << commit_acquire_all_write_op_.IsDeadlock() - << " upload_cnt_: " - << commit_acquire_all_write_op_.upload_cnt_ - << " fail_cnt_: " - << commit_acquire_all_write_op_.fail_cnt_.load( - std::memory_order_relaxed); - // Auto-remove the fault to prevent it from being triggered - // again - FaultInject::Instance().InjectFault( - "split_flush_commit_acquire_all_deadlock", "remove"); - } - else - { - ForwardToSubOperation(txm, &commit_acquire_all_write_op_); - } -#else - ForwardToSubOperation(txm, &commit_acquire_all_write_op_); -#endif +#ifdef WITH_FAULT_INJECT + if (FaultInject::Entry( + "split_flush_commit_acquire_all_deadlock") != nullptr) + { + // Synthesize a completed DEAD_LOCK_ABORT without issuing RPCs. + auto &results = commit_acquire_all_write_op_.hd_results_; + if (results.empty()) + { + commit_acquire_all_write_op_.Resize(1); + } + + commit_acquire_all_write_op_.upload_cnt_ = 1; + commit_acquire_all_write_op_.finish_cnt_.store( + 1, std::memory_order_relaxed); + commit_acquire_all_write_op_.fail_cnt_.store( + 1, std::memory_order_relaxed); + commit_acquire_all_write_op_.remote_ack_cnt_.store( + 0, std::memory_order_relaxed); + commit_acquire_all_write_op_.is_running_ = true; + + auto &h = results[0]; + h.Reset(); + h.SetError(CcErrorCode::DEAD_LOCK_ABORT); + h.ForceError(); + + DLOG(INFO) << "FaultInject split_flush_commit_acquire_all_deadlock, " + << "IsDeadlock=" << commit_acquire_all_write_op_.IsDeadlock(); + + FaultInject::Instance().InjectFault( + "split_flush_commit_acquire_all_deadlock", "remove"); + } +#endif + // In both normal and fault-injected paths, drive the sub-op once + // so AcquireAllOp::Forward can PostProcess and let the parent see + // the deadlock state. + ForwardToSubOperation(txm, &commit_acquire_all_write_op_);This keeps
AcquireAllOp::Process()suppressed (viais_running_ = true), but guarantees:
upload_cnt_ == finish_cnt_ == 1,fail_cnt_ > 0,remote_ack_cnt_ == 0.AcquireAllOp::Forwardimmediately takes thefinish_cnt_ == upload_cnt_branch, runs the fail‑handling logic once, and callstxm->PostProcess(*this).- On the next
SplitFlushRangeOp::Forwardcall, theop_ == &commit_acquire_all_write_op_branch seesIsDeadlock() == trueand exercises the downgrade path instead of hanging.If you prefer not to rely on
AcquireAllOp::Forwardat all, an alternative is to skip pushing the sub‑op and simply setop_ = &commit_acquire_all_write_op_,fail_cnt_ = 1,upload_cnt_ = 1, and mark onehd_resultasDEAD_LOCK_ABORT, then handle the downgrade directly in the parent on the nextForwardcall — but the key point is that the current shape still leaves the sub‑op in a permanently “incomplete” state under the fault.Also applies to: 4532-4573
🧹 Nitpick comments (3)
tx_service/include/cc/range_cc_map.h (2)
233-235: Clarify which lock type “range reads now block” refers toThe comment at lines 233‑235 says range reads now block when lock acquisition fails, but the bucket lock acquisition path (lines 268‑290) still aborts via
AbortQueueRequest, while only the range lock path (lines 325‑347) actually blocks and participates in deadlock detection.Consider tightening the wording to something like “range record lock acquisition now blocks” or similar, or, if desired, extend the same blocking/deadlock‑check semantics to bucket lock conflicts as well.
Also applies to: 283-290, 325-347
267-283: Confirm bucket read‑lock cleanup for blocked‑but‑aborted range readsIn the blocked range‑lock path you now intentionally “keep the bucket read lock” (lines 332‑336) and return
falsewithreq.SetBlockType(ReadCc::BlockByLock). Since the bucket read lock is not tracked in the tx readset and is manually released only inExecute(PostReadCc)(lines 357‑368), any path that aborts/cancels a blockedReadCcbefore it reachesPostReadCcmust also ensure the bucket lock is released.If existing generic abort/cancellation logic already does this for blocked range reads, then behavior is fine; otherwise you may need either to:
- Track the bucket lock in the readset for these requests, or
- Add explicit cleanup in the abort path for blocked range reads.
Also applies to: 332-336, 350-369
tx_service/src/tx_operation.cpp (1)
309-327: ReadLocalOperation retry comment matches behavior; watch unbounded retriesThe new comment correctly describes the retry-on-error behavior for RW conflicts and data store errors. Note that this path will keep retrying indefinitely if
hd_result_keeps returningDATA_STORE_ERRorACQUIRE_KEY_LOCK_FAILED_FOR_RW_CONFLICT, since it doesn’t consultretry_num_orReRunOp. If that’s intentional (e.g., higher-level timeout/abort will eventually kick in), this is fine; otherwise, consider wiring this throughReRunOpwith a bounded retry count.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
tx_service/include/cc/range_cc_map.h(3 hunks)tx_service/src/tx_operation.cpp(8 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-10-09T03:56:58.811Z
Learnt from: thweetkomputer
Repo: eloqdata/tx_service PR: 150
File: include/cc/local_cc_shards.h:626-631
Timestamp: 2025-10-09T03:56:58.811Z
Learning: For the LocalCcShards class in include/cc/local_cc_shards.h: Writer locks (unique_lock) should continue using the original meta_data_mux_ (std::shared_mutex) rather than fast_meta_data_mux_ (FastMetaDataMutex) at this stage. Only reader locks may use the FastMetaDataMutex wrapper.
Applied to files:
tx_service/include/cc/range_cc_map.htx_service/src/tx_operation.cpp
📚 Learning: 2025-10-20T04:30:07.884Z
Learnt from: liunyl
Repo: eloqdata/tx_service PR: 149
File: include/cc/cc_request.h:1876-1927
Timestamp: 2025-10-20T04:30:07.884Z
Learning: ScanNextBatchCc in include/cc/cc_request.h is used only for hash-partition scans; range-partition scans are handled by ScanSliceCc.
Applied to files:
tx_service/include/cc/range_cc_map.h
📚 Learning: 2025-11-11T07:10:40.346Z
Learnt from: lzxddz
Repo: eloqdata/tx_service PR: 199
File: include/cc/local_cc_shards.h:233-234
Timestamp: 2025-11-11T07:10:40.346Z
Learning: In the LocalCcShards class in include/cc/local_cc_shards.h, the EnqueueCcRequest methods use `shard_code & 0x3FF` followed by `% cc_shards_.size()` to distribute work across processor cores for load balancing. This is intentional and separate from partition ID calculation. The 0x3FF mask creates a consistent distribution range (0-1023) before modulo by actual core count.
Applied to files:
tx_service/include/cc/range_cc_map.htx_service/src/tx_operation.cpp
📚 Learning: 2025-12-02T10:43:27.431Z
Learnt from: lokax
Repo: eloqdata/tx_service PR: 254
File: tx_service/src/cc/local_cc_shards.cpp:2949-3188
Timestamp: 2025-12-02T10:43:27.431Z
Learning: In tx_service/src/cc/local_cc_shards.cpp, whenever TryPinNodeGroupData is used, only call Sharder::Instance().UnpinNodeGroupData(node_group) if the recorded term is >= 0 (i.e., pin succeeded). Example: LocalCcShards::PostProcessFlushTaskEntries guards the unpin with `if (term >= 0)`.
Applied to files:
tx_service/src/tx_operation.cpp
📚 Learning: 2025-09-25T06:15:41.186Z
Learnt from: liunyl
Repo: eloqdata/tx_service PR: 142
File: include/tx_req_result.h:140-159
Timestamp: 2025-09-25T06:15:41.186Z
Learning: When cc_notify_ is true in TxResult, both Finish() and Wait() are guaranteed to be called on the same thread, eliminating concurrency concerns and making relaxed memory ordering sufficient for status_combo_ updates.
Applied to files:
tx_service/src/tx_operation.cpp
🧬 Code graph analysis (1)
tx_service/include/cc/range_cc_map.h (1)
tx_service/src/dead_lock_check.cpp (2)
RequestCheckWithThrottle(56-79)RequestCheckWithThrottle(56-56)
🔇 Additional comments (4)
tx_service/include/cc/range_cc_map.h (1)
32-32: Deadlock check integration looks fineIncluding
dead_lock_check.hand callingDeadLockCheck::RequestCheckWithThrottle()in the blocked-range path is consistent with the new throttled deadlock-check API. Ignoring the boolean return and relying only on the side effect is reasonable here.Also applies to: 328-330
tx_service/src/tx_operation.cpp (3)
49-49: Include reordering fortx_operation.his fineMoving this include down into the project header block is harmless and improves include hygiene; no behavioral impact.
601-623: LockWriteRangeBucketsOp RW-conflict comment aligns with retry behaviorThe updated comment accurately reflects that RW conflicts are now handled by retrying instead of preemptive abort, deferring to blocking + deadlock detection. The control flow (
Resetresult, flipexecute_immediately_, re-Process) is consistent with that design.
1846-1848: Additional AcquireAllOp completion DLOG is useful and low‑riskThe new debug log on “all acquire-all requests finished” (including
fail_cnt_) is inexpensive and should help diagnose lock-acquisition behavior without affecting semantics.
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
tx_service/src/tx_operation.cpp (1)
4218-4256: Fault injection forsplit_flush_commit_acquire_all_deadlockstill leavescommit_acquire_all_write_op_stuckThe new fault‑inject branch improves diagnostics but still doesn’t let
AcquireAllOpcomplete, soSplitFlushRangeOpcan stall indefinitely when this fault is enabled:
commit_acquire_all_write_op_.Reset(all_node_groups->size())setsupload_cnt_ = node_cnt * keys_.size()(≥ number of node groups), and zeroesfinish_cnt_/fail_cnt_.- You then set
is_running_ = true, push the sub‑op without callingProcess, and mark onlyhd_results_[0]asDEAD_LOCK_ABORTviaSetError+ForceError(). That incrementsfail_cnt_andfinish_cnt_to 1 via the per‑resultpost_lambda_.- With
is_running_ == true,AcquireAllOp::Forwardwill not calltxm->Process(*this)(no RPCs are issued). At the same time:
remote_ack_cnt_stays 0.finish_cnt_ (==1) != upload_cnt_, so thefinish_cnt_ == upload_cnt_branch (which callstxm->PostProcess(*this)) is never taken.- The timeout/deadlock path at the end of
AcquireAllOp::Forwardinspectsac_res.node_term_ > 0 && has_blocked_remote_cce, butReset()leavesnode_term_ = -1, so it never forces additional completions or callsPostProcesseither.- As a result,
commit_acquire_all_write_op_never reaches a “finished” state in which it would calltxm->PostProcess(*this), and the parentSplitFlushRangeOpnever re‑enters itsop_ == &commit_acquire_all_write_op_branch to seefail_cnt_ > 0andIsDeadlock() == true. Any test that enables this fault will hang rather than exercising the downgrade path.To keep the fault semantics (no real RPCs) but guarantee progress, you need to either:
- Do not push the sub‑op at all; just synthesize a failed, deadlock’ed sub‑op state and let the parent handle it on the next
Forward, e.g.:#ifdef WITH_FAULT_INJECT - FaultEntry *fault_entry = - FaultInject::Entry("split_flush_commit_acquire_all_deadlock"); - if (fault_entry != nullptr) - { - // Only assign to op_ and push to txm, do not call Process - op_ = &commit_acquire_all_write_op_; - txm->PushOperation(&commit_acquire_all_write_op_); - auto all_node_groups = Sharder::Instance().AllNodeGroups(); - commit_acquire_all_write_op_.Reset(all_node_groups->size()); - commit_acquire_all_write_op_.is_running_ = true; - - // Ensure at least one result slot exists. - assert(commit_acquire_all_write_op_.hd_results_.size() >= 1); - // Set error on the first hd_result with DEAD_LOCK_ABORT - commit_acquire_all_write_op_.hd_results_[0].SetError( - CcErrorCode::DEAD_LOCK_ABORT); - commit_acquire_all_write_op_.hd_results_[0].ForceError(); + if (FaultInject::Entry( + "split_flush_commit_acquire_all_deadlock") != nullptr) + { + // Synthesize a completed, deadlock'ed acquire-all without issuing RPCs. + op_ = &commit_acquire_all_write_op_; + + commit_acquire_all_write_op_.upload_cnt_ = 1; + commit_acquire_all_write_op_.finish_cnt_.store( + 1, std::memory_order_relaxed); + commit_acquire_all_write_op_.fail_cnt_.store( + 1, std::memory_order_relaxed); + commit_acquire_all_write_op_.remote_ack_cnt_.store( + 0, std::memory_order_relaxed); + + auto &res = commit_acquire_all_write_op_.hd_results_[0]; + res.Reset(); + res.SetError(CcErrorCode::DEAD_LOCK_ABORT); + res.ForceError(); // make IsError() true for IsDeadlock()
- Or keep pushing the sub‑op but also synthesize a fully completed state so that
AcquireAllOp::Forwardwill immediately observefinish_cnt_ == upload_cnt_and calltxm->PostProcess(*this)on the next tick (as suggested in the prior review).Either way, the key is: after the fault fires,
commit_acquire_all_write_op_must transition to a finished, deadlock’ed state that lets the parent branch run once more and hit theIsDeadlock()downgrade path, instead of remaining perpetually incomplete.This is the same underlying issue that was flagged in the earlier review; the current changes haven’t eliminated the risk of the state machine getting stuck under this fault.
Also applies to: 4258-4258
🧹 Nitpick comments (4)
tx_service/include/cc/range_cc_map.h (1)
328-347: Blocking path and deadlock check integration look good; clarify remote/local and lock-lifetime invariantsSwitching the range-lock conflict path to:
- trigger a throttled deadlock check,
- keep the bucket read lock,
- mark the request as
BlockByLock, and- return
falseto let the scheduler re-run itis consistent with typical “block instead of abort” CC behavior and should make deadlock detection see real wait edges.
Two things are worth tightening/confirming:
- The
if (!req.IsLocal())branch sits in a function that begins withassert(req.IsLocal());, so in debug builds thatAcknowledge()path appears unreachable. If remoteReadCcinstances are ever routed through thisExecute, the assert will fire instead of acknowledging; if they are truly never routed here, the remote-specific code is dead and could be removed to avoid confusion.- Because the bucket read lock is deliberately kept while the request is blocked and is not tracked in the readset, it’s important that every eventual completion/abort path for a blocked range read goes through
PostReadCc(or an equivalent cleanup path) so bucket locks can’t leak. The pattern looks intentional and mirrors regular data reads, but it’s worth double-checking that the resume + post-read flow for blocked range reads follows the same conventions.tx_service/src/tx_operation.cpp (3)
321-328: Range-local read now retries on RW conflict instead of aborting – confirm scheduling semanticsThe new retry path for
ACQUIRE_KEY_LOCK_FAILED_FOR_RW_CONFLICT/DATA_STORE_ERRclearshd_result_, marksexecute_immediately_ = false, and immediately callstxm->Process(*this)without any backoff orReRunOpuse. That matches the “blocking range reads + deadlock detection” intent, but relies on the surrounding scheduler to avoid a tight retry loop.Please double‑check that:
execute_immediately_ = falseactually re-queues this op onto the blocked/waiting path rather than spinning on the TxProcessor thread; and- the DATA_STORE_ERR case is intended to be retriable under this new behavior (vs surfacing an error after bounded retries).
If both are intentional and covered by tests, then this looks good.
615-623: LockWriteRangeBucketsOp now retries on RW conflict — ensure it blocks instead of spinningSimilar to
ReadLocalOperation, a bucket-lock error with codeACQUIRE_KEY_LOCK_FAILED_FOR_RW_CONFLICTnow resetslock_result_, setsis_running_ = false/execute_immediately_ = false, and re-invokestxm->Process(*this).This seems correct for “blocking range read” semantics, but it assumes:
execute_immediately_ = falsecauses the operation to be parked until notified (e.g., via deadlock check and unblocking), and- no fast spin occurs on the TxProcessor thread while the conflicting writer still holds the lock.
If your concurrency tests confirm this behavior, no change is needed; otherwise, consider wiring this through
ReRunOp(with backoff) for consistency with other retryable ops.
1847-1848: New AcquireAllOp DLOG is fine; consider explicit memory order for consistencyThe extra
DLOG(INFO)when all acquire-all requests finish is useful diagnostics and is in a non‑hot path (debug logging only), so it’s safe.For consistency with the rest of
AcquireAllOp(which usesload(std::memory_order_relaxed)everywhere), you might want to specify the memory order here as well:- DLOG(INFO) << "All acquire all requests finished for tx " - << txm->TxNumber() << ", fail count: " << fail_cnt_.load(); + DLOG(INFO) << "All acquire all requests finished for tx " + << txm->TxNumber() + << ", fail count: " + << fail_cnt_.load(std::memory_order_relaxed);Purely stylistic; behavior is already correct.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
tx_service/include/cc/range_cc_map.h(3 hunks)tx_service/include/dead_lock_check.h(2 hunks)tx_service/src/tx_operation.cpp(7 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-10-09T03:56:58.811Z
Learnt from: thweetkomputer
Repo: eloqdata/tx_service PR: 150
File: include/cc/local_cc_shards.h:626-631
Timestamp: 2025-10-09T03:56:58.811Z
Learning: For the LocalCcShards class in include/cc/local_cc_shards.h: Writer locks (unique_lock) should continue using the original meta_data_mux_ (std::shared_mutex) rather than fast_meta_data_mux_ (FastMetaDataMutex) at this stage. Only reader locks may use the FastMetaDataMutex wrapper.
Applied to files:
tx_service/include/cc/range_cc_map.htx_service/src/tx_operation.cpp
📚 Learning: 2025-10-20T04:30:07.884Z
Learnt from: liunyl
Repo: eloqdata/tx_service PR: 149
File: include/cc/cc_request.h:1876-1927
Timestamp: 2025-10-20T04:30:07.884Z
Learning: ScanNextBatchCc in include/cc/cc_request.h is used only for hash-partition scans; range-partition scans are handled by ScanSliceCc.
Applied to files:
tx_service/include/cc/range_cc_map.h
📚 Learning: 2025-11-11T07:10:40.346Z
Learnt from: lzxddz
Repo: eloqdata/tx_service PR: 199
File: include/cc/local_cc_shards.h:233-234
Timestamp: 2025-11-11T07:10:40.346Z
Learning: In the LocalCcShards class in include/cc/local_cc_shards.h, the EnqueueCcRequest methods use `shard_code & 0x3FF` followed by `% cc_shards_.size()` to distribute work across processor cores for load balancing. This is intentional and separate from partition ID calculation. The 0x3FF mask creates a consistent distribution range (0-1023) before modulo by actual core count.
Applied to files:
tx_service/include/cc/range_cc_map.htx_service/src/tx_operation.cpp
📚 Learning: 2025-12-02T10:43:27.431Z
Learnt from: lokax
Repo: eloqdata/tx_service PR: 254
File: tx_service/src/cc/local_cc_shards.cpp:2949-3188
Timestamp: 2025-12-02T10:43:27.431Z
Learning: In tx_service/src/cc/local_cc_shards.cpp, whenever TryPinNodeGroupData is used, only call Sharder::Instance().UnpinNodeGroupData(node_group) if the recorded term is >= 0 (i.e., pin succeeded). Example: LocalCcShards::PostProcessFlushTaskEntries guards the unpin with `if (term >= 0)`.
Applied to files:
tx_service/src/tx_operation.cpp
📚 Learning: 2025-09-25T06:15:41.186Z
Learnt from: liunyl
Repo: eloqdata/tx_service PR: 142
File: include/tx_req_result.h:140-159
Timestamp: 2025-09-25T06:15:41.186Z
Learning: When cc_notify_ is true in TxResult, both Finish() and Wait() are guaranteed to be called on the same thread, eliminating concurrency concerns and making relaxed memory ordering sufficient for status_combo_ updates.
Applied to files:
tx_service/src/tx_operation.cpp
🧬 Code graph analysis (2)
tx_service/include/dead_lock_check.h (1)
tx_service/src/dead_lock_check.cpp (2)
RequestCheckWithThrottle(56-79)RequestCheckWithThrottle(56-56)
tx_service/include/cc/range_cc_map.h (1)
tx_service/src/dead_lock_check.cpp (2)
RequestCheckWithThrottle(56-79)RequestCheckWithThrottle(56-56)
🔇 Additional comments (6)
tx_service/include/dead_lock_check.h (2)
202-205: Throttle API shape and contract look consistent with implementationThe
RequestCheckWithThrottle()declaration and its documented behavior match the cpp implementation (singleton guard, mutex, 1s window, boolean return). No issues from the header side.
251-256: Throttle state and interval wiring look correctStoring
last_throttled_check_time_perDeadLockCheckinstance and usingCHECK_THROTTLE_INTERVAL_ = 1 * MICRO_SECONDcorrectly encode a 1-second global throttle for the singleton; this aligns with the comments and the cpp usage.tx_service/include/cc/range_cc_map.h (2)
32-32: Includingdead_lock_check.hhere is appropriateThe new include is needed for
DeadLockCheck::RequestCheckWithThrottle()and fits the existing include style in this header.
233-235: Behavioral comment matches the new blocking semanticsThe added comment accurately reflects that range reads can now block on lock acquisition and later resume, which is an important invariant shift from the previous “always abort on conflict” behavior.
tx_service/src/tx_operation.cpp (2)
4535-4537: Additional diagnostics around commit-acquire-all failures look goodThe new
DLOG(INFO)forcommit_acquire_all_write_op_.fail_cnt_and theDLOG(ERROR)whenIsDeadlock()is true provide useful context in split‑flush range deadlock scenarios, with negligible overhead in non‑debug builds.No functional issues from these logging additions.
Also applies to: 4548-4548
4569-4575: Newterm_SplitFlushOp_CommitAcquireAllWriteOp_Continuefault hook is consistent with existing patternsThe
CODE_FAULT_INJECTOR("term_SplitFlushOp_CommitAcquireAllWriteOp_Continue", { ...; return; });hook is in line with other*_Continueinjection points in this file (e.g., the install‑dirty and data‑sync phases). It cleanly short‑circuits the branch while the fault is enabled and relies on test code to remove the entry when it’s time to resume.No correctness concerns here; just ensure test harnesses always remove this fault to avoid persistent stalls.
| // bucket lock and can immediately retry the range lock acquisition. | ||
| // If the read request comes from a remote node, send | ||
| // acknowledgement | ||
| if (!req.IsLocal()) |
There was a problem hiding this comment.
this should never happen. range cc map ReadCc should only be initiated by local node
| return false; | ||
| } | ||
|
|
||
| std::unique_lock<std::mutex> lk(inst_->mutex_); |
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
tx_service/src/tx_operation.cpp (1)
4218-4256: Fault injection forsplit_flush_commit_acquire_all_deadlockstill risks leavingcommit_acquire_all_write_op_never-finishedThis new fault path sets
op_, pushescommit_acquire_all_write_op_, callsReset(...), marksis_running_ = true, and only manipulateshd_results_[0]. It does not synthesize a fully “finished” state forAcquireAllOp:
upload_cnt_remainsnode_cnt * keys_.size()(≥ 1, typically > 1).finish_cnt_andfail_cnt_are left at 0 (or whatever previous values were).- No
ForceError()-driven post-lambda runs for indices[1, upload_cnt_), sofinish_cnt_ != upload_cnt_.remote_ack_cnt_stays 0.As a result,
AcquireAllOp::Forwardnever hits thefinish_cnt_ == upload_cnt_branch that callstxm->PostProcess(*this), and the compositeSplitFlushRangeOpcan remain stuck withop_ == &commit_acquire_all_write_op_for tests that enable this fault. This is essentially the same liveness issue called out in the earlier review.You can preserve the “skip real Process()/RPCs” intent while guaranteeing progress by fully synthesizing a one-slot, failed, deadlock’ed state:
#ifdef WITH_FAULT_INJECT - // Check if fault injector is enabled - FaultEntry *fault_entry = - FaultInject::Entry("split_flush_commit_acquire_all_deadlock"); - if (fault_entry != nullptr) - { - // Only assign to op_ and push to txm, do not call Process - op_ = &commit_acquire_all_write_op_; - txm->PushOperation(&commit_acquire_all_write_op_); - auto all_node_groups = Sharder::Instance().AllNodeGroups(); - commit_acquire_all_write_op_.Reset(all_node_groups->size()); - commit_acquire_all_write_op_.is_running_ = true; - - // Ensure at least one result slot exists. - assert(commit_acquire_all_write_op_.hd_results_.size() >= 1); - auto &hd_result0 = commit_acquire_all_write_op_.hd_results_[0]; - hd_result0.Reset(); - hd_result0.SetToBlock(); - // Set error on the first hd_result with DEAD_LOCK_ABORT - hd_result0.SetError(CcErrorCode::DEAD_LOCK_ABORT); - DLOG(INFO) - << "FaultInject split_flush_commit_acquire_all_deadlock, " - << " commit_acquire_all_write_op_.IsDeadlock(): " - << commit_acquire_all_write_op_.IsDeadlock() - << " upload_cnt_: " - << commit_acquire_all_write_op_.upload_cnt_ - << " fail_cnt_: " - << commit_acquire_all_write_op_.fail_cnt_.load( - std::memory_order_relaxed); + FaultEntry *fault_entry = + FaultInject::Entry("split_flush_commit_acquire_all_deadlock"); + if (fault_entry != nullptr) + { + // Put the sub-op on the stack but skip real RPCs; synthesize a + // completed DEAD_LOCK_ABORT so AcquireAllOp::Forward will + // immediately finish and be seen as deadlocked. + op_ = &commit_acquire_all_write_op_; + txm->PushOperation(&commit_acquire_all_write_op_); + commit_acquire_all_write_op_.is_running_ = true; + + if (commit_acquire_all_write_op_.hd_results_.empty()) + { + commit_acquire_all_write_op_.Resize(1); + } + auto &hd_result0 = commit_acquire_all_write_op_.hd_results_[0]; + hd_result0.Reset(); + hd_result0.SetError(CcErrorCode::DEAD_LOCK_ABORT); + hd_result0.ForceError(); // runs post_lambda_, bumps counters + + commit_acquire_all_write_op_.upload_cnt_ = 1; + commit_acquire_all_write_op_.finish_cnt_.store( + 1, std::memory_order_relaxed); + commit_acquire_all_write_op_.fail_cnt_.store( + 1, std::memory_order_relaxed); + commit_acquire_all_write_op_.remote_ack_cnt_.store( + 0, std::memory_order_relaxed); + + DLOG(INFO) + << "FaultInject split_flush_commit_acquire_all_deadlock, " + << " commit_acquire_all_write_op_.IsDeadlock(): " + << commit_acquire_all_write_op_.IsDeadlock() + << " upload_cnt_: " + << commit_acquire_all_write_op_.upload_cnt_ + << " fail_cnt_: " + << commit_acquire_all_write_op_.fail_cnt_.load( + std::memory_order_relaxed); // Auto-remove the fault to prevent it from being triggered // again FaultInject::Instance().InjectFault( "split_flush_commit_acquire_all_deadlock", "remove"); } else { ForwardToSubOperation(txm, &commit_acquire_all_write_op_); } #else ForwardToSubOperation(txm, &commit_acquire_all_write_op_); #endifThis keeps
AcquireAllOp::Process()from issuing real locks but guaranteesfinish_cnt_ == upload_cnt_,fail_cnt_ > 0, andIsDeadlock() == true, so the parent reaches the downgrade path instead of hanging.
🧹 Nitpick comments (3)
tx_service/src/tx_operation.cpp (3)
318-329: Range-read local RW-conflict now always retries; consider bounding or differentiating non-retriable errorsThe new behavior for
ReadLocalOperation::ForwardonACQUIRE_KEY_LOCK_FAILED_FOR_RW_CONFLICTandDATA_STORE_ERRis to unconditionally reset and re-run the op. With blocking range reads this makes sense for true lock conflicts, but for persistentDATA_STORE_ERRthis can create an effectively infinite retry loop with no backoff or fail-out.Consider:
- Treating
DATA_STORE_ERRas non-retriable (propagate error instead of retry), or- Adding a bounded retry counter / backoff here, similar to other ops using
ReRunOp, so a permanently failing KV or CC layer doesn’t leave the tx stuck forever.
613-623: Bucket-range RW-conflict also unboundedly retries; align with desired failure semantics
LockWriteRangeBucketsOp::Forwardnow handlesACQUIRE_KEY_LOCK_FAILED_FOR_RW_CONFLICTby resetting and re-processing the op without ever surfacing an abort. That matches the new “blocking + deadlock detection” design, but again there is no retry bound or differentiation between transient lock contention and structural failures.If
ACQUIRE_KEY_LOCK_FAILED_FOR_RW_CONFLICTcan be returned for reasons that will not be resolved by deadlock detection (e.g., misconfiguration, protocol mismatch), this op will also spin indefinitely.I’d recommend:
- Confirming that this error code is now only used for genuine lock conflicts that must be blocked, and/or
- Introducing a small bounded retry / timeout path analogous to other operations to preserve liveness on misbehaving nodes.
4535-4575: New logging around commit-phase AcquireAll and fault injector behavior
- The extra
DLOG(INFO)andDLOG(ERROR)lines aroundcommit_acquire_all_write_op_failure and deadlock handling are reasonable and help when debugging range-split issues.- The new
CODE_FAULT_INJECTOR("term_SplitFlushOp_CommitAcquireAllWriteOp_Continue", ...)intentionally early-returns under fault to simulate a stalled commit; that’s consistent with otherterm_*_Continueinjectors in this file, but it will leave the op parked until the fault is cleared.Given how invasive this state-machine is, it’s worth double-checking that all these fault names are documented in your tests/tooling and that they’re always cleaned up (either via auto-remove, like above, or explicit “remove” calls) to avoid accidentally wedging a test environment.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
tx_service/include/cc/range_cc_map.h(3 hunks)tx_service/include/dead_lock_check.h(1 hunks)tx_service/src/tx_operation.cpp(7 hunks)
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: thweetkomputer
Repo: eloqdata/tx_service PR: 150
File: include/cc/local_cc_shards.h:626-631
Timestamp: 2025-10-09T03:56:58.811Z
Learning: For the LocalCcShards class in include/cc/local_cc_shards.h: Writer locks (unique_lock) should continue using the original meta_data_mux_ (std::shared_mutex) rather than fast_meta_data_mux_ (FastMetaDataMutex) at this stage. Only reader locks may use the FastMetaDataMutex wrapper.
📚 Learning: 2025-10-09T03:56:58.811Z
Learnt from: thweetkomputer
Repo: eloqdata/tx_service PR: 150
File: include/cc/local_cc_shards.h:626-631
Timestamp: 2025-10-09T03:56:58.811Z
Learning: For the LocalCcShards class in include/cc/local_cc_shards.h: Writer locks (unique_lock) should continue using the original meta_data_mux_ (std::shared_mutex) rather than fast_meta_data_mux_ (FastMetaDataMutex) at this stage. Only reader locks may use the FastMetaDataMutex wrapper.
Applied to files:
tx_service/include/cc/range_cc_map.htx_service/src/tx_operation.cpp
📚 Learning: 2025-10-20T04:30:07.884Z
Learnt from: liunyl
Repo: eloqdata/tx_service PR: 149
File: include/cc/cc_request.h:1876-1927
Timestamp: 2025-10-20T04:30:07.884Z
Learning: ScanNextBatchCc in include/cc/cc_request.h is used only for hash-partition scans; range-partition scans are handled by ScanSliceCc.
Applied to files:
tx_service/include/cc/range_cc_map.h
📚 Learning: 2025-11-11T07:10:40.346Z
Learnt from: lzxddz
Repo: eloqdata/tx_service PR: 199
File: include/cc/local_cc_shards.h:233-234
Timestamp: 2025-11-11T07:10:40.346Z
Learning: In the LocalCcShards class in include/cc/local_cc_shards.h, the EnqueueCcRequest methods use `shard_code & 0x3FF` followed by `% cc_shards_.size()` to distribute work across processor cores for load balancing. This is intentional and separate from partition ID calculation. The 0x3FF mask creates a consistent distribution range (0-1023) before modulo by actual core count.
Applied to files:
tx_service/include/cc/range_cc_map.htx_service/src/tx_operation.cpp
📚 Learning: 2025-12-02T10:43:27.431Z
Learnt from: lokax
Repo: eloqdata/tx_service PR: 254
File: tx_service/src/cc/local_cc_shards.cpp:2949-3188
Timestamp: 2025-12-02T10:43:27.431Z
Learning: In tx_service/src/cc/local_cc_shards.cpp, whenever TryPinNodeGroupData is used, only call Sharder::Instance().UnpinNodeGroupData(node_group) if the recorded term is >= 0 (i.e., pin succeeded). Example: LocalCcShards::PostProcessFlushTaskEntries guards the unpin with `if (term >= 0)`.
Applied to files:
tx_service/src/tx_operation.cpp
📚 Learning: 2025-09-25T06:15:41.186Z
Learnt from: liunyl
Repo: eloqdata/tx_service PR: 142
File: include/tx_req_result.h:140-159
Timestamp: 2025-09-25T06:15:41.186Z
Learning: When cc_notify_ is true in TxResult, both Finish() and Wait() are guaranteed to be called on the same thread, eliminating concurrency concerns and making relaxed memory ordering sufficient for status_combo_ updates.
Applied to files:
tx_service/src/tx_operation.cpp
🧬 Code graph analysis (1)
tx_service/include/cc/range_cc_map.h (1)
tx_service/include/dead_lock_check.h (1)
RequestCheck(192-200)
🔇 Additional comments (4)
tx_service/include/dead_lock_check.h (1)
246-251: Throttle metadata fits existing time modelThe new
last_throttled_check_time_andCHECK_THROTTLE_INTERVAL_fields align with the existing microsecond-based timing in this class and with the singleton usage ofDeadLockCheck. From the header alone there’s no correctness concern; the actual throttling policy will hinge on how these are used inRun()/GatherLockDependancy().tx_service/include/cc/range_cc_map.h (2)
32-32: Includingdead_lock_check.hhere is appropriateWiring
RangeCcMapdirectly toDeadLockCheckvia the header makes sense given the new blocking behavior inReadCc; no layering or dependency issues are apparent from this file.
233-333: New blocking behavior on range-key lock looks consistent; verify deadlock/cleanup assumptionsThe shift from eager abort to blocking on
ACQUIRE_LOCK_BLOCKEDfor the range entry, plus callingDeadLockCheck::RequestCheck()and markingReadCcasBlockByLock, is consistent with how regular data reads normally participate in deadlock detection and should let the detector see these waits.Two things are worth explicitly verifying:
DeadLockCheck availability
DeadLockCheck::RequestCheck()is a no-op ifinst_has not been initialized. With the new blocking behavior, range reads rely on deadlock detection to resolve genuine cycles. Please confirm thatDeadLockCheck::Init(...)is called in all deployments/tests before anyRangeCcMap::Execute(ReadCc&)can run, or consider a fallback (e.g., logging + abort) ifinst_is null so we don’t risk silent permanent blocking in configurations where deadlock checking is disabled.Bucket read lock release on deadlock-victim aborts
This path acquires the bucket read lock first and keeps it while the range lock is blocked. Since the bucket lock is not tracked in the transaction readset and is instead manually released inExecute(PostReadCc&)(and cleaned up in the destructor for drop-table), please double-check that the transaction-abort path used when the deadlock detector kills a victim will also eventually release this bucket lock (either by drivingPostReadCcor an equivalent cleanup). Otherwise, a deadlock-induced abort could leave the bucket read lock pinned longer than intended.Functionally, the change looks sound and aligns range reads with the broader deadlock-detection strategy; the above items are about ensuring progress and avoiding subtle lock leaks under deadlock-handling scenarios.
tx_service/src/tx_operation.cpp (1)
1847-1848: Additional AcquireAll completion DLOG is fineThe added
DLOG(INFO)on “all acquire all requests finished” is low-impact and useful for debugging; it’s gated behindDLOGso it won’t spam production logs.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tx_service/src/dead_lock_check.cpp (1)
558-565: Consider simplifying redundant condition checks.Lines 558-565 re-check the same conditions already verified by the
wait_forpredicate at lines 535-541. While these checks handle the timeout case correctly, they duplicate logic and reduce readability.Consider simplifying to:
- if (!requested_check_.load(std::memory_order_acquire)) - { - continue; - } - else if (LocalCcShards::ClockTs() - last_check_time_ < time_interval_) - { - continue; - } - - // Reset the check flag brefore gather lock dependancy requested_check_.store(false, std::memory_order_release); lk.unlock(); GatherLockDependancy();The
wait_forpredicate already ensures both conditions are met when returning true (non-timeout), so these redundant checks can be removed. The defensive programming doesn't add safety here since the mutex protects access to these variables.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
tx_service/include/dead_lock_check.h(2 hunks)tx_service/src/dead_lock_check.cpp(3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-09T03:56:58.811Z
Learnt from: thweetkomputer
Repo: eloqdata/tx_service PR: 150
File: include/cc/local_cc_shards.h:626-631
Timestamp: 2025-10-09T03:56:58.811Z
Learning: For the LocalCcShards class in include/cc/local_cc_shards.h: Writer locks (unique_lock) should continue using the original meta_data_mux_ (std::shared_mutex) rather than fast_meta_data_mux_ (FastMetaDataMutex) at this stage. Only reader locks may use the FastMetaDataMutex wrapper.
Applied to files:
tx_service/src/dead_lock_check.cpp
🔇 Additional comments (2)
tx_service/include/dead_lock_check.h (1)
192-199: LGTM! Proper atomic signaling pattern.The atomic store with
memory_order_releasecorrectly publishes the check request to the waiting thread, which usesmemory_order_acquireto observe it.tx_service/src/dead_lock_check.cpp (1)
525-545: LGTM! Correct atomic wait condition with throttling.The
wait_forlambda properly usesmemory_order_acquireto observe the flag set byRequestCheck()and correctly throttles check requests by requiringtime_interval_to elapse since the last check.
Here are some reminders before you submit the pull request
fixes eloqdb/tx_service#issue_id./mtr --suite=mono_main,mono_multi,mono_basicSummary by CodeRabbit
Bug Fixes
Chores
✏️ Tip: You can customize this high-level summary in your review settings.