Fix remote scanslice request #466
Walkthrough
The changes add null-safety checks for range initialization, implement conditional CCM initialization with retry logic during split-range operations, introduce a fault-injection hook to skip auto-split-range data sync tasks, and modify scan-slice task dispatch to compute destination cores dynamically based on range ID instead of pinning to core 0.
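The dispatch change in the walkthrough can be pictured with a minimal sketch. The PR page does not show the actual mapping, so the modulo rule and the names `DestCoreForRange`, `range_id`, and `core_count` below are assumptions for illustration only, not the code in `cc_stream_receiver.cpp`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: pick the core that should receive a remote
// scan-slice task. Before the fix the task was always sent to core 0;
// here the destination is derived from the range ID instead.
// The modulo mapping is an assumption, not the project's real formula.
inline std::size_t DestCoreForRange(std::uint32_t range_id,
                                    std::size_t core_count)
{
    assert(core_count > 0);  // a node must have at least one core
    return static_cast<std::size_t>(range_id) % core_count;
}
```

Whatever the real formula is, the receiver has to use the same range-to-core mapping as the shard that owns the range; otherwise the request still lands on a core that does not hold the slice.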
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ❌ 3
❌ Failed checks (2 warnings, 1 inconclusive)
Actionable comments posted: 1
🧹 Nitpick comments (1)
tx_service/src/cc/local_cc_shards.cpp (1)
2955-2955: Use non-debug logging for this injected skip path.
At Line 2955, `DLOG(INFO)` may be invisible in non-debug builds. Prefer a regular `LOG(WARNING)` (with table/range/term context) so fault-triggered behavior is diagnosable during staging/chaos runs.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tx_service/src/cc/local_cc_shards.cpp` at line 2955, Replace the debug-only DLOG(INFO) call that emits "FaultInject term_skip_auto_split_range" with a non-debug LOG(WARNING) and include contextual identifiers (e.g., table id, range id, term number) so the injected skip is visible in staging/chaos runs; locate the DLOG(INFO) call in local_cc_shards.cpp that emits "FaultInject term_skip_auto_split_range" and change it to LOG(WARNING) with a descriptive message containing the table/range/term variables available in that scope.
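As a rough illustration of the nitpick, the sketch below builds the kind of contextual message the reviewer asks for. It is kept free of glog so it compiles on its own; `FaultSkipMessage` and its parameters (`table_name`, `range_id`, `term`) are hypothetical names, and the variables actually in scope at line 2955 may differ.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: format the fault-injection skip message with
// enough context (table, range, term) to find it in staging logs.
std::string FaultSkipMessage(const std::string &table_name,
                             std::uint32_t range_id,
                             std::int64_t term)
{
    std::ostringstream oss;
    oss << "FaultInject term_skip_auto_split_range: "
        << "skipping split data sync"
        << ", table=" << table_name
        << ", range_id=" << range_id
        << ", term=" << term;
    return oss.str();
}

// With glog available, the call site could then read:
//   LOG(WARNING) << FaultSkipMessage(table_name, range_id, term);
// LOG(WARNING) is emitted in all build types, while DLOG(INFO) is
// compiled out when NDEBUG is defined.
```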
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tx_service/src/cc/local_cc_shards.cpp`:
- Around line 2954-2957: The fault-injection branch in
CODE_FAULT_INJECTOR("term_skip_auto_split_range") returns early and prevents
scheduling the split data-sync task, which can permanently stall an auto-split
because need_split is latched by
TemplateCcMap::UpdateRangeSize(uint32_t,int32_t,bool) and only cleared via
ResetRangeStatus(partition_id). Modify the fault path so it does not permanently
suppress retries: instead of returning immediately, either (a) re-arm/clear the
latch by calling ResetRangeStatus(partition_id) or (b) enqueue a
deferred/rescheduled split task (same code path used for normal retries) before
returning; ensure the change references term_skip_auto_split_range and preserves
the original scheduling semantics used by the split data-sync task so future
retries still occur after the fault is disabled.
---
Nitpick comments:
In `@tx_service/src/cc/local_cc_shards.cpp`:
- Line 2955: Replace the debug-only DLOG(INFO) call that emits "FaultInject
term_skip_auto_split_range" with a non-debug LOG(WARNING) and include contextual
identifiers (e.g., table id, range id, term number) so the injected skip is
visible in staging/chaos runs; locate the DLOG(INFO) call in local_cc_shards.cpp
that emits "FaultInject term_skip_auto_split_range" and change it to
LOG(WARNING) with a descriptive message containing the table/range/term
variables available in that scope.
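The stall described in the inline comment on lines 2954-2957 can be reproduced with a toy model. Everything below is a hypothetical stand-in written from the review text alone: `RangeSplitState`, the `threshold` parameter, and `MaybeScheduleSplit` are not the project's types, and the real `TemplateCcMap::UpdateRangeSize` has a different signature. The sketch follows option (a), clearing the latch on the fault path.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Toy model of the per-range "need_split" latch (hypothetical).
class RangeSplitState
{
public:
    // Returns true only on the call that sets the latch. While the
    // latch is set, later calls return false, so no new split request
    // is raised until ResetRangeStatus() clears it.
    bool UpdateRangeSize(std::uint32_t partition_id,
                         std::int64_t size,
                         std::int64_t threshold)
    {
        bool &latched = need_split_[partition_id];
        if (size < threshold || latched)
        {
            return false;
        }
        latched = true;
        return true;
    }

    void ResetRangeStatus(std::uint32_t partition_id)
    {
        need_split_[partition_id] = false;
    }

    bool NeedSplit(std::uint32_t partition_id) const
    {
        auto it = need_split_.find(partition_id);
        return it != need_split_.end() && it->second;
    }

private:
    std::unordered_map<std::uint32_t, bool> need_split_;
};

// Returns true if the split data-sync task would be scheduled.
// On the fault path the latch is cleared before returning, so the next
// size update can raise the split request again (option (a)).
bool MaybeScheduleSplit(RangeSplitState &state,
                        std::uint32_t partition_id,
                        bool fault_skip_enabled)
{
    assert(state.NeedSplit(partition_id));  // only called when latched
    if (fault_skip_enabled)
    {
        state.ResetRangeStatus(partition_id);
        return false;
    }
    return true;
}
```

Without the `ResetRangeStatus` call on the fault path, `UpdateRangeSize` keeps returning false for that partition and the split is never requested again, which is the permanent stall the comment warns about.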
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 94ebff56-1b9c-4840-84fd-edf82f2bed69
📒 Files selected for processing (3)
tx_service/include/cc/range_cc_map.h
tx_service/src/cc/local_cc_shards.cpp
tx_service/src/remote/cc_stream_receiver.cpp
Send remote scan slice request to correct core.