Skip to content

Fix remote scanslice request#466

Merged
yangsw26 merged 1 commit into
mainfrom
fix_remotescanslice
Mar 19, 2026
Merged

Fix remote scanslice request#466
yangsw26 merged 1 commit into
mainfrom
fix_remotescanslice

Conversation

@yangsw26

@yangsw26 yangsw26 commented Mar 19, 2026

Copy link
Copy Markdown
Collaborator

Send remote scan slice request to correct core.

Summary by CodeRabbit

  • Bug Fixes

    • Improved null-pointer handling during range initialization to prevent crashes.
    • Enhanced error handling for initialization failures with proper logging and graceful abort mechanisms.
  • Quality Improvements

    • Added fault-injection capabilities for testing range-split scenarios.
    • Optimized task distribution logic for improved load balancing across system cores.

@yangsw26 yangsw26 self-assigned this Mar 19, 2026
@coderabbitai

coderabbitai Bot commented Mar 19, 2026

Copy link
Copy Markdown

Walkthrough

The changes add null-safety checks for range initialization, implement conditional CCM initialization with retry logic during split-range operations, introduce a fault-injection hook to skip auto-split-range data sync tasks, and modify scan-slice task dispatch to compute destination cores dynamically based on range ID instead of pinning to core 0.

Changes

Cohort / File(s) Summary
Range CCMap Null Handling
tx_service/include/cc/range_cc_map.h
Guard against TypedStoreRange() returning nullptr by storing pointer and conditionally calling PostCkptSize(); add CCM initialization with retry logic on split-range operations, handling null returns and initialization failures with error logging and request abortion.
Fault Injection for Split Range Data Sync
tx_service/src/cc/local_cc_shards.cpp
Add fault-injection hook at CreateSplitRangeDataSyncTask entry point for term_skip_auto_split_range fault name, enabling early return to prevent metadata lock acquisition, DataSyncStatus creation, and task enqueuing.
Scan-Slice Core Dispatch Logic
tx_service/src/remote/cc_stream_receiver.cpp
Replace static core 0 pinning with dynamic destination-core computation using (range_id & 0x3FF) % local_shards_.Count() formula for scan-slice task enqueue operations.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • liunyl
  • githubzilla

Poem

🐰 A rabbit checks each pointer's fate,
Null-guards the ranges, keeps them straight!
When faults inject, we hop, skip, flee—
And scatter tasks to cores with glee! 🌰

🚥 Pre-merge checks | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

Check name Status Explanation Resolution
Description check ⚠️ Warning The pull request description is minimal and lacks required sections from the template such as tests, documentation, issue references, and test suite confirmation. Expand the description to include test coverage, documentation references, issue links, and confirmation of passing the specified test suites per the template.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'Fix remote scanslice request' is partially related to the changeset but lacks specificity; it describes the problem area rather than the specific solution implemented (routing to correct core). Consider a more specific title like 'Route remote scan slice requests to correct core based on range ID' to better communicate the actual implementation change.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix_remotescanslice
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tx_service/src/cc/local_cc_shards.cpp (1)

2955-2955: Use non-debug logging for this injected skip path.

At Line 2955, DLOG(INFO) may be invisible in non-debug builds. Prefer a regular LOG(WARNING) (with table/range/term context) so fault-triggered behavior is diagnosable during staging/chaos runs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/src/cc/local_cc_shards.cpp` at line 2955, Replace the debug-only
DLOG(INFO) call that emits "FaultInject term_skip_auto_split_range" with a
non-debug LOG(WARNING) and include contextual identifiers (e.g., table id, range
id, term number) so the injected skip is visible in staging/chaos runs; locate
the DLOG(INFO) call in local_cc_shards.cpp that emits "FaultInject
term_skip_auto_split_range" and change it to LOG(WARNING) with a descriptive
message containing the table/range/term variables available in that scope.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tx_service/src/cc/local_cc_shards.cpp`:
- Around line 2954-2957: The fault-injection branch in
CODE_FAULT_INJECTOR("term_skip_auto_split_range") returns early and prevents
scheduling the split data-sync task, which can permanently stall an auto-split
because need_split is latched by
TemplateCcMap::UpdateRangeSize(uint32_t,int32_t,bool) and only cleared via
ResetRangeStatus(partition_id). Modify the fault path so it does not permanently
suppress retries: instead of returning immediately, either (a) re-arm/clear the
latch by calling ResetRangeStatus(partition_id) or (b) enqueue a
deferred/rescheduled split task (same code path used for normal retries) before
returning; ensure the change references term_skip_auto_split_range and preserves
the original scheduling semantics used by the split data-sync task so future
retries still occur after the fault is disabled.

---

Nitpick comments:
In `@tx_service/src/cc/local_cc_shards.cpp`:
- Line 2955: Replace the debug-only DLOG(INFO) call that emits "FaultInject
term_skip_auto_split_range" with a non-debug LOG(WARNING) and include contextual
identifiers (e.g., table id, range id, term number) so the injected skip is
visible in staging/chaos runs; locate the DLOG(INFO) call in local_cc_shards.cpp
that emits "FaultInject term_skip_auto_split_range" and change it to
LOG(WARNING) with a descriptive message containing the table/range/term
variables available in that scope.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 94ebff56-1b9c-4840-84fd-edf82f2bed69

📥 Commits

Reviewing files that changed from the base of the PR and between 1a4729b and aed57c6.

📒 Files selected for processing (3)
  • tx_service/include/cc/range_cc_map.h
  • tx_service/src/cc/local_cc_shards.cpp
  • tx_service/src/remote/cc_stream_receiver.cpp

Comment thread tx_service/src/cc/local_cc_shards.cpp
@yangsw26 yangsw26 merged commit 195439f into main Mar 19, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants