Skip to content

read range info parallelly and batch read catalog#418

Merged
lokax merged 12 commits into
mainfrom
yf-catalog-slow
Feb 28, 2026
Merged

read range info parallelly and batch read catalog#418
lokax merged 12 commits into
mainfrom
yf-catalog-slow

Conversation

@lokax

@lokax lokax commented Feb 26, 2026

Copy link
Copy Markdown
Collaborator

Here are some reminders before you submit the pull request

  • Add tests for the change
  • Document changes
  • Reference the link of issue using fixes eloqdb/tx_service#issue_id
  • Reference the link of RFC if exists
  • Pass ./mtr --suite=mono_main,mono_multi,mono_basic

Summary by CodeRabbit

  • New Features

    • Added support for catalog batch read operations.
  • Improvements

    • Enhanced multi-partition table scanning with parallel execution across partitions.
    • Improved error handling and state management for partition-based operations.

Copilot AI review requested due to automatic review settings February 26, 2026 06:11
@coderabbitai

coderabbitai Bot commented Feb 26, 2026

Copy link
Copy Markdown

Walkthrough

This pull request extends FetchTableRanges from single-partition to multi-partition scanning with per-partition state management and parallel ScanNext invocations. It also introduces catalog batch read processing as a distinct code path in transaction execution with special handling for catalog metadata reads.

Changes

Cohort / File(s) Summary
Multi-Partition Scan State & Initialization
store_handler/data_store_service_client.cpp, tx_service/include/cc/cc_req_misc.h
Refactors scan state from single-partition to per-partition model. Introduces PartitionScanState struct to encapsulate per-partition keys and session IDs. Adds partition_scan_states_ vector, remaining_partitions_ counter, and finish_mux_ lock for synchronization. Computes total partitions and initializes per-partition containers.
Multi-Partition Scan Coordination & Callbacks
store_handler/data_store_service_client_closure.cpp
Implements thread-safe error aggregation and per-partition coordination. Replaces single-scan callback with per-partition ScanNext invocations. Adds mutex-guarded remaining_partitions_ decrement and synthetic default range insertion when all partitions complete. Uses kv_part_id to append results via AppendTableRanges per partition.
Catalog Batch Read Processing
tx_service/include/tx_request.h, tx_service/src/tx_execution.cpp
Adds is_catalog_batch_ flag to BatchReadTxRequest to enable catalog-specific batch read path. Introduces branching logic in Process() and PostProcess() to handle catalog batch reads separately, reading locally against catalog_ccm_name instead of standard table batch flow.
Formatting
tx_service/include/tx_util.h, tx_service/src/cc/cc_req_misc.cpp
Minor whitespace adjustments; no functional changes.

Sequence Diagram

sequenceDiagram
    participant Client
    participant FetchTableRanges
    participant KVPartition1 as KV Partition 1
    participant KVPartition2 as KV Partition 2
    participant Callback as Aggregation Callback

    Client->>FetchTableRanges: Initiate FetchTableRanges
    FetchTableRanges->>FetchTableRanges: Compute total_partitions<br/>Init partition_scan_states_
    FetchTableRanges->>FetchTableRanges: Set remaining_partitions_

    par Parallel Partition Scans
        FetchTableRanges->>KVPartition1: ScanNext(partition_id=0, keys, session)
        FetchTableRanges->>KVPartition2: ScanNext(partition_id=1, keys, session)
    end

    KVPartition1->>Callback: Return ranges for partition 0
    Callback->>Callback: Acquire finish_mux_<br/>Decrement remaining_partitions_
    Callback->>Callback: AppendTableRanges(0, ranges)

    KVPartition2->>Callback: Return ranges for partition 1
    Callback->>Callback: Acquire finish_mux_<br/>Decrement remaining_partitions_<br/>Check remaining == 0
    Callback->>Callback: Merge results<br/>Handle synthetic ranges
    Callback->>Client: SetFinish(merged_result)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • Refactor scan next batch for hash partition #131: Implements parallel multi-partition/bucket batched scan fetches with per-bucket state vector and coordinated scan logic, aligning with the partition-aware scanning pattern introduced here.
  • optimze update range slices #254: Modifies FetchTableRanges and callback state to be partition-aware with per-partition scan states and AppendTableRanges signature changes matching the architectural shift in this PR.

Suggested reviewers

  • liunyl
  • thweetkomputer

🐰 Hop, hop, hooray! The ranges now scan in parallel splays,

Per-partition states dance through the maze,

With synchrony locks and catalogs bright,

FetchTableRanges hops to new heights! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description only contains an unchecked template checklist with no actual implementation details, rationale, or substantive content provided by the author. Provide a meaningful description explaining the changes, motivation, and testing approach. Check off completed items and clearly describe what was implemented.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'read range info parallelly and batch read catalog' is directly related to the main changes: parallelizing range info retrieval across KV partitions and adding catalog batch read support.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch yf-catalog-slow

Tip

Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs).
Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tx_service/include/tx_request.h (1)

1346-1364: ⚠️ Potential issue | 🟠 Major

Reset is_catalog_batch_ in Set(...) to prevent stale request mode.

At Line 1363 you reset local_cache_checked_, but is_catalog_batch_ is never reset/updated in Set(...). If this request object is reused, a previous catalog batch value can leak into later non-catalog reads.

💡 Proposed fix
 void Set(const TableName *tab_name,
          uint64_t schema_version,
          std::vector<ScanBatchTuple> &batch_read_pri,
          bool is_for_write = false,
          bool is_for_share = false,
          bool read_local = false,
          bool point_read_on_cache_miss = false,
-         uint64_t corresponding_sk_commit_ts = 0)
+         uint64_t corresponding_sk_commit_ts = 0,
+         bool is_catalog_batch = false)
 {
     tab_name_ = tab_name;
     read_batch_ = std::move(batch_read_pri);
     is_for_write_ = is_for_write;
     is_for_share_ = is_for_share;
     read_local_ = read_local;
     point_read_on_cache_miss_ = point_read_on_cache_miss;
     corresponding_sk_commit_ts_ = corresponding_sk_commit_ts;
     schema_version_ = schema_version;
     local_cache_checked_ = false;
+    is_catalog_batch_ = is_catalog_batch;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/include/tx_request.h` around lines 1346 - 1364, The Set(...)
method for the request object (function Set in tx_request.h) resets
local_cache_checked_ but fails to reset is_catalog_batch_, allowing a previous
catalog batch flag to leak when the object is reused; update Set(...) to
explicitly reset or set is_catalog_batch_ (e.g., is_catalog_batch_ = false or
set based on tab_name/schema) alongside the other member assignments so each
call initializes is_catalog_batch_ correctly.
🧹 Nitpick comments (2)
store_handler/data_store_service_client.cpp (1)

979-1011: Consider bounding scan fan-out to avoid large RPC bursts.

At Line 979, this dispatches one ScanNext per partition immediately. If partition count is high, this can spike RPC pressure and callback contention. A bounded in-flight window (semaphore/worker queue) would keep latency more stable under load.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@store_handler/data_store_service_client.cpp` around lines 979 - 1011, The
loop currently issues a ScanNext for every partition unbounded, causing large
RPC fan-out; limit concurrent in-flight ScanNext calls by introducing a bounded
window (e.g., semaphore or fixed-size worker queue) when iterating
total_partitions and dispatching ScanNext on fetch_cc->partition_scan_states_.
Use a concurrency cap constant (e.g., max_in_flight_scans) and acquire before
calling ScanNext (for symbols: ScanNext, FetchTableRangesCallback,
partition_scan_states_, GetShardIdByPartitionId, fetch_cc), and release when the
corresponding FetchTableRangesCallback completes or via a future/promise
callback so new partitions are dispatched only as slots free up to keep RPC
bursts bounded.
store_handler/data_store_service_client_closure.cpp (1)

881-894: Consider adding a clarifying comment for the synthetic default range.

When all partitions complete with empty results, a synthetic default range is inserted with kv_part_id=0. While functionally correct (the ddl_skip_kv_ comment helps), consider adding a brief comment explaining why kv_part_id=0 is used as the bucket key for this placeholder.

📝 Suggested documentation improvement
             if (fetch_range_cc->EmptyRanges())
             {
                 // When ddl_skip_kv_ is enabled and the range entry is not
                 // physically ready, initializes the original range from
                 // negative infinity to positive infinity.
                 std::vector<txservice::InitRangeEntry> default_ranges;
                 default_ranges.emplace_back(
                     catalog_factory->NegativeInfKey(),
                     txservice::Sequences::InitialRangePartitionIdOf(
                         fetch_range_cc->table_name_),
                     1);
-                // Use kv_part_id 0 for the synthetic default range bucket.
+                // Use kv_part_id 0 as the bucket key for the synthetic default
+                // range. This is arbitrary since there's only one default entry.
                 fetch_range_cc->AppendTableRanges(0, std::move(default_ranges));
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@store_handler/data_store_service_client_closure.cpp` around lines 881 - 894,
Add a short clarifying comment immediately above the AppendTableRanges(0, ...)
call in the if (fetch_range_cc->EmptyRanges()) block explaining that
kv_part_id=0 is chosen as a sentinel/reserved bucket id for the synthetic
default range (so it does not collide with real partition ids and ensures
downstream bucket lookups treat this as a placeholder), and reference that this
behavior is intentional when ddl_skip_kv_ is enabled and the original range is
initialized from NegativeInfKey() and
txservice::Sequences::InitialRangePartitionIdOf(fetch_range_cc->table_name_);
keep the comment concise and colocated with the AppendTableRanges call.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@store_handler/data_store_service_client_closure.cpp`:
- Around line 860-861: Multiple concurrent callbacks can call
fetch_range_cc->AppendTableRanges with the same kv_part_id and race on
partition_ranges_vec_[kv_part_id]; make AppendTableRanges thread-safe by
synchronizing access to partition_ranges_vec_ (e.g., add a mutex for the whole
vector or a per-partition mutex) and acquire the lock inside AppendTableRanges
before pushing/mutating the inner vector, or alternatively switch
partition_ranges_vec_ to a concurrent container; update any callers
(ScanNext/ScanNextClosure paths) to rely on the synchronized AppendTableRanges
so concurrent ScanNext callbacks no longer race.

In `@tx_service/src/tx_execution.cpp`:
- Around line 7590-7597: The catalog branch accesses
batch_read_op.hd_result_vec_[i] without checking its size; add the same safety
used in the non-catalog path by guarding that
batch_read_op.hd_result_vec_.size() is at least N (or adjust the loop to iterate
to std::min(N, batch_read_op.hd_result_vec_.size())) before any indexed access
to hd_result_vec_, and handle the shortfall (e.g., skip catalog-specific logic
or log/return an error) so accessing hd_result_vec_[i] cannot go out-of-bounds;
update the code around read_batch, N, and the loop that uses hd_result_vec_ to
implement this check.

---

Outside diff comments:
In `@tx_service/include/tx_request.h`:
- Around line 1346-1364: The Set(...) method for the request object (function
Set in tx_request.h) resets local_cache_checked_ but fails to reset
is_catalog_batch_, allowing a previous catalog batch flag to leak when the
object is reused; update Set(...) to explicitly reset or set is_catalog_batch_
(e.g., is_catalog_batch_ = false or set based on tab_name/schema) alongside the
other member assignments so each call initializes is_catalog_batch_ correctly.

---

Nitpick comments:
In `@store_handler/data_store_service_client_closure.cpp`:
- Around line 881-894: Add a short clarifying comment immediately above the
AppendTableRanges(0, ...) call in the if (fetch_range_cc->EmptyRanges()) block
explaining that kv_part_id=0 is chosen as a sentinel/reserved bucket id for the
synthetic default range (so it does not collide with real partition ids and
ensures downstream bucket lookups treat this as a placeholder), and reference
that this behavior is intentional when ddl_skip_kv_ is enabled and the original
range is initialized from NegativeInfKey() and
txservice::Sequences::InitialRangePartitionIdOf(fetch_range_cc->table_name_);
keep the comment concise and colocated with the AppendTableRanges call.

In `@store_handler/data_store_service_client.cpp`:
- Around line 979-1011: The loop currently issues a ScanNext for every partition
unbounded, causing large RPC fan-out; limit concurrent in-flight ScanNext calls
by introducing a bounded window (e.g., semaphore or fixed-size worker queue)
when iterating total_partitions and dispatching ScanNext on
fetch_cc->partition_scan_states_. Use a concurrency cap constant (e.g.,
max_in_flight_scans) and acquire before calling ScanNext (for symbols: ScanNext,
FetchTableRangesCallback, partition_scan_states_, GetShardIdByPartitionId,
fetch_cc), and release when the corresponding FetchTableRangesCallback completes
or via a future/promise callback so new partitions are dispatched only as slots
free up to keep RPC bursts bounded.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 962933b and 7a52f73.

📒 Files selected for processing (7)
  • store_handler/data_store_service_client.cpp
  • store_handler/data_store_service_client_closure.cpp
  • tx_service/include/cc/cc_req_misc.h
  • tx_service/include/tx_request.h
  • tx_service/include/tx_util.h
  • tx_service/src/cc/cc_req_misc.cpp
  • tx_service/src/tx_execution.cpp
💤 Files with no reviewable changes (2)
  • tx_service/src/cc/cc_req_misc.cpp
  • tx_service/include/tx_util.h

Comment on lines +860 to +861
const int32_t kv_part_id = scan_next_closure->PartitionId();
fetch_range_cc->AppendTableRanges(kv_part_id, std::move(range_vec));

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Search for AppendTableRanges definition and usage across the codebase
rg -n 'AppendTableRanges' --type cpp --type h -A 10 -B 2

Repository: eloqdata/tx_service

Length of output: 4231


🏁 Script executed:

# Also search for the class/struct that contains AppendTableRanges
rg -n 'class.*fetch_range|struct.*fetch_range' --type cpp --type h

Repository: eloqdata/tx_service

Length of output: 45


🏁 Script executed:

# Look at the broader context of the file to understand the callback execution model
head -n 20 store_handler/data_store_service_client_closure.cpp
wc -l store_handler/data_store_service_client_closure.cpp

Repository: eloqdata/tx_service

Length of output: 1020


🏁 Script executed:

# Search for partition_ranges_vec_ declaration in the header and cpp files
rg -n 'partition_ranges_vec_' --type cpp --type h -B 3 -A 3

Repository: eloqdata/tx_service

Length of output: 5133


🏁 Script executed:

# Look for the FetchTableRangesCc class definition to understand data member protection
rg -n 'class FetchTableRangesCc' --type h -A 50

Repository: eloqdata/tx_service

Length of output: 45


🏁 Script executed:

# Search for how scan callbacks are created and if multiple callbacks can handle the same partition
rg -n 'partition_scan_states_' --type cpp --type h -B 2 -A 5

Repository: eloqdata/tx_service

Length of output: 3015


🏁 Script executed:

# Look for the callback spawning logic and partition assignment
rg -n 'scan_next_closure\|PartitionScanState' --type cpp --type h | head -40

Repository: eloqdata/tx_service

Length of output: 45


🏁 Script executed:

# Check if there's any synchronization around partition_ranges_vec_ at the class level
rg -n 'class FetchTableRangesCc' --type h -A 100 | head -120

Repository: eloqdata/tx_service

Length of output: 45


🏁 Script executed:

# Search for where callbacks are created and understand the execution model
rg -n 'scan_next_closure\|AsyncScan\|bthread::start_background' --type cpp -B 3 -A 3 | head -60

Repository: eloqdata/tx_service

Length of output: 45


🏁 Script executed:

# Look at the structure to understand if partition is locked while scanning
rg -n 'remaining_partitions\|PartitionScanState' --type cpp -B 2 -A 2 | head -50

Repository: eloqdata/tx_service

Length of output: 45


🏁 Script executed:

# Check if partition_ranges_vec_ is ever accessed from multiple threads for the same index
grep -n 'for.*partition' store_handler/data_store_service_client_closure.cpp | head -20

Repository: eloqdata/tx_service

Length of output: 118


🏁 Script executed:

# Search for where the callbacks are spawned/created to understand partition assignment
rg -n 'for.*total_partitions\|spawn\|start_background' --type cpp store_handler/data_store_service_client.cpp -A 5 | head -80

Repository: eloqdata/tx_service

Length of output: 45


🏁 Script executed:

# Look at the complete context around lines 860-870 to understand callback execution model
sed -n '830,920p' store_handler/data_store_service_client_closure.cpp

Repository: eloqdata/tx_service

Length of output: 3506


🏁 Script executed:

# Search for the callback class definition
rg -n 'class.*Closure\|struct.*Closure' --type h store_handler/ | grep -i scan

Repository: eloqdata/tx_service

Length of output: 45


🏁 Script executed:

# Search for how scan callbacks are initiated and the execution pattern
rg -n 'ScanNext\|remaining_partitions_' store_handler/data_store_service_client.cpp -B 3 -A 3 | head -80

Repository: eloqdata/tx_service

Length of output: 45


🏁 Script executed:

# Check if a partition can be processed by multiple concurrent callbacks
rg -n 'PartitionScanState\|remaining_partitions_' tx_service/include/cc/cc_req_misc.h -B 5 -A 5

Repository: eloqdata/tx_service

Length of output: 45


🏁 Script executed:

# Understand the full class definition
sed -n '200,250p' tx_service/include/cc/cc_req_misc.h

Repository: eloqdata/tx_service

Length of output: 1637


🏁 Script executed:

# Look at the initialization and understand how many callbacks per partition
sed -n '975,1010p' store_handler/data_store_service_client.cpp

Repository: eloqdata/tx_service

Length of output: 1306


🏁 Script executed:

# Check how the callbacks are spawned in a loop to understand partition-to-callback mapping
rg -n 'for.*remaining_partitions\|for.*total_partitions' store_handler/ --type cpp -A 10 | head -50

Repository: eloqdata/tx_service

Length of output: 45


🏁 Script executed:

# Search for the callback spawn location and loop structure
rg -n 'ScanNext.*kv_part_id\|for.*kv_part_id' store_handler/ --type cpp -B 2 -A 5 | head -60

Repository: eloqdata/tx_service

Length of output: 45


🏁 Script executed:

# Confirm the full pattern - are callbacks spawned sequentially in a loop for each partition?
sed -n '960,1020p' store_handler/data_store_service_client.cpp

Repository: eloqdata/tx_service

Length of output: 2215


🏁 Script executed:

# Double-check: does the callback handler receive the partition ID and only process that partition?
sed -n '820,870p' store_handler/data_store_service_client_closure.cpp

Repository: eloqdata/tx_service

Length of output: 1926


Data race: multiple callbacks can concurrently call AppendTableRanges with the same partition ID.

When a callback's batch is not complete, it spawns another ScanNext for the same partition (line 915), creating concurrent callbacks. Both can then call AppendTableRanges with the same kv_part_id, accessing partition_ranges_vec_[kv_part_id] simultaneously without synchronization, causing a data race on the inner vector's push_back.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@store_handler/data_store_service_client_closure.cpp` around lines 860 - 861,
Multiple concurrent callbacks can call fetch_range_cc->AppendTableRanges with
the same kv_part_id and race on partition_ranges_vec_[kv_part_id]; make
AppendTableRanges thread-safe by synchronizing access to partition_ranges_vec_
(e.g., add a mutex for the whole vector or a per-partition mutex) and acquire
the lock inside AppendTableRanges before pushing/mutating the inner vector, or
alternatively switch partition_ranges_vec_ to a concurrent container; update any
callers (ScanNext/ScanNextClosure paths) to rely on the synchronized
AppendTableRanges so concurrent ScanNext callbacks no longer race.

Comment on lines +7590 to +7597
std::vector<txservice::ScanBatchTuple> &read_batch =
batch_read_op.batch_read_tx_req_->read_batch_;
const size_t N = read_batch.size();
for (size_t i = 0; i < N; ++i)
{
TxKey &key = read_batch[i].key_;
TxRecord &rec = *read_batch[i].record_;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Guard hd_result_vec_ size before indexed access in the catalog batch path.

This branch indexes batch_read_op.hd_result_vec_[i] without the size invariant check that exists in the non-catalog path. If sizes diverge, this can become out-of-bounds access.

💡 Proposed fix
     std::vector<txservice::ScanBatchTuple> &read_batch =
         batch_read_op.batch_read_tx_req_->read_batch_;
+    assert(batch_read_op.hd_result_vec_.size() == read_batch.size());
     const size_t N = read_batch.size();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
std::vector<txservice::ScanBatchTuple> &read_batch =
batch_read_op.batch_read_tx_req_->read_batch_;
const size_t N = read_batch.size();
for (size_t i = 0; i < N; ++i)
{
TxKey &key = read_batch[i].key_;
TxRecord &rec = *read_batch[i].record_;
std::vector<txservice::ScanBatchTuple> &read_batch =
batch_read_op.batch_read_tx_req_->read_batch_;
assert(batch_read_op.hd_result_vec_.size() == read_batch.size());
const size_t N = read_batch.size();
for (size_t i = 0; i < N; ++i)
{
TxKey &key = read_batch[i].key_;
TxRecord &rec = *read_batch[i].record_;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/src/tx_execution.cpp` around lines 7590 - 7597, The catalog branch
accesses batch_read_op.hd_result_vec_[i] without checking its size; add the same
safety used in the non-catalog path by guarding that
batch_read_op.hd_result_vec_.size() is at least N (or adjust the loop to iterate
to std::min(N, batch_read_op.hd_result_vec_.size())) before any indexed access
to hd_result_vec_, and handle the shortfall (e.g., skip catalog-specific logic
or log/return an error) so accessing hd_result_vec_[i] cannot go out-of-bounds;
update the code around read_batch, N, and the loop that uses hd_result_vec_ to
implement this check.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves metadata access performance by (1) fetching table range information from the data store concurrently across KV partitions and (2) adding a catalog-specific batch read path that uses ReadLocal on catalog_ccm_name.

Changes:

  • Add is_catalog_batch_ to BatchReadTxRequest and route catalog batch reads through a ReadLocal(catalog_ccm_name, ...) fast path.
  • Refactor FetchTableRanges to scan all range-slice KV partitions in parallel and coordinate completion via shared state (remaining_partitions_, finish_mux_).
  • Minor whitespace cleanup in a couple of CC/utility files.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

Show a summary per file
File Description
tx_service/src/tx_execution.cpp Adds catalog batch-read fast path and adjusts post-processing to treat catalog batches differently.
tx_service/src/cc/cc_req_misc.cpp Whitespace-only cleanup.
tx_service/include/tx_util.h Whitespace-only cleanup.
tx_service/include/tx_request.h Extends BatchReadTxRequest with is_catalog_batch_.
tx_service/include/cc/cc_req_misc.h Updates FetchTableRangesCc to hold per-partition scan state and synchronization fields.
store_handler/data_store_service_client_closure.cpp Updates FetchTableRangesCallback to aggregate concurrent partition scans and finish once all partitions complete.
store_handler/data_store_service_client.cpp Starts FetchTableRanges scans concurrently across all KV partitions and initializes shared completion state.
Comments suppressed due to low confidence (2)

tx_service/include/tx_request.h:1375

  • BatchReadTxRequest now adds is_catalog_batch_, but Set(...) does not reset/initialize this new flag. If a BatchReadTxRequest instance is reused via Set, it can retain a stale is_catalog_batch_ value and take the wrong execution path. Consider extending Set(...) to explicitly set/reset is_catalog_batch_ (and/or document that the request must not be reused across catalog vs non-catalog batches).
    bool point_read_on_cache_miss_;
    uint64_t corresponding_sk_commit_ts_;
    uint64_t schema_version_;
    bool local_cache_checked_;
    bool is_catalog_batch_;  // true: batch ReadLocal on catalog_ccm_name

tx_service/src/tx_execution.cpp:7608

  • In the is_catalog_batch_ fast-path, the loop indexes batch_read_op.hd_result_vec_[i] without a size check/assert like the non-catalog path has later (assert(hd_result_vec_.size() == read_batch.size())). If read_batch_ is resized after Reset() (it’s a non-const reference in the request), this can become an out-of-bounds access. Add an early assert/check that hd_result_vec_.size() matches read_batch.size() before the loop (or iterate using hd_result_vec_.size()).
        batch_read_op.is_running_ = true;
        std::vector<txservice::ScanBatchTuple> &read_batch =
            batch_read_op.batch_read_tx_req_->read_batch_;
        const size_t N = read_batch.size();
        for (size_t i = 0; i < N; ++i)
        {
            TxKey &key = read_batch[i].key_;
            TxRecord &rec = *read_batch[i].record_;

            bool finished = cc_handler_->ReadLocal(
                catalog_ccm_name,
                key,
                rec,
                ReadType::Inside,
                tx_number_.load(std::memory_order_relaxed),
                tx_term_,
                CommandId(),
                start_ts_,
                batch_read_op.hd_result_vec_[i],
                IsolationLevel::RepeatableRead,

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@lokax lokax merged commit 140d611 into main Feb 28, 2026
8 checks passed
lokax added a commit that referenced this pull request Mar 6, 2026
lokax added a commit that referenced this pull request Mar 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants