Skip to content

enhancement: tune hash partition checkpoint params, increase ckpt scan buffer, fix hash ckpt partition calc, and limit min data sync scan concurrency#347

Merged
thweetkomputer merged 9 commits into
mainfrom
ckpt_zc
Jan 8, 2026
Merged

enhancement: tune hash partition checkpoint params, increase ckpt scan buffer, fix hash ckpt partition calc, and limit min data sync scan concurrency#347
thweetkomputer merged 9 commits into
mainfrom
ckpt_zc

Conversation

@thweetkomputer

@thweetkomputer thweetkomputer commented Jan 8, 2026

Copy link
Copy Markdown
Collaborator

Here are some reminders before you submit the pull request

  • Add tests for the change
  • Document changes
  • Reference the link of issue using fixes eloqdb/tx_service#issue_id
  • Reference the link of RFC if exists
  • Pass ./mtr --suite=mono_main,mono_multi,mono_basic

Summary by CodeRabbit

  • New Features

    • Data synchronization and checkpoint scanning parameters are now configurable through runtime flags, including batch size, data size, and scan yield timing.
  • Chores

    • Changed default data store implementation from RocksDB to EloqStore.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai

coderabbitai Bot commented Jan 8, 2026

Copy link
Copy Markdown

Walkthrough

This PR converts hard-coded constants into runtime-configurable FLAGS for hash partition data synchronization and checkpoint scanning, introduces per-core partition distribution logic for data sync operations, and changes the default data store implementation from RocksDB to EloqStore.

Changes

Cohort / File(s) Summary
Configuration Flag Declarations
tx_service/include/cc/cc_request.h, tx_service/include/cc/template_cc_map.h, tx_service/src/cc/cc_shard.cpp
Declared public FLAGS for batch sizing (hash_partition_data_sync_scan_batch_size), data sizing (hash_partition_data_sync_scan_data_size), yield timing (hash_partition_ckpt_scan_yield_time_us), and buffer ratio (ckpt_buffer_ratio); removed corresponding static constants.
Hash Partition Data Sync Implementation
tx_service/src/cc/local_cc_shards.cpp
Defined runtime FLAGS, implemented per-core partition distribution logic with per-core scan sizing calculations, added conditional compilation paths for Eloq store vs non-Eloq store scan concurrency computation, updated memory initialization to use FLAGS_ckpt_buffer_ratio.
Build Configuration
CMakeLists.txt
Changed default WITH_DATA_STORE value from ELOQDSS_ROCKSDB to ELOQDSS_ELOQSTORE.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • liunyl
  • MrGuin
  • xiexiaoy

Poem

🐰 Constants shed their static chains,
Flags now govern data veins,
Cores dance in partitions true,
EloqStore's the default brew,
Scan and sync in sync anew!

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description check ⚠️ Warning The description is a bare template with all checklist items unchecked and no actual implementation details, making it incomplete and insufficient. Fill in the PR description with actual details about changes made, include issue/RFC references, and mark completed checklist items. Add summary of what was changed and why.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title comprehensively captures the main changes: tuning hash partition checkpoint parameters, increasing ckpt scan buffer, fixing hash ckpt partition calculation, and limiting min data sync scan concurrency.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch ckpt_zc

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tx_service/src/cc/local_cc_shards.cpp (1)

4741-4802: Fix divide-by-zero and tighten scan_concurrency in Eloq-store hash-partition path

In DataSyncForHashPartition, the Eloq-store branch computes:

const size_t scan_concurrency =
    flush_buffer_size /
    (default_data_file_size * partition_number_this_core);
const size_t partition_number_per_scan = partition_number_this_core;

There are two problems here:

  1. If partition_number_this_core == 0 (e.g., more worker cores than total_hash_partitions), this is a divide-by-zero and immediate UB.
  2. scan_concurrency can grow unbounded relative to core_number, which is unnecessary since the gating logic in DataSyncWorker is per-core.

A safer formulation is to explicitly handle the zero-partition case and clamp concurrency into [1, core_number] when there is at least one partition:

Proposed fix for Eloq-store scan_concurrency computation
-    constexpr size_t default_data_file_size = 8ULL * 1024 * 1024;
-    const size_t scan_concurrency =
-        flush_buffer_size /
-        (default_data_file_size * partition_number_this_core);
-    const size_t partition_number_per_scan = partition_number_this_core;
+    constexpr size_t default_data_file_size = 8ULL * 1024 * 1024;
+    size_t scan_concurrency = 0;
+    const size_t partition_number_per_scan = partition_number_this_core;
+    if (partition_number_this_core > 0 && flush_buffer_size > 0)
+    {
+        const size_t denom =
+            default_data_file_size * partition_number_this_core;
+        // denom > 0 here, but keep the branch explicit for clarity.
+        if (denom > 0)
+        {
+            const size_t target = flush_buffer_size / denom;
+            // At least one concurrent scan when we have partitions,
+            // and never more than the number of cores.
+            scan_concurrency =
+                std::min(core_number, std::max<size_t>(1, target));
+        }
+    }

This avoids division by zero when a core owns no partitions (the for-loop below is already a no-op in that case) and keeps the concurrency hint consistent with the actual number of workers.

🤖 Fix all issues with AI agents
In @tx_service/src/cc/cc_shard.cpp:
- Around line 112-114: Compute memory_usage_ratio = 1.0 -
range_slice_reserve_ratio - FLAGS_ckpt_buffer_ratio in the CCShard constructor
and validate it immediately: if memory_usage_ratio <= 0, emit a clear error (or
DCHECK/LOG(FATAL) per project conventions) referencing the offending flags
(range_slice_reserve_ratio and FLAGS_ckpt_buffer_ratio) and either abort
construction or clamp memory_usage_ratio to a small positive value before using
it to compute memory_limit_; ensure memory_limit_ is only set when
memory_usage_ratio > 0 so later comparisons using memory_limit_ are not
zero/negative.
🧹 Nitpick comments (6)
core/src/data_substrate.cpp (1)

76-91: Consider adding validation for flag values to prevent misconfiguration.

The newly introduced flags lack validation to ensure values are within reasonable bounds:

  • ckpt_buffer_ratio should be constrained to [0.0, 1.0] and ideally the sum of ckpt_buffer_ratio + range_slice_reserve_ratio should not exceed 1.0 (validated at usage site in cc_shard.cpp line 114)
  • ckpt_scan_yield_time_us, data_sync_scan_batch_size, and data_sync_scan_data_size could benefit from sanity checks to prevent extremely large values that might cause resource exhaustion

While the defaults are reasonable, runtime misconfiguration via command-line flags could lead to undefined behavior (e.g., negative memory_usage_ratio in CcShard constructor).

🛡️ Example validation approach

Consider adding a validation function called during startup:

bool ValidateCheckpointFlags() {
    if (FLAGS_ckpt_buffer_ratio < 0.0 || FLAGS_ckpt_buffer_ratio > 1.0) {
        LOG(ERROR) << "ckpt_buffer_ratio must be between 0.0 and 1.0, got: " 
                   << FLAGS_ckpt_buffer_ratio;
        return false;
    }
    if (FLAGS_data_sync_scan_batch_size == 0 || FLAGS_data_sync_scan_batch_size > 10000) {
        LOG(WARNING) << "data_sync_scan_batch_size seems unusual: " 
                     << FLAGS_data_sync_scan_batch_size;
    }
    // Similar checks for other flags
    return true;
}

Alternatively, use gflags validators:

static bool ValidateCkptBufferRatio(const char* flagname, double value) {
    if (value >= 0.0 && value <= 1.0) return true;
    LOG(ERROR) << flagname << " must be between 0.0 and 1.0";
    return false;
}
DEFINE_validator(ckpt_buffer_ratio, &ValidateCkptBufferRatio);
tx_service/src/cc/cc_shard.cpp (1)

112-112: Update comment to avoid hardcoding the percentage.

The comment "12.5% for data sync memory usage" hardcodes the default value of FLAGS_ckpt_buffer_ratio, which could become misleading if the flag is changed via configuration or command-line.

♻️ Suggested improvement
-// 12.5% for data sync memory usage
+// Reserve ckpt_buffer_ratio fraction for data sync memory usage
 double memory_usage_ratio =
     1.0 - range_slice_reserve_ratio - FLAGS_ckpt_buffer_ratio;

Or make it more informative:

-// 12.5% for data sync memory usage
+// Reserve memory for checkpoint flush buffers and data-sync controller
+// (default 12.5%, configurable via --ckpt_buffer_ratio)
 double memory_usage_ratio =
     1.0 - range_slice_reserve_ratio - FLAGS_ckpt_buffer_ratio;
tx_service/include/cc/template_cc_map.h (2)

73-74: gflags DECLARE in a widely-included header

Declaring ckpt_scan_yield_time_us here is functionally fine, but it makes this large template header depend on gflags macros indirectly. Consider centralizing all DECLARE_* for these scan/ckpt flags into a dedicated config header (already included here) to avoid spreading flag declarations across multiple headers.


5903-5914: Flag-driven scan limits look correct; guard against zero-valued flags

The new guards on scan_cnt, export_data_size, and wall-clock delta via FLAGS_data_sync_scan_batch_size, FLAGS_data_sync_scan_data_size, and FLAGS_ckpt_scan_yield_time_us are consistent with the existing control flow and should make ckpt/data-sync behavior tunable per PR intent.

One edge case: if FLAGS_data_sync_scan_batch_size == 0 or FLAGS_data_sync_scan_data_size == 0, this loop will never execute and the request can reschedule indefinitely without making progress. Similarly, a FLAGS_ckpt_scan_yield_time_us of 0 will force almost immediate yields. It may be worth clamping these flags to sane minimums at definition time (or asserting here) to avoid misconfiguration leading to stalled or overly-throttled scans.

tx_service/include/cc/cc_request.h (1)

3747-3753: Assert now tied to FLAGS_data_sync_scan_batch_size; confirm invariant still holds

The ctor now asserts scan_batch_size_ > FLAGS_data_sync_scan_batch_size (previously compared against the class constant). This couples the constructor argument to a runtime flag: if someone configures data_sync_scan_batch_size to be ≥ the passed scan_batch_size, debug builds will trip the assert.

Please verify:

  • All call sites pass scan_batch_size values that are guaranteed to remain strictly larger than FLAGS_data_sync_scan_batch_size for any supported configuration.
  • If you expect data_sync_scan_batch_size to be changed at runtime, consider either validating this relationship where the flag is set or relaxing the condition (e.g., >=) plus logging instead of a hard assert.
tx_service/src/cc/local_cc_shards.cpp (1)

68-72: ckpt_buffer_ratio-based sizing looks good, but guard against negative effective ratios

Using FLAGS_ckpt_buffer_ratio to size both cur_flush_buffer_ and data_sync_mem_controller_ matches the PR intent and preserves the old 5% / 7.5% relationship when the default ratio is 0.075. However, if an operator ever sets ckpt_buffer_ratio <= 0.025, the expression

MB(conf.at("node_memory_limit_mb")) * (FLAGS_ckpt_buffer_ratio - 0.025)

becomes negative before the static_cast<uint64_t>, which will wrap to a very large positive value and effectively remove your memory cap.

Consider clamping or validating the flag so the effective ratios are non‑negative (and within a sane upper bound, e.g. ≤1). A simple option is to validate the flag at definition time (in the DEFINE_double site) or clamp locally before computing the sizes.

Also applies to: 132-136

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a8d63f6 and 964a068.

📒 Files selected for processing (5)
  • core/src/data_substrate.cpp
  • tx_service/include/cc/cc_request.h
  • tx_service/include/cc/template_cc_map.h
  • tx_service/src/cc/cc_shard.cpp
  • tx_service/src/cc/local_cc_shards.cpp
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-11-11T07:10:40.346Z
Learnt from: lzxddz
Repo: eloqdata/tx_service PR: 199
File: include/cc/local_cc_shards.h:233-234
Timestamp: 2025-11-11T07:10:40.346Z
Learning: In the LocalCcShards class in include/cc/local_cc_shards.h, the EnqueueCcRequest methods use `shard_code & 0x3FF` followed by `% cc_shards_.size()` to distribute work across processor cores for load balancing. This is intentional and separate from partition ID calculation. The 0x3FF mask creates a consistent distribution range (0-1023) before modulo by actual core count.

Applied to files:

  • tx_service/src/cc/cc_shard.cpp
  • tx_service/src/cc/local_cc_shards.cpp
  • tx_service/include/cc/cc_request.h
📚 Learning: 2025-10-20T04:30:07.884Z
Learnt from: liunyl
Repo: eloqdata/tx_service PR: 149
File: include/cc/cc_request.h:1876-1927
Timestamp: 2025-10-20T04:30:07.884Z
Learning: ScanNextBatchCc in include/cc/cc_request.h is used only for hash-partition scans; range-partition scans are handled by ScanSliceCc.

Applied to files:

  • tx_service/include/cc/template_cc_map.h
  • tx_service/src/cc/local_cc_shards.cpp
  • tx_service/include/cc/cc_request.h
📚 Learning: 2025-10-21T06:46:53.700Z
Learnt from: lokax
Repo: eloqdata/tx_service PR: 149
File: src/remote/cc_stream_receiver.cpp:1066-1075
Timestamp: 2025-10-21T06:46:53.700Z
Learning: In src/remote/cc_stream_receiver.cpp, for ScanNextRequest handling, BucketIds() on RemoteScanNextBatch should never be empty—this is an expected invariant of the scan protocol.

Applied to files:

  • tx_service/include/cc/template_cc_map.h
  • tx_service/include/cc/cc_request.h
📚 Learning: 2025-10-09T03:56:58.811Z
Learnt from: thweetkomputer
Repo: eloqdata/tx_service PR: 150
File: include/cc/local_cc_shards.h:626-631
Timestamp: 2025-10-09T03:56:58.811Z
Learning: For the LocalCcShards class in include/cc/local_cc_shards.h: Writer locks (unique_lock) should continue using the original meta_data_mux_ (std::shared_mutex) rather than fast_meta_data_mux_ (FastMetaDataMutex) at this stage. Only reader locks may use the FastMetaDataMutex wrapper.

Applied to files:

  • tx_service/src/cc/local_cc_shards.cpp
📚 Learning: 2025-12-02T10:43:27.431Z
Learnt from: lokax
Repo: eloqdata/tx_service PR: 254
File: tx_service/src/cc/local_cc_shards.cpp:2949-3188
Timestamp: 2025-12-02T10:43:27.431Z
Learning: In tx_service/src/cc/local_cc_shards.cpp, whenever TryPinNodeGroupData is used, only call Sharder::Instance().UnpinNodeGroupData(node_group) if the recorded term is >= 0 (i.e., pin succeeded). Example: LocalCcShards::PostProcessFlushTaskEntries guards the unpin with `if (term >= 0)`.

Applied to files:

  • tx_service/src/cc/local_cc_shards.cpp
🧬 Code graph analysis (1)
tx_service/include/cc/template_cc_map.h (1)
tx_service/include/util.h (1)
  • ReadTimeMicroseconds (580-619)
🔇 Additional comments (2)
tx_service/src/cc/cc_shard.cpp (1)

58-58: LGTM - Flag declaration is correct.

The declaration of ckpt_buffer_ratio follows the established pattern and is necessary to consume the flag defined in data_substrate.cpp.

tx_service/include/cc/cc_request.h (1)

88-89: Verify gflags declaration inclusion for new DECLARE_uint64 flags

The new DECLARE_uint64(data_sync_scan_batch_size) / DECLARE_uint64(data_sync_scan_data_size) rely on the gflags declaration macros being visible. Please double‑check that this header (or a consistently included ancestor such as tx_service_common.h or similar) includes the appropriate gflags declaration header (e.g., gflags/gflags_declare.h or gflags/gflags.h), otherwise TU’s that include cc_request.h in isolation will fail to compile.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tx_service/include/cc/cc_request.h (1)

3722-3751: Compilation error: assert references non-existent gflag

The HashPartitionDataSyncScanCc constructor's assert uses FLAGS_data_sync_scan_batch_size, which does not exist in the codebase. The only declared flag for this code path is FLAGS_hash_partition_data_sync_scan_batch_size (declared at line 88).

Replace the assert with:

assert(scan_batch_size_ > FLAGS_hash_partition_data_sync_scan_batch_size);
tx_service/src/cc/local_cc_shards.cpp (1)

4741-4801: Fix division-by-zero and incorrect partition filtering in DataSyncForHashPartition

Two critical correctness issues exist in the per-core hash-partition logic:

  1. Division-by-zero crash on systems with >1024 CPU cores (ELOQSTORE builds)
    When core_number > total_hash_partitions (1024), workers with high indices receive zero partitions:

    partition_number_this_core = 0

    The ELOQSTORE path at line 4785 then crashes:

    flush_buffer_size / (default_data_file_size * partition_number_this_core)  // divides by 0

    This is a valid scenario on modern high-core systems. The non-ELOQSTORE path guards this correctly, but ELOQSTORE does not.

  2. Filter incorrectly includes partitions owned by other workers
    Partitions for each worker are non-contiguous: {worker_idx + core_number*0, worker_idx + core_number*1, ...}. For example, with total_hash_partitions=10, core_number=3, worker_idx=0, this worker owns {0, 3, 6, 9}.

    The current filter uses a numeric range check at lines 4836-4837:

    return static_cast<size_t>(part_id) >= min_partition_id_this_scan &&
           static_cast<size_t>(part_id) <= max_partition_id_this_scan

    With partition_number_per_scan=2, the first batch range [0, 3] incorrectly admits partitions 1 and 2, which are owned by other workers. This breaks partitioning invariants and can cause duplicate scans/flushes.

Concrete fixes required:

  • Guard ELOQSTORE division with std::max(1, partition_number_this_core) to prevent crashes.
  • Replace numeric range filter with modular arithmetic: verify (partition_id - worker_idx) % core_number == 0 to ensure the partition belongs to this worker, then check if its local index is in the current batch range.
🤖 Fix all issues with AI agents
In @CMakeLists.txt:
- Line 9: The CMake default for WITH_DATA_STORE was changed to ELOQDSS_ELOQSTORE
which can silently alter builds; update CI/build scripts and deployment configs
to explicitly set -DWITH_DATA_STORE=ELOQDSS_ROCKSDB (or the desired variant)
where RocksDB is required, audit other CMake entry points (e.g.,
store_handler/eloq_data_store_service/CMakeLists.txt which defaults to
ELOQDSS_ROCKSDB_CLOUD_S3) and align their defaults or override them in calling
scripts, and add/extend migration documentation and team notifications
explaining the new default, recommended overrides, and any required
configuration changes.

In @core/src/data_substrate.cpp:
- Around line 76-91: The ckpt_buffer_ratio flag is ambiguous and can produce
negative internal values when used as (FLAGS_ckpt_buffer_ratio - 0.025) in
LocalCcShards; validate and clamp FLAGS_ckpt_buffer_ratio at startup to a safe
range (e.g., >0.05 and <=0.5) before any use, update the flag help string to
document the split (e.g., "total fraction; flush_buffer = ratio - 0.025,
controller = ratio"), and change all consumers (notably LocalCcShards where
(FLAGS_ckpt_buffer_ratio - 0.025) is used) to use the clamped/validated value so
the subtraction cannot underflow or be cast to a huge uint64_t.

In @tx_service/include/cc/cc_request.h:
- Around line 88-90: The assertion referencing the undefined flag should be
updated to use the correct flag name: replace the use of
FLAGS_data_sync_scan_batch_size with
FLAGS_hash_partition_data_sync_scan_batch_size in the assertion that checks
scan_batch_size_ (look for the assert involving scan_batch_size_ in
cc_request.h); ensure the condition reads against
FLAGS_hash_partition_data_sync_scan_batch_size so it matches the declared flags
DECLARE_uint64(hash_partition_data_sync_scan_batch_size) and
DECLARE_uint64(hash_partition_data_sync_scan_data_size).

In @tx_service/src/cc/local_cc_shards.cpp:
- Around line 132-137: The constructor init-list computes cur_flush_buffer_ and
data_sync_mem_controller_ using (FLAGS_ckpt_buffer_ratio - 0.025) which can be
negative and underflow when cast to uint64_t; clamp the ratio first (e.g.,
compute a local double clamped_ratio = std::max(0.0, FLAGS_ckpt_buffer_ratio);
or ensure clamped_ratio >= 0.025 depending on intended semantics) using
FLAGS_ckpt_buffer_ratio and use that single clamped_ratio when computing both
cur_flush_buffer_ and data_sync_mem_controller_ (still multiplying by
MB(conf.at("node_memory_limit_mb"))) so both consumers derive from the safe,
documented value and avoid negative-to-uint64_t underflow.
🧹 Nitpick comments (1)
tx_service/include/cc/template_cc_map.h (1)

5902-5907: Hash-partition scan loop now bounded by tunable batch size, data size, and yield time

The new guards:

  • scan_cnt < FLAGS_hash_partition_data_sync_scan_batch_size
  • export_data_size < FLAGS_hash_partition_data_sync_scan_data_size
  • time-based break using FLAGS_hash_partition_ckpt_scan_yield_time_us

correctly cap per-iteration work and bytes while preserving existing req.accumulated_scan_cnt_ and iterator termination checks. Behavior looks sound.

One edge case to consider: if hash_partition_data_sync_scan_data_size is configured as 0, the loop will never process any entries. If that’s not desired, treating 0 (or very small values) as “no limit” or enforcing a minimum >0 at flag parsing time might avoid surprising misconfigurations.

Also applies to: 5912-5915

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 964a068 and a72fa3f.

📒 Files selected for processing (5)
  • CMakeLists.txt
  • core/src/data_substrate.cpp
  • tx_service/include/cc/cc_request.h
  • tx_service/include/cc/template_cc_map.h
  • tx_service/src/cc/local_cc_shards.cpp
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-11-11T07:10:40.346Z
Learnt from: lzxddz
Repo: eloqdata/tx_service PR: 199
File: include/cc/local_cc_shards.h:233-234
Timestamp: 2025-11-11T07:10:40.346Z
Learning: In the LocalCcShards class in include/cc/local_cc_shards.h, the EnqueueCcRequest methods use `shard_code & 0x3FF` followed by `% cc_shards_.size()` to distribute work across processor cores for load balancing. This is intentional and separate from partition ID calculation. The 0x3FF mask creates a consistent distribution range (0-1023) before modulo by actual core count.

Applied to files:

  • tx_service/src/cc/local_cc_shards.cpp
  • tx_service/include/cc/cc_request.h
📚 Learning: 2025-10-09T03:56:58.811Z
Learnt from: thweetkomputer
Repo: eloqdata/tx_service PR: 150
File: include/cc/local_cc_shards.h:626-631
Timestamp: 2025-10-09T03:56:58.811Z
Learning: For the LocalCcShards class in include/cc/local_cc_shards.h: Writer locks (unique_lock) should continue using the original meta_data_mux_ (std::shared_mutex) rather than fast_meta_data_mux_ (FastMetaDataMutex) at this stage. Only reader locks may use the FastMetaDataMutex wrapper.

Applied to files:

  • tx_service/src/cc/local_cc_shards.cpp
📚 Learning: 2025-12-02T10:43:27.431Z
Learnt from: lokax
Repo: eloqdata/tx_service PR: 254
File: tx_service/src/cc/local_cc_shards.cpp:2949-3188
Timestamp: 2025-12-02T10:43:27.431Z
Learning: In tx_service/src/cc/local_cc_shards.cpp, whenever TryPinNodeGroupData is used, only call Sharder::Instance().UnpinNodeGroupData(node_group) if the recorded term is >= 0 (i.e., pin succeeded). Example: LocalCcShards::PostProcessFlushTaskEntries guards the unpin with `if (term >= 0)`.

Applied to files:

  • tx_service/src/cc/local_cc_shards.cpp
📚 Learning: 2025-10-20T04:30:07.884Z
Learnt from: liunyl
Repo: eloqdata/tx_service PR: 149
File: include/cc/cc_request.h:1876-1927
Timestamp: 2025-10-20T04:30:07.884Z
Learning: ScanNextBatchCc in include/cc/cc_request.h is used only for hash-partition scans; range-partition scans are handled by ScanSliceCc.

Applied to files:

  • tx_service/src/cc/local_cc_shards.cpp
  • tx_service/include/cc/cc_request.h
  • tx_service/include/cc/template_cc_map.h
📚 Learning: 2025-10-21T06:46:53.700Z
Learnt from: lokax
Repo: eloqdata/tx_service PR: 149
File: src/remote/cc_stream_receiver.cpp:1066-1075
Timestamp: 2025-10-21T06:46:53.700Z
Learning: In src/remote/cc_stream_receiver.cpp, for ScanNextRequest handling, BucketIds() on RemoteScanNextBatch should never be empty—this is an expected invariant of the scan protocol.

Applied to files:

  • tx_service/include/cc/cc_request.h
  • tx_service/include/cc/template_cc_map.h
🧬 Code graph analysis (1)
tx_service/include/cc/template_cc_map.h (1)
tx_service/include/util.h (1)
  • ReadTimeMicroseconds (580-619)
🔇 Additional comments (3)
CMakeLists.txt (1)

9-9: Note: Potential disconnect between PR scope and files under review.

The PR title indicates "add parameter for tuning hash partition checkpoint, increase ckpt scan buffer," and the AI summary mentions replacing hard-coded constants with gflags (hash_partition_*, ckpt_buffer_ratio). However, the only visible change in this file is the data store default modification.

If this PR includes additional files with the checkpoint tuning changes, those files are not present in this review. If this CMakeLists.txt change is the complete scope, consider updating the PR title and description to accurately reflect that this is primarily a default data store configuration change.

tx_service/include/cc/template_cc_map.h (1)

73-74: Flag definition is properly wired

The declaration of hash_partition_ckpt_scan_yield_time_us has a matching DEFINE in core/src/data_substrate.cpp with a default value of 100 microseconds. No duplicate definitions exist.

tx_service/src/cc/local_cc_shards.cpp (1)

69-72: New hash-partition gflag declarations look consistent

The new DECLARE_* entries for ckpt_buffer_ratio and hash-partition scan/data-size flags align with the corresponding DEFINE_*s in core/src/data_substrate.cpp and match the existing pattern for DECLARE_bool(report_ckpt). No issues from a declaration/visibility perspective.

Comment thread CMakeLists.txt
Comment thread core/src/data_substrate.cpp Outdated
Comment thread tx_service/include/cc/cc_request.h Outdated
Comment thread tx_service/src/cc/local_cc_shards.cpp

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In @tx_service/src/cc/cc_shard.cpp:
- Around line 112-114: Update the comment above the memory calculation to
indicate the data-sync buffer percentage is configurable via
FLAGS_ckpt_buffer_ratio (e.g., "Configurable % for data sync memory via
FLAGS_ckpt_buffer_ratio") and add validation for FLAGS_ckpt_buffer_ratio before
computing memory_usage_ratio in cc_shard.cpp: check that FLAGS_ckpt_buffer_ratio
is within [0.0, 1.0] and handle out-of-range values (either clamp to the valid
range or log an error and use a safe default) so the calculation of
memory_usage_ratio = 1.0 - range_slice_reserve_ratio - FLAGS_ckpt_buffer_ratio
cannot produce invalid results.
- Line 61: Add a gflags validator for the ckpt_buffer_ratio flag so it cannot be
set below 0.025 or above 1.0; register the validator (via
gflags::RegisterFlagValidator) for FLAGS_ckpt_buffer_ratio where the flag is
defined/initialized (replace the current DECLARE_double(ckpt_buffer_ratio) usage
with or accompany it by a RegisterFlagValidator call) and return false if the
value is outside [0.025, 1.0], providing a clear error message; this prevents
negative results in calculations that use FLAGS_ckpt_buffer_ratio (see usages in
local_cc_shards.cpp around the subtraction FLAGS_ckpt_buffer_ratio - 0.025 and
in cc_shard.cpp where 1.0 - range_slice_reserve_ratio - FLAGS_ckpt_buffer_ratio
is computed).
🧹 Nitpick comments (2)
tx_service/include/cc/cc_request.h (1)

3745-3748: Confirm invariant between scan_batch_size_ and FLAGS_hash_partition_data_sync_scan_batch_size

The ctor now asserts:

assert(scan_batch_size_ >
       FLAGS_hash_partition_data_sync_scan_batch_size);

Previously this compared against a fixed constexpr; now the RHS is runtime-tunable. If callers still pass a fixed scan_batch_size (e.g., a constant based on prior defaults), a large flag value can violate this invariant and (with NDEBUG) potentially let downstream code assume a larger working batch than data_sync_vec_ actually holds.

Please double-check that:

  • All call sites compute scan_batch_size from the same flag (or a safe function of it), and
  • There is no path that uses FLAGS_hash_partition_data_sync_scan_batch_size as a loop bound independent of data_sync_vec_.size().

If either is not guaranteed, consider clamping the effective batch size to scan_batch_size_ or relaxing this to a >=/runtime check with logging instead of a debug-only assert.

tx_service/src/cc/cc_shard.cpp (1)

163-165: Consider aligning hard-coded metrics multiplier with dynamic configuration.

The hard-coded 0.9 multiplier for memory_limit_ appears inconsistent with the newly configurable FLAGS_ckpt_buffer_ratio. The comment mentions "10% memory size is reserved for data sync scan," but this percentage is now dynamic. This discrepancy could mislead operators monitoring memory limits.

Consider calculating the metrics value based on the actual configured ratio to provide accurate observability.

♻️ Suggested improvement
         meter_->Register(metrics::NAME_MEMORY_LIMIT, metrics::Type::Gauge);
-        // 10% memory size is reserved for data sync scan
-        meter_->Collect(metrics::NAME_MEMORY_LIMIT, memory_limit_ * 0.9);
+        // Report effective memory limit accounting for data sync reservation
+        meter_->Collect(metrics::NAME_MEMORY_LIMIT, 
+                       memory_limit_ * (1.0 - FLAGS_ckpt_buffer_ratio / memory_usage_ratio));

Note: This assumes you want to report the memory available after accounting for the checkpoint buffer ratio. Adjust the formula based on the intended semantics of the MEMORY_LIMIT metric.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a72fa3f and cf5ba20.

📒 Files selected for processing (4)
  • tx_service/include/cc/cc_request.h
  • tx_service/include/cc/template_cc_map.h
  • tx_service/src/cc/cc_shard.cpp
  • tx_service/src/cc/local_cc_shards.cpp
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-11-11T07:10:40.346Z
Learnt from: lzxddz
Repo: eloqdata/tx_service PR: 199
File: include/cc/local_cc_shards.h:233-234
Timestamp: 2025-11-11T07:10:40.346Z
Learning: In the LocalCcShards class in include/cc/local_cc_shards.h, the EnqueueCcRequest methods use `shard_code & 0x3FF` followed by `% cc_shards_.size()` to distribute work across processor cores for load balancing. This is intentional and separate from partition ID calculation. The 0x3FF mask creates a consistent distribution range (0-1023) before modulo by actual core count.

Applied to files:

  • tx_service/src/cc/cc_shard.cpp
  • tx_service/include/cc/cc_request.h
  • tx_service/src/cc/local_cc_shards.cpp
📚 Learning: 2025-12-02T10:43:27.431Z
Learnt from: lokax
Repo: eloqdata/tx_service PR: 254
File: tx_service/src/cc/local_cc_shards.cpp:2949-3188
Timestamp: 2025-12-02T10:43:27.431Z
Learning: In tx_service/src/cc/local_cc_shards.cpp, whenever TryPinNodeGroupData is used, only call Sharder::Instance().UnpinNodeGroupData(node_group) if the recorded term is >= 0 (i.e., pin succeeded). Example: LocalCcShards::PostProcessFlushTaskEntries guards the unpin with `if (term >= 0)`.

Applied to files:

  • tx_service/src/cc/cc_shard.cpp
  • tx_service/src/cc/local_cc_shards.cpp
📚 Learning: 2025-10-09T03:56:58.811Z
Learnt from: thweetkomputer
Repo: eloqdata/tx_service PR: 150
File: include/cc/local_cc_shards.h:626-631
Timestamp: 2025-10-09T03:56:58.811Z
Learning: For the LocalCcShards class in include/cc/local_cc_shards.h: Writer locks (unique_lock) should continue using the original meta_data_mux_ (std::shared_mutex) rather than fast_meta_data_mux_ (FastMetaDataMutex) at this stage. Only reader locks may use the FastMetaDataMutex wrapper.

Applied to files:

  • tx_service/src/cc/cc_shard.cpp
  • tx_service/src/cc/local_cc_shards.cpp
📚 Learning: 2025-10-20T04:30:07.884Z
Learnt from: liunyl
Repo: eloqdata/tx_service PR: 149
File: include/cc/cc_request.h:1876-1927
Timestamp: 2025-10-20T04:30:07.884Z
Learning: ScanNextBatchCc in include/cc/cc_request.h is used only for hash-partition scans; range-partition scans are handled by ScanSliceCc.

Applied to files:

  • tx_service/include/cc/cc_request.h
  • tx_service/src/cc/local_cc_shards.cpp
  • tx_service/include/cc/template_cc_map.h
📚 Learning: 2025-10-21T06:46:53.700Z
Learnt from: lokax
Repo: eloqdata/tx_service PR: 149
File: src/remote/cc_stream_receiver.cpp:1066-1075
Timestamp: 2025-10-21T06:46:53.700Z
Learning: In src/remote/cc_stream_receiver.cpp, for ScanNextRequest handling, BucketIds() on RemoteScanNextBatch should never be empty—this is an expected invariant of the scan protocol.

Applied to files:

  • tx_service/include/cc/cc_request.h
  • tx_service/include/cc/template_cc_map.h
🧬 Code graph analysis (1)
tx_service/include/cc/template_cc_map.h (1)
tx_service/include/util.h (1)
  • ReadTimeMicroseconds (580-619)
🔇 Additional comments (4)
tx_service/include/cc/cc_request.h (1)

86-90: Hash-partition data-sync FLAGS declarations look appropriate

Declaring hash_partition_data_sync_scan_batch_size and hash_partition_data_sync_scan_data_size as uint64 inside txservice is consistent with their intended use as tunable sizing thresholds; placement before any use in this header is fine. Just ensure the defining TU includes the appropriate gflags header and that defaults remain sane relative to existing batch sizes so tuning cannot accidentally blow up memory usage.

tx_service/include/cc/template_cc_map.h (2)

5899-5914: Configurable pacing for hash-partition ckpt scan looks correct

Using FLAGS_hash_partition_data_sync_scan_batch_size, FLAGS_hash_partition_data_sync_scan_data_size, and the new FLAGS_hash_partition_ckpt_scan_yield_time_us to bound the loop and periodically yield based on ReadTimeMicroseconds() is consistent with existing time-based yield patterns in this file and should help bound per-iteration work without changing semantics.


73-76: No action needed. The DECLARE_uint32(hash_partition_ckpt_scan_yield_time_us) in the header file and DEFINE_uint32(hash_partition_ckpt_scan_yield_time_us) in tx_service/src/cc/local_cc_shards.cpp are both properly placed within namespace txservice (see line 66 of local_cc_shards.cpp). This is consistent with related flags (hash_partition_data_sync_scan_batch_size, hash_partition_data_sync_scan_data_size) which are also defined in the same namespace. No link-time errors will occur.

Likely an incorrect or invalid review comment.

tx_service/src/cc/local_cc_shards.cpp (1)

68-84: Each new gflag is defined exactly once—no ODR violations found

The search confirms that ckpt_buffer_ratio, hash_partition_ckpt_scan_yield_time_us, hash_partition_data_sync_scan_batch_size, and hash_partition_data_sync_scan_data_size are each defined exactly once in the codebase at their intended locations (lines 69–84 in local_cc_shards.cpp). No duplicate DEFINE_* statements exist, so there are no gflags conflicts or ODR issues to resolve.

Comment thread tx_service/src/cc/cc_shard.cpp
Comment thread tx_service/src/cc/cc_shard.cpp
Comment thread tx_service/src/cc/local_cc_shards.cpp
Comment thread tx_service/src/cc/local_cc_shards.cpp Outdated
@thweetkomputer thweetkomputer changed the title enhancement: add parameter for tuning hash partition checkpoint, increase ckpt scan buffer. enhancement: tune hash partition checkpoint params, increase ckpt scan buffer, fix hash ckpt partition calc, and limit min data sync scan concurrency Jan 8, 2026
@thweetkomputer thweetkomputer merged commit e9897a0 into main Jan 8, 2026
4 checks passed
@thweetkomputer thweetkomputer deleted the ckpt_zc branch January 8, 2026 09:29
@coderabbitai coderabbitai Bot mentioned this pull request Jan 30, 2026
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants