enhancement: tune hash partition checkpoint params, increase ckpt scan buffer, fix hash ckpt partition calc, and limit min data sync scan concurrency#347
Conversation
WalkthroughThis PR converts hard-coded constants into runtime-configurable FLAGS for hash partition data synchronization and checkpoint scanning, introduces per-core partition distribution logic for data sync operations, and changes the default data store implementation from RocksDB to EloqStore. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tx_service/src/cc/local_cc_shards.cpp (1)
4741-4802: Fix divide-by-zero and tighten scan_concurrency in Eloq-store hash-partition pathIn
DataSyncForHashPartition, the Eloq-store branch computes:const size_t scan_concurrency = flush_buffer_size / (default_data_file_size * partition_number_this_core); const size_t partition_number_per_scan = partition_number_this_core;There are two problems here:
- If
partition_number_this_core == 0(e.g., more worker cores thantotal_hash_partitions), this is a divide-by-zero and immediate UB.scan_concurrencycan grow unbounded relative tocore_number, which is unnecessary since the gating logic inDataSyncWorkeris per-core.A safer formulation is to explicitly handle the zero-partition case and clamp concurrency into
[1, core_number]when there is at least one partition:Proposed fix for Eloq-store scan_concurrency computation
- constexpr size_t default_data_file_size = 8ULL * 1024 * 1024; - const size_t scan_concurrency = - flush_buffer_size / - (default_data_file_size * partition_number_this_core); - const size_t partition_number_per_scan = partition_number_this_core; + constexpr size_t default_data_file_size = 8ULL * 1024 * 1024; + size_t scan_concurrency = 0; + const size_t partition_number_per_scan = partition_number_this_core; + if (partition_number_this_core > 0 && flush_buffer_size > 0) + { + const size_t denom = + default_data_file_size * partition_number_this_core; + // denom > 0 here, but keep the branch explicit for clarity. + if (denom > 0) + { + const size_t target = flush_buffer_size / denom; + // At least one concurrent scan when we have partitions, + // and never more than the number of cores. + scan_concurrency = + std::min(core_number, std::max<size_t>(1, target)); + } + }This avoids division by zero when a core owns no partitions (the for-loop below is already a no-op in that case) and keeps the concurrency hint consistent with the actual number of workers.
🤖 Fix all issues with AI agents
In @tx_service/src/cc/cc_shard.cpp:
- Around line 112-114: Compute memory_usage_ratio = 1.0 -
range_slice_reserve_ratio - FLAGS_ckpt_buffer_ratio in the CCShard constructor
and validate it immediately: if memory_usage_ratio <= 0, emit a clear error (or
DCHECK/LOG(FATAL) per project conventions) referencing the offending flags
(range_slice_reserve_ratio and FLAGS_ckpt_buffer_ratio) and either abort
construction or clamp memory_usage_ratio to a small positive value before using
it to compute memory_limit_; ensure memory_limit_ is only set when
memory_usage_ratio > 0 so later comparisons using memory_limit_ are not
zero/negative.
🧹 Nitpick comments (6)
core/src/data_substrate.cpp (1)
76-91: Consider adding validation for flag values to prevent misconfiguration.The newly introduced flags lack validation to ensure values are within reasonable bounds:
ckpt_buffer_ratioshould be constrained to [0.0, 1.0] and ideally the sum ofckpt_buffer_ratio+range_slice_reserve_ratioshould not exceed 1.0 (validated at usage site incc_shard.cppline 114)ckpt_scan_yield_time_us,data_sync_scan_batch_size, anddata_sync_scan_data_sizecould benefit from sanity checks to prevent extremely large values that might cause resource exhaustionWhile the defaults are reasonable, runtime misconfiguration via command-line flags could lead to undefined behavior (e.g., negative
memory_usage_ratioinCcShardconstructor).🛡️ Example validation approach
Consider adding a validation function called during startup:
bool ValidateCheckpointFlags() { if (FLAGS_ckpt_buffer_ratio < 0.0 || FLAGS_ckpt_buffer_ratio > 1.0) { LOG(ERROR) << "ckpt_buffer_ratio must be between 0.0 and 1.0, got: " << FLAGS_ckpt_buffer_ratio; return false; } if (FLAGS_data_sync_scan_batch_size == 0 || FLAGS_data_sync_scan_batch_size > 10000) { LOG(WARNING) << "data_sync_scan_batch_size seems unusual: " << FLAGS_data_sync_scan_batch_size; } // Similar checks for other flags return true; }Alternatively, use gflags validators:
static bool ValidateCkptBufferRatio(const char* flagname, double value) { if (value >= 0.0 && value <= 1.0) return true; LOG(ERROR) << flagname << " must be between 0.0 and 1.0"; return false; } DEFINE_validator(ckpt_buffer_ratio, &ValidateCkptBufferRatio);tx_service/src/cc/cc_shard.cpp (1)
112-112: Update comment to avoid hardcoding the percentage.The comment "12.5% for data sync memory usage" hardcodes the default value of
FLAGS_ckpt_buffer_ratio, which could become misleading if the flag is changed via configuration or command-line.♻️ Suggested improvement
-// 12.5% for data sync memory usage +// Reserve ckpt_buffer_ratio fraction for data sync memory usage double memory_usage_ratio = 1.0 - range_slice_reserve_ratio - FLAGS_ckpt_buffer_ratio;Or make it more informative:
-// 12.5% for data sync memory usage +// Reserve memory for checkpoint flush buffers and data-sync controller +// (default 12.5%, configurable via --ckpt_buffer_ratio) double memory_usage_ratio = 1.0 - range_slice_reserve_ratio - FLAGS_ckpt_buffer_ratio;tx_service/include/cc/template_cc_map.h (2)
73-74: gflags DECLARE in a widely-included headerDeclaring
ckpt_scan_yield_time_ushere is functionally fine, but it makes this large template header depend on gflags macros indirectly. Consider centralizing allDECLARE_*for these scan/ckpt flags into a dedicated config header (already included here) to avoid spreading flag declarations across multiple headers.
5903-5914: Flag-driven scan limits look correct; guard against zero-valued flagsThe new guards on
scan_cnt,export_data_size, and wall-clock delta viaFLAGS_data_sync_scan_batch_size,FLAGS_data_sync_scan_data_size, andFLAGS_ckpt_scan_yield_time_usare consistent with the existing control flow and should make ckpt/data-sync behavior tunable per PR intent.One edge case: if
FLAGS_data_sync_scan_batch_size == 0orFLAGS_data_sync_scan_data_size == 0, this loop will never execute and the request can reschedule indefinitely without making progress. Similarly, aFLAGS_ckpt_scan_yield_time_usof 0 will force almost immediate yields. It may be worth clamping these flags to sane minimums at definition time (or asserting here) to avoid misconfiguration leading to stalled or overly-throttled scans.tx_service/include/cc/cc_request.h (1)
3747-3753: Assert now tied toFLAGS_data_sync_scan_batch_size; confirm invariant still holdsThe ctor now asserts
scan_batch_size_ > FLAGS_data_sync_scan_batch_size(previously compared against the class constant). This couples the constructor argument to a runtime flag: if someone configuresdata_sync_scan_batch_sizeto be ≥ the passedscan_batch_size, debug builds will trip the assert.Please verify:
- All call sites pass
scan_batch_sizevalues that are guaranteed to remain strictly larger thanFLAGS_data_sync_scan_batch_sizefor any supported configuration.- If you expect
data_sync_scan_batch_sizeto be changed at runtime, consider either validating this relationship where the flag is set or relaxing the condition (e.g.,>=) plus logging instead of a hard assert.tx_service/src/cc/local_cc_shards.cpp (1)
68-72: ckpt_buffer_ratio-based sizing looks good, but guard against negative effective ratiosUsing
FLAGS_ckpt_buffer_ratioto size bothcur_flush_buffer_anddata_sync_mem_controller_matches the PR intent and preserves the old 5% / 7.5% relationship when the default ratio is 0.075. However, if an operator ever setsckpt_buffer_ratio <= 0.025, the expressionMB(conf.at("node_memory_limit_mb")) * (FLAGS_ckpt_buffer_ratio - 0.025)becomes negative before the
static_cast<uint64_t>, which will wrap to a very large positive value and effectively remove your memory cap.Consider clamping or validating the flag so the effective ratios are non‑negative (and within a sane upper bound, e.g. ≤1). A simple option is to validate the flag at definition time (in the
DEFINE_doublesite) or clamp locally before computing the sizes.Also applies to: 132-136
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
core/src/data_substrate.cpptx_service/include/cc/cc_request.htx_service/include/cc/template_cc_map.htx_service/src/cc/cc_shard.cpptx_service/src/cc/local_cc_shards.cpp
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-11-11T07:10:40.346Z
Learnt from: lzxddz
Repo: eloqdata/tx_service PR: 199
File: include/cc/local_cc_shards.h:233-234
Timestamp: 2025-11-11T07:10:40.346Z
Learning: In the LocalCcShards class in include/cc/local_cc_shards.h, the EnqueueCcRequest methods use `shard_code & 0x3FF` followed by `% cc_shards_.size()` to distribute work across processor cores for load balancing. This is intentional and separate from partition ID calculation. The 0x3FF mask creates a consistent distribution range (0-1023) before modulo by actual core count.
Applied to files:
tx_service/src/cc/cc_shard.cpptx_service/src/cc/local_cc_shards.cpptx_service/include/cc/cc_request.h
📚 Learning: 2025-10-20T04:30:07.884Z
Learnt from: liunyl
Repo: eloqdata/tx_service PR: 149
File: include/cc/cc_request.h:1876-1927
Timestamp: 2025-10-20T04:30:07.884Z
Learning: ScanNextBatchCc in include/cc/cc_request.h is used only for hash-partition scans; range-partition scans are handled by ScanSliceCc.
Applied to files:
tx_service/include/cc/template_cc_map.htx_service/src/cc/local_cc_shards.cpptx_service/include/cc/cc_request.h
📚 Learning: 2025-10-21T06:46:53.700Z
Learnt from: lokax
Repo: eloqdata/tx_service PR: 149
File: src/remote/cc_stream_receiver.cpp:1066-1075
Timestamp: 2025-10-21T06:46:53.700Z
Learning: In src/remote/cc_stream_receiver.cpp, for ScanNextRequest handling, BucketIds() on RemoteScanNextBatch should never be empty—this is an expected invariant of the scan protocol.
Applied to files:
tx_service/include/cc/template_cc_map.htx_service/include/cc/cc_request.h
📚 Learning: 2025-10-09T03:56:58.811Z
Learnt from: thweetkomputer
Repo: eloqdata/tx_service PR: 150
File: include/cc/local_cc_shards.h:626-631
Timestamp: 2025-10-09T03:56:58.811Z
Learning: For the LocalCcShards class in include/cc/local_cc_shards.h: Writer locks (unique_lock) should continue using the original meta_data_mux_ (std::shared_mutex) rather than fast_meta_data_mux_ (FastMetaDataMutex) at this stage. Only reader locks may use the FastMetaDataMutex wrapper.
Applied to files:
tx_service/src/cc/local_cc_shards.cpp
📚 Learning: 2025-12-02T10:43:27.431Z
Learnt from: lokax
Repo: eloqdata/tx_service PR: 254
File: tx_service/src/cc/local_cc_shards.cpp:2949-3188
Timestamp: 2025-12-02T10:43:27.431Z
Learning: In tx_service/src/cc/local_cc_shards.cpp, whenever TryPinNodeGroupData is used, only call Sharder::Instance().UnpinNodeGroupData(node_group) if the recorded term is >= 0 (i.e., pin succeeded). Example: LocalCcShards::PostProcessFlushTaskEntries guards the unpin with `if (term >= 0)`.
Applied to files:
tx_service/src/cc/local_cc_shards.cpp
🧬 Code graph analysis (1)
tx_service/include/cc/template_cc_map.h (1)
tx_service/include/util.h (1)
ReadTimeMicroseconds(580-619)
🔇 Additional comments (2)
tx_service/src/cc/cc_shard.cpp (1)
58-58: LGTM - Flag declaration is correct.The declaration of
ckpt_buffer_ratiofollows the established pattern and is necessary to consume the flag defined indata_substrate.cpp.tx_service/include/cc/cc_request.h (1)
88-89: Verify gflags declaration inclusion for newDECLARE_uint64flagsThe new
DECLARE_uint64(data_sync_scan_batch_size)/DECLARE_uint64(data_sync_scan_data_size)rely on the gflags declaration macros being visible. Please double‑check that this header (or a consistently included ancestor such astx_service_common.hor similar) includes the appropriate gflags declaration header (e.g.,gflags/gflags_declare.horgflags/gflags.h), otherwise TU’s that includecc_request.hin isolation will fail to compile.
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tx_service/include/cc/cc_request.h (1)
3722-3751: Compilation error: assert references non-existent gflagThe
HashPartitionDataSyncScanCcconstructor's assert usesFLAGS_data_sync_scan_batch_size, which does not exist in the codebase. The only declared flag for this code path isFLAGS_hash_partition_data_sync_scan_batch_size(declared at line 88).Replace the assert with:
assert(scan_batch_size_ > FLAGS_hash_partition_data_sync_scan_batch_size);tx_service/src/cc/local_cc_shards.cpp (1)
4741-4801: Fix division-by-zero and incorrect partition filtering inDataSyncForHashPartitionTwo critical correctness issues exist in the per-core hash-partition logic:
Division-by-zero crash on systems with >1024 CPU cores (ELOQSTORE builds)
Whencore_number > total_hash_partitions(1024), workers with high indices receive zero partitions:partition_number_this_core = 0The ELOQSTORE path at line 4785 then crashes:
flush_buffer_size / (default_data_file_size * partition_number_this_core) // divides by 0This is a valid scenario on modern high-core systems. The non-ELOQSTORE path guards this correctly, but ELOQSTORE does not.
Filter incorrectly includes partitions owned by other workers
Partitions for each worker are non-contiguous:{worker_idx + core_number*0, worker_idx + core_number*1, ...}. For example, withtotal_hash_partitions=10,core_number=3,worker_idx=0, this worker owns{0, 3, 6, 9}.The current filter uses a numeric range check at lines 4836-4837:
return static_cast<size_t>(part_id) >= min_partition_id_this_scan && static_cast<size_t>(part_id) <= max_partition_id_this_scanWith
partition_number_per_scan=2, the first batch range[0, 3]incorrectly admits partitions1and2, which are owned by other workers. This breaks partitioning invariants and can cause duplicate scans/flushes.Concrete fixes required:
- Guard ELOQSTORE division with
std::max(1, partition_number_this_core)to prevent crashes.- Replace numeric range filter with modular arithmetic: verify
(partition_id - worker_idx) % core_number == 0to ensure the partition belongs to this worker, then check if its local index is in the current batch range.
🤖 Fix all issues with AI agents
In @CMakeLists.txt:
- Line 9: The CMake default for WITH_DATA_STORE was changed to ELOQDSS_ELOQSTORE
which can silently alter builds; update CI/build scripts and deployment configs
to explicitly set -DWITH_DATA_STORE=ELOQDSS_ROCKSDB (or the desired variant)
where RocksDB is required, audit other CMake entry points (e.g.,
store_handler/eloq_data_store_service/CMakeLists.txt which defaults to
ELOQDSS_ROCKSDB_CLOUD_S3) and align their defaults or override them in calling
scripts, and add/extend migration documentation and team notifications
explaining the new default, recommended overrides, and any required
configuration changes.
In @core/src/data_substrate.cpp:
- Around line 76-91: The ckpt_buffer_ratio flag is ambiguous and can produce
negative internal values when used as (FLAGS_ckpt_buffer_ratio - 0.025) in
LocalCcShards; validate and clamp FLAGS_ckpt_buffer_ratio at startup to a safe
range (e.g., >0.05 and <=0.5) before any use, update the flag help string to
document the split (e.g., "total fraction; flush_buffer = ratio - 0.025,
controller = ratio"), and change all consumers (notably LocalCcShards where
(FLAGS_ckpt_buffer_ratio - 0.025) is used) to use the clamped/validated value so
the subtraction cannot underflow or be cast to a huge uint64_t.
In @tx_service/include/cc/cc_request.h:
- Around line 88-90: The assertion referencing the undefined flag should be
updated to use the correct flag name: replace the use of
FLAGS_data_sync_scan_batch_size with
FLAGS_hash_partition_data_sync_scan_batch_size in the assertion that checks
scan_batch_size_ (look for the assert involving scan_batch_size_ in
cc_request.h); ensure the condition reads against
FLAGS_hash_partition_data_sync_scan_batch_size so it matches the declared flags
DECLARE_uint64(hash_partition_data_sync_scan_batch_size) and
DECLARE_uint64(hash_partition_data_sync_scan_data_size).
In @tx_service/src/cc/local_cc_shards.cpp:
- Around line 132-137: The constructor init-list computes cur_flush_buffer_ and
data_sync_mem_controller_ using (FLAGS_ckpt_buffer_ratio - 0.025) which can be
negative and underflow when cast to uint64_t; clamp the ratio first (e.g.,
compute a local double clamped_ratio = std::max(0.0, FLAGS_ckpt_buffer_ratio);
or ensure clamped_ratio >= 0.025 depending on intended semantics) using
FLAGS_ckpt_buffer_ratio and use that single clamped_ratio when computing both
cur_flush_buffer_ and data_sync_mem_controller_ (still multiplying by
MB(conf.at("node_memory_limit_mb"))) so both consumers derive from the safe,
documented value and avoid negative-to-uint64_t underflow.
🧹 Nitpick comments (1)
tx_service/include/cc/template_cc_map.h (1)
5902-5907: Hash-partition scan loop now bounded by tunable batch size, data size, and yield timeThe new guards:
scan_cnt < FLAGS_hash_partition_data_sync_scan_batch_sizeexport_data_size < FLAGS_hash_partition_data_sync_scan_data_size- time-based break using
FLAGS_hash_partition_ckpt_scan_yield_time_uscorrectly cap per-iteration work and bytes while preserving existing
req.accumulated_scan_cnt_and iterator termination checks. Behavior looks sound.One edge case to consider: if
hash_partition_data_sync_scan_data_sizeis configured as 0, the loop will never process any entries. If that’s not desired, treating 0 (or very small values) as “no limit” or enforcing a minimum >0 at flag parsing time might avoid surprising misconfigurations.Also applies to: 5912-5915
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
CMakeLists.txtcore/src/data_substrate.cpptx_service/include/cc/cc_request.htx_service/include/cc/template_cc_map.htx_service/src/cc/local_cc_shards.cpp
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-11-11T07:10:40.346Z
Learnt from: lzxddz
Repo: eloqdata/tx_service PR: 199
File: include/cc/local_cc_shards.h:233-234
Timestamp: 2025-11-11T07:10:40.346Z
Learning: In the LocalCcShards class in include/cc/local_cc_shards.h, the EnqueueCcRequest methods use `shard_code & 0x3FF` followed by `% cc_shards_.size()` to distribute work across processor cores for load balancing. This is intentional and separate from partition ID calculation. The 0x3FF mask creates a consistent distribution range (0-1023) before modulo by actual core count.
Applied to files:
tx_service/src/cc/local_cc_shards.cpptx_service/include/cc/cc_request.h
📚 Learning: 2025-10-09T03:56:58.811Z
Learnt from: thweetkomputer
Repo: eloqdata/tx_service PR: 150
File: include/cc/local_cc_shards.h:626-631
Timestamp: 2025-10-09T03:56:58.811Z
Learning: For the LocalCcShards class in include/cc/local_cc_shards.h: Writer locks (unique_lock) should continue using the original meta_data_mux_ (std::shared_mutex) rather than fast_meta_data_mux_ (FastMetaDataMutex) at this stage. Only reader locks may use the FastMetaDataMutex wrapper.
Applied to files:
tx_service/src/cc/local_cc_shards.cpp
📚 Learning: 2025-12-02T10:43:27.431Z
Learnt from: lokax
Repo: eloqdata/tx_service PR: 254
File: tx_service/src/cc/local_cc_shards.cpp:2949-3188
Timestamp: 2025-12-02T10:43:27.431Z
Learning: In tx_service/src/cc/local_cc_shards.cpp, whenever TryPinNodeGroupData is used, only call Sharder::Instance().UnpinNodeGroupData(node_group) if the recorded term is >= 0 (i.e., pin succeeded). Example: LocalCcShards::PostProcessFlushTaskEntries guards the unpin with `if (term >= 0)`.
Applied to files:
tx_service/src/cc/local_cc_shards.cpp
📚 Learning: 2025-10-20T04:30:07.884Z
Learnt from: liunyl
Repo: eloqdata/tx_service PR: 149
File: include/cc/cc_request.h:1876-1927
Timestamp: 2025-10-20T04:30:07.884Z
Learning: ScanNextBatchCc in include/cc/cc_request.h is used only for hash-partition scans; range-partition scans are handled by ScanSliceCc.
Applied to files:
tx_service/src/cc/local_cc_shards.cpptx_service/include/cc/cc_request.htx_service/include/cc/template_cc_map.h
📚 Learning: 2025-10-21T06:46:53.700Z
Learnt from: lokax
Repo: eloqdata/tx_service PR: 149
File: src/remote/cc_stream_receiver.cpp:1066-1075
Timestamp: 2025-10-21T06:46:53.700Z
Learning: In src/remote/cc_stream_receiver.cpp, for ScanNextRequest handling, BucketIds() on RemoteScanNextBatch should never be empty—this is an expected invariant of the scan protocol.
Applied to files:
tx_service/include/cc/cc_request.htx_service/include/cc/template_cc_map.h
🧬 Code graph analysis (1)
tx_service/include/cc/template_cc_map.h (1)
tx_service/include/util.h (1)
ReadTimeMicroseconds(580-619)
🔇 Additional comments (3)
CMakeLists.txt (1)
9-9: Note: Potential disconnect between PR scope and files under review.The PR title indicates "add parameter for tuning hash partition checkpoint, increase ckpt scan buffer," and the AI summary mentions replacing hard-coded constants with gflags (
hash_partition_*,ckpt_buffer_ratio). However, the only visible change in this file is the data store default modification.If this PR includes additional files with the checkpoint tuning changes, those files are not present in this review. If this CMakeLists.txt change is the complete scope, consider updating the PR title and description to accurately reflect that this is primarily a default data store configuration change.
tx_service/include/cc/template_cc_map.h (1)
73-74: Flag definition is properly wiredThe declaration of
hash_partition_ckpt_scan_yield_time_ushas a matching DEFINE incore/src/data_substrate.cppwith a default value of 100 microseconds. No duplicate definitions exist.tx_service/src/cc/local_cc_shards.cpp (1)
69-72: New hash-partition gflag declarations look consistentThe new
DECLARE_*entries forckpt_buffer_ratioand hash-partition scan/data-size flags align with the correspondingDEFINE_*s incore/src/data_substrate.cppand match the existing pattern forDECLARE_bool(report_ckpt). No issues from a declaration/visibility perspective.
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In @tx_service/src/cc/cc_shard.cpp:
- Around line 112-114: Update the comment above the memory calculation to
indicate the data-sync buffer percentage is configurable via
FLAGS_ckpt_buffer_ratio (e.g., "Configurable % for data sync memory via
FLAGS_ckpt_buffer_ratio") and add validation for FLAGS_ckpt_buffer_ratio before
computing memory_usage_ratio in cc_shard.cpp: check that FLAGS_ckpt_buffer_ratio
is within [0.0, 1.0] and handle out-of-range values (either clamp to the valid
range or log an error and use a safe default) so the calculation of
memory_usage_ratio = 1.0 - range_slice_reserve_ratio - FLAGS_ckpt_buffer_ratio
cannot produce invalid results.
- Line 61: Add a gflags validator for the ckpt_buffer_ratio flag so it cannot be
set below 0.025 or above 1.0; register the validator (via
gflags::RegisterFlagValidator) for FLAGS_ckpt_buffer_ratio where the flag is
defined/initialized (replace the current DECLARE_double(ckpt_buffer_ratio) usage
with or accompany it by a RegisterFlagValidator call) and return false if the
value is outside [0.025, 1.0], providing a clear error message; this prevents
negative results in calculations that use FLAGS_ckpt_buffer_ratio (see usages in
local_cc_shards.cpp around the subtraction FLAGS_ckpt_buffer_ratio - 0.025 and
in cc_shard.cpp where 1.0 - range_slice_reserve_ratio - FLAGS_ckpt_buffer_ratio
is computed).
🧹 Nitpick comments (2)
tx_service/include/cc/cc_request.h (1)
3745-3748: Confirm invariant betweenscan_batch_size_andFLAGS_hash_partition_data_sync_scan_batch_sizeThe ctor now asserts:
assert(scan_batch_size_ > FLAGS_hash_partition_data_sync_scan_batch_size);Previously this compared against a fixed constexpr; now the RHS is runtime-tunable. If callers still pass a fixed
scan_batch_size(e.g., a constant based on prior defaults), a large flag value can violate this invariant and (withNDEBUG) potentially let downstream code assume a larger working batch thandata_sync_vec_actually holds.Please double-check that:
- All call sites compute
scan_batch_sizefrom the same flag (or a safe function of it), and- There is no path that uses
FLAGS_hash_partition_data_sync_scan_batch_sizeas a loop bound independent ofdata_sync_vec_.size().If either is not guaranteed, consider clamping the effective batch size to
scan_batch_size_or relaxing this to a>=/runtime check with logging instead of a debug-onlyassert.tx_service/src/cc/cc_shard.cpp (1)
163-165: Consider aligning hard-coded metrics multiplier with dynamic configuration.The hard-coded
0.9multiplier formemory_limit_appears inconsistent with the newly configurableFLAGS_ckpt_buffer_ratio. The comment mentions "10% memory size is reserved for data sync scan," but this percentage is now dynamic. This discrepancy could mislead operators monitoring memory limits.Consider calculating the metrics value based on the actual configured ratio to provide accurate observability.
♻️ Suggested improvement
meter_->Register(metrics::NAME_MEMORY_LIMIT, metrics::Type::Gauge); - // 10% memory size is reserved for data sync scan - meter_->Collect(metrics::NAME_MEMORY_LIMIT, memory_limit_ * 0.9); + // Report effective memory limit accounting for data sync reservation + meter_->Collect(metrics::NAME_MEMORY_LIMIT, + memory_limit_ * (1.0 - FLAGS_ckpt_buffer_ratio / memory_usage_ratio));Note: This assumes you want to report the memory available after accounting for the checkpoint buffer ratio. Adjust the formula based on the intended semantics of the
MEMORY_LIMITmetric.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
tx_service/include/cc/cc_request.htx_service/include/cc/template_cc_map.htx_service/src/cc/cc_shard.cpptx_service/src/cc/local_cc_shards.cpp
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-11-11T07:10:40.346Z
Learnt from: lzxddz
Repo: eloqdata/tx_service PR: 199
File: include/cc/local_cc_shards.h:233-234
Timestamp: 2025-11-11T07:10:40.346Z
Learning: In the LocalCcShards class in include/cc/local_cc_shards.h, the EnqueueCcRequest methods use `shard_code & 0x3FF` followed by `% cc_shards_.size()` to distribute work across processor cores for load balancing. This is intentional and separate from partition ID calculation. The 0x3FF mask creates a consistent distribution range (0-1023) before modulo by actual core count.
Applied to files:
tx_service/src/cc/cc_shard.cpptx_service/include/cc/cc_request.htx_service/src/cc/local_cc_shards.cpp
📚 Learning: 2025-12-02T10:43:27.431Z
Learnt from: lokax
Repo: eloqdata/tx_service PR: 254
File: tx_service/src/cc/local_cc_shards.cpp:2949-3188
Timestamp: 2025-12-02T10:43:27.431Z
Learning: In tx_service/src/cc/local_cc_shards.cpp, whenever TryPinNodeGroupData is used, only call Sharder::Instance().UnpinNodeGroupData(node_group) if the recorded term is >= 0 (i.e., pin succeeded). Example: LocalCcShards::PostProcessFlushTaskEntries guards the unpin with `if (term >= 0)`.
Applied to files:
tx_service/src/cc/cc_shard.cpptx_service/src/cc/local_cc_shards.cpp
📚 Learning: 2025-10-09T03:56:58.811Z
Learnt from: thweetkomputer
Repo: eloqdata/tx_service PR: 150
File: include/cc/local_cc_shards.h:626-631
Timestamp: 2025-10-09T03:56:58.811Z
Learning: For the LocalCcShards class in include/cc/local_cc_shards.h: Writer locks (unique_lock) should continue using the original meta_data_mux_ (std::shared_mutex) rather than fast_meta_data_mux_ (FastMetaDataMutex) at this stage. Only reader locks may use the FastMetaDataMutex wrapper.
Applied to files:
tx_service/src/cc/cc_shard.cpptx_service/src/cc/local_cc_shards.cpp
📚 Learning: 2025-10-20T04:30:07.884Z
Learnt from: liunyl
Repo: eloqdata/tx_service PR: 149
File: include/cc/cc_request.h:1876-1927
Timestamp: 2025-10-20T04:30:07.884Z
Learning: ScanNextBatchCc in include/cc/cc_request.h is used only for hash-partition scans; range-partition scans are handled by ScanSliceCc.
Applied to files:
tx_service/include/cc/cc_request.htx_service/src/cc/local_cc_shards.cpptx_service/include/cc/template_cc_map.h
📚 Learning: 2025-10-21T06:46:53.700Z
Learnt from: lokax
Repo: eloqdata/tx_service PR: 149
File: src/remote/cc_stream_receiver.cpp:1066-1075
Timestamp: 2025-10-21T06:46:53.700Z
Learning: In src/remote/cc_stream_receiver.cpp, for ScanNextRequest handling, BucketIds() on RemoteScanNextBatch should never be empty—this is an expected invariant of the scan protocol.
Applied to files:
tx_service/include/cc/cc_request.htx_service/include/cc/template_cc_map.h
🧬 Code graph analysis (1)
tx_service/include/cc/template_cc_map.h (1)
tx_service/include/util.h (1)
ReadTimeMicroseconds(580-619)
🔇 Additional comments (4)
tx_service/include/cc/cc_request.h (1)
86-90: Hash-partition data-sync FLAGS declarations look appropriateDeclaring
hash_partition_data_sync_scan_batch_sizeandhash_partition_data_sync_scan_data_sizeasuint64insidetxserviceis consistent with their intended use as tunable sizing thresholds; placement before any use in this header is fine. Just ensure the defining TU includes the appropriate gflags header and that defaults remain sane relative to existing batch sizes so tuning cannot accidentally blow up memory usage.tx_service/include/cc/template_cc_map.h (2)
5899-5914: Configurable pacing for hash-partition ckpt scan looks correctUsing
FLAGS_hash_partition_data_sync_scan_batch_size,FLAGS_hash_partition_data_sync_scan_data_size, and the newFLAGS_hash_partition_ckpt_scan_yield_time_usto bound the loop and periodically yield based onReadTimeMicroseconds()is consistent with existing time-based yield patterns in this file and should help bound per-iteration work without changing semantics.
73-76: No action needed. TheDECLARE_uint32(hash_partition_ckpt_scan_yield_time_us)in the header file andDEFINE_uint32(hash_partition_ckpt_scan_yield_time_us)intx_service/src/cc/local_cc_shards.cppare both properly placed withinnamespace txservice(see line 66 of local_cc_shards.cpp). This is consistent with related flags (hash_partition_data_sync_scan_batch_size,hash_partition_data_sync_scan_data_size) which are also defined in the same namespace. No link-time errors will occur.Likely an incorrect or invalid review comment.
tx_service/src/cc/local_cc_shards.cpp (1)
68-84: Each new gflag is defined exactly once—no ODR violations foundThe search confirms that
ckpt_buffer_ratio,hash_partition_ckpt_scan_yield_time_us,hash_partition_data_sync_scan_batch_size, andhash_partition_data_sync_scan_data_sizeare each defined exactly once in the codebase at their intended locations (lines 69–84 inlocal_cc_shards.cpp). No duplicateDEFINE_*statements exist, so there are no gflags conflicts or ODR issues to resolve.
Here are some reminders before you submit the pull request
fixes eloqdb/tx_service#issue_id./mtr --suite=mono_main,mono_multi,mono_basicSummary by CodeRabbit
New Features
Chores
✏️ Tip: You can customize this high-level summary in your review settings.