Conversation
|
Ubuntu seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
WalkthroughAdds vector capacity shrinking and explicit partition-map clearing, increases a range load-factor constant, broadens non-write orphan-lock detection, and includes per-core in-memory vector sizes in data-sync flush memory accounting for both range and hash partitions. Changes
Sequence Diagram(s)(Skipped — changes are local bookkeeping and conditional logic adjustments that do not introduce a new multi-component sequential flow.) Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
Pre-merge checks and finishing touches❌ Failed checks (2 warnings, 1 inconclusive)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: Organization UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tx_service/src/cc/cc_shard.cpp (1)
1011-1020: Update the log message to reflect the broadened condition.The condition now triggers for any lock type that is not
WriteLock(line 1011), but the log message on line 1018 specifically says "read lock detected". This is misleading if other non-write lock types exist and trigger this path.🔎 Suggested fix
- DLOG(INFO) << "read lock detected in tx " - << lock_holding_txn << ", skip recovery"; + DLOG(INFO) << "non-write lock detected in tx " + << lock_holding_txn << " (lock type: " + << static_cast<int>(lk_type) << "), skip recovery";Additionally, consider adding a comment explaining why the broadened check is appropriate:
// Check if this is a non-write lock. According to the logic above (lines 1013-1017), // only write locks should appear as orphan locks on local nodes. Any other lock type // indicates this is not an orphan lock scenario, so skip recovery. if (lk_type != LockType::WriteLock)
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
store_handler/data_store_service_client.cppstore_handler/data_store_service_client_closure.htx_service/include/cc/range_slice.htx_service/src/cc/cc_shard.cpptx_service/src/cc/local_cc_shards.cpp
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-11-11T07:10:40.346Z
Learnt from: lzxddz
Repo: eloqdata/tx_service PR: 199
File: include/cc/local_cc_shards.h:233-234
Timestamp: 2025-11-11T07:10:40.346Z
Learning: In the LocalCcShards class in include/cc/local_cc_shards.h, the EnqueueCcRequest methods use `shard_code & 0x3FF` followed by `% cc_shards_.size()` to distribute work across processor cores for load balancing. This is intentional and separate from partition ID calculation. The 0x3FF mask creates a consistent distribution range (0-1023) before modulo by actual core count.
Applied to files:
tx_service/src/cc/local_cc_shards.cpp
📚 Learning: 2025-10-09T03:56:58.811Z
Learnt from: thweetkomputer
Repo: eloqdata/tx_service PR: 150
File: include/cc/local_cc_shards.h:626-631
Timestamp: 2025-10-09T03:56:58.811Z
Learning: For the LocalCcShards class in include/cc/local_cc_shards.h: Writer locks (unique_lock) should continue using the original meta_data_mux_ (std::shared_mutex) rather than fast_meta_data_mux_ (FastMetaDataMutex) at this stage. Only reader locks may use the FastMetaDataMutex wrapper.
Applied to files:
tx_service/src/cc/local_cc_shards.cpptx_service/src/cc/cc_shard.cpp
📚 Learning: 2025-12-02T10:43:27.431Z
Learnt from: lokax
Repo: eloqdata/tx_service PR: 254
File: tx_service/src/cc/local_cc_shards.cpp:2949-3188
Timestamp: 2025-12-02T10:43:27.431Z
Learning: In tx_service/src/cc/local_cc_shards.cpp, whenever TryPinNodeGroupData is used, only call Sharder::Instance().UnpinNodeGroupData(node_group) if the recorded term is >= 0 (i.e., pin succeeded). Example: LocalCcShards::PostProcessFlushTaskEntries guards the unpin with `if (term >= 0)`.
Applied to files:
tx_service/src/cc/local_cc_shards.cpptx_service/src/cc/cc_shard.cpp
📚 Learning: 2025-10-20T04:30:07.884Z
Learnt from: liunyl
Repo: eloqdata/tx_service PR: 149
File: include/cc/cc_request.h:1876-1927
Timestamp: 2025-10-20T04:30:07.884Z
Learning: ScanNextBatchCc in include/cc/cc_request.h is used only for hash-partition scans; range-partition scans are handled by ScanSliceCc.
Applied to files:
tx_service/src/cc/local_cc_shards.cpp
🔇 Additional comments (4)
tx_service/include/cc/range_slice.h (1)
590-606: Behavioral change: larger target subrange size due to higher load factor
new_range_load_factorgoing from0.1to0.6directly increases the target subrange size inCalculateRangeSplitKeys(fewer, larger subranges perpost_ckpt_size). The math and usage are consistent; this is a pure tuning change. Please just double‑check this higher load factor matches the intended balance between range size, memory usage, and split frequency in your deployment.store_handler/data_store_service_client.cpp (2)
361-387: Good: early cleanup ofhash_partitions_mapto reduce peak memoryClearing
hash_partitions_mapright after preparing allPartitionFlushStatebatches is safe (no further uses) and helps release potentially large index vectors while partition RPCs are in flight.
387-409: Good: symmetric cleanup ofrange_partitions_mapSimilarly, clearing
range_partitions_maponce all range partition batches are built is correct and reduces transient memory usage for large flushes, without affecting control flow or error handling.tx_service/src/cc/local_cc_shards.cpp (1)
5001-5004: Hash-partition mem accounting correctly adds FlushRecord/container overheadThe added terms based on
scan_cc.DataSyncVec(),scan_cc.ArchiveVec(), andscan_cc.MoveBaseIdxVec()expandflush_data_sizeto include the in-memory FlushRecord and mv-base container cost, which aligns with the quota controller’s intent and mirrors the range-partition path (once fixed).No further changes needed here.
| void Clear() | ||
| { | ||
| key_parts.clear(); | ||
| key_parts.shrink_to_fit(); | ||
| record_parts.clear(); | ||
| record_parts.shrink_to_fit(); | ||
| records_ts.clear(); | ||
| records_ts.shrink_to_fit(); | ||
| records_ttl.clear(); | ||
| records_ttl.shrink_to_fit(); | ||
| record_tmp_mem_area.clear(); | ||
| record_tmp_mem_area.shrink_to_fit(); | ||
| op_types.clear(); | ||
| op_types.shrink_to_fit(); | ||
| parts_cnt_per_key = 1; | ||
| parts_cnt_per_record = 1; | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
git ls-files | grep -E "data_store_service_client_closure\.(h|cpp)" | head -5Repository: eloqdata/tx_service
Length of output: 164
🏁 Script executed:
cat -n store_handler/data_store_service_client_closure.h | sed -n '340,380p'Repository: eloqdata/tx_service
Length of output: 1534
🏁 Script executed:
# Search for PartitionBatchRequest usage and reuse patterns
rg -n "PartitionBatchRequest" --type=cpp -A 3 -B 3 | head -100Repository: eloqdata/tx_service
Length of output: 7863
🏁 Script executed:
# Look for Clear() and Reset() calls in context
rg -n "\.Clear\(\)|\.Reset\(" --type=cpp store_handler/data_store_service_client_closure.h -B 2 -A 2Repository: eloqdata/tx_service
Length of output: 9265
Consider the reallocation cost vs. memory benefit trade-off in the hot path.
The shrink_to_fit() calls after clear() will release excess capacity, reducing memory footprint. However, since Reset() (lines 361-374) immediately reserves new capacity before reuse, this will cause unnecessary deallocation/reallocation cycles if PartitionBatchRequest objects are frequently reused—as they are in batch processing pipelines (see inflight_batch reuse patterns and GetNextBatch() method).
Consider profiling to verify this optimization doesn't introduce performance regression in high-throughput scenarios. Alternatively, consider the pattern used elsewhere (e.g., line 1632-1636) which swaps with empty objects instead of shrinking and re-reserving.
🤖 Prompt for AI Agents
In store_handler/data_store_service_client_closure.h around lines 343 to 359,
the Clear() method calls shrink_to_fit() after clear() on multiple vectors which
forces deallocation and will cause unnecessary reallocation when these objects
are immediately Reset() with reserved capacity; remove the shrink_to_fit() calls
(or replace each shrink_to_fit() pattern with swap-with-empty-object idiom used
elsewhere) so Clear() only clears the vectors without freeing capacity, matching
the Reset() reservation behavior and avoiding hot-path reallocations in frequent
reuse scenarios.
|
The calculation of memory usage depends on sizeof, which is incorrect. Better call |
TODO: verify size of PartitionBatchRequest. Maybe we should only free memory if length is more than a threshold
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tx_service/src/cc/local_cc_shards.cpp (1)
4097-4105: Memory quota should consider vector capacity, not justsize() * sizeof(T)Using
scan_cc.DataSyncVec*().size() * sizeof(FlushRecord)(and same for archive/mv-base) underestimates actual heap usage, sincestd::vectortypically holds extra reserved capacity. For the quota controller this can materially under-count memory, especially under growth, even though switching to thescan_ccvectors here is otherwise correct.Consider basing the additional charge on
capacity()(or allocator‑reported usable size) instead ofsize()for both range and hash paths:Suggested change (range & hash paths)
- // The cost of FlushRecord also needs to be considered. - for (size_t i = 0; i < cc_shards_.size(); ++i) - { - flush_data_size += - (scan_cc.DataSyncVec(i).size() * sizeof(FlushRecord) + - scan_cc.ArchiveVec(i).size() * sizeof(FlushRecord) + - scan_cc.MoveBaseIdxVec(i).size() * - sizeof(std::pair<TxKey, int32_t>)); - } + // Also account for the in‑memory containers that will hold this batch. + for (size_t i = 0; i < cc_shards_.size(); ++i) + { + const auto &data_vec = scan_cc.DataSyncVec(i); + const auto &archive_vec = scan_cc.ArchiveVec(i); + const auto &mv_base_vec = scan_cc.MoveBaseIdxVec(i); + flush_data_size += + data_vec.capacity() * sizeof(FlushRecord) + + archive_vec.capacity() * sizeof(FlushRecord) + + mv_base_vec.capacity() * + sizeof(std::pair<TxKey, int32_t>); + }- // The cost of FlushRecord also needs to be considered. - flush_data_size += - (scan_cc.DataSyncVec().size() * sizeof(FlushRecord) + - scan_cc.ArchiveVec().size() * sizeof(FlushRecord) + - scan_cc.MoveBaseIdxVec().size() * - sizeof(std::pair<TxKey, int32_t>)); + // Also account for the in‑memory containers that will hold this batch. + const auto &data_vec = scan_cc.DataSyncVec(); + const auto &archive_vec = scan_cc.ArchiveVec(); + const auto &mv_base_vec = scan_cc.MoveBaseIdxVec(); + flush_data_size += + data_vec.capacity() * sizeof(FlushRecord) + + archive_vec.capacity() * sizeof(FlushRecord) + + mv_base_vec.capacity() * + sizeof(std::pair<TxKey, int32_t>);If you want even tighter accounting, you could go further and use allocator‑specific
*_usable_size(vector.data())instead ofcapacity()*sizeof(T), but the above keeps things portable while avoiding the worst underestimation.Also applies to: 5002-5007
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tx_service/src/cc/local_cc_shards.cpp
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-11-11T07:10:40.346Z
Learnt from: lzxddz
Repo: eloqdata/tx_service PR: 199
File: include/cc/local_cc_shards.h:233-234
Timestamp: 2025-11-11T07:10:40.346Z
Learning: In the LocalCcShards class in include/cc/local_cc_shards.h, the EnqueueCcRequest methods use `shard_code & 0x3FF` followed by `% cc_shards_.size()` to distribute work across processor cores for load balancing. This is intentional and separate from partition ID calculation. The 0x3FF mask creates a consistent distribution range (0-1023) before modulo by actual core count.
Applied to files:
tx_service/src/cc/local_cc_shards.cpp
📚 Learning: 2025-10-09T03:56:58.811Z
Learnt from: thweetkomputer
Repo: eloqdata/tx_service PR: 150
File: include/cc/local_cc_shards.h:626-631
Timestamp: 2025-10-09T03:56:58.811Z
Learning: For the LocalCcShards class in include/cc/local_cc_shards.h: Writer locks (unique_lock) should continue using the original meta_data_mux_ (std::shared_mutex) rather than fast_meta_data_mux_ (FastMetaDataMutex) at this stage. Only reader locks may use the FastMetaDataMutex wrapper.
Applied to files:
tx_service/src/cc/local_cc_shards.cpp
📚 Learning: 2025-12-02T10:43:27.431Z
Learnt from: lokax
Repo: eloqdata/tx_service PR: 254
File: tx_service/src/cc/local_cc_shards.cpp:2949-3188
Timestamp: 2025-12-02T10:43:27.431Z
Learning: In tx_service/src/cc/local_cc_shards.cpp, whenever TryPinNodeGroupData is used, only call Sharder::Instance().UnpinNodeGroupData(node_group) if the recorded term is >= 0 (i.e., pin succeeded). Example: LocalCcShards::PostProcessFlushTaskEntries guards the unpin with `if (term >= 0)`.
Applied to files:
tx_service/src/cc/local_cc_shards.cpp
📚 Learning: 2025-10-20T04:30:07.884Z
Learnt from: liunyl
Repo: eloqdata/tx_service PR: 149
File: include/cc/cc_request.h:1876-1927
Timestamp: 2025-10-20T04:30:07.884Z
Learning: ScanNextBatchCc in include/cc/cc_request.h is used only for hash-partition scans; range-partition scans are handled by ScanSliceCc.
Applied to files:
tx_service/src/cc/local_cc_shards.cpp
📚 Learning: 2025-10-21T06:46:53.700Z
Learnt from: lokax
Repo: eloqdata/tx_service PR: 149
File: src/remote/cc_stream_receiver.cpp:1066-1075
Timestamp: 2025-10-21T06:46:53.700Z
Learning: In src/remote/cc_stream_receiver.cpp, for ScanNextRequest handling, BucketIds() on RemoteScanNextBatch should never be empty—this is an expected invariant of the scan protocol.
Applied to files:
tx_service/src/cc/local_cc_shards.cpp
| size_t flush_record_size = mi_malloc_usable_size(&flush_record); | ||
| #ifdef WITH_JEMALLOC | ||
| flush_record_size = sizeof(FlushRecord); | ||
| #endif |
There was a problem hiding this comment.
Please take a look at the figure I pasted in eloqdata/project_tracker#73.
std::vector data_sync_vecs, archive_vecs and mv_base_vecs might allocs memory more than calculated value.
I suggest using mi_malloc_usable_size to get the memory usage of those vectors, not the variable flush_record.
address comment in previous pr #323
Here are some reminders before you submit the pull request
fixes eloqdb/tx_service#issue_id./mtr --suite=mono_main,mono_multi,mono_basicSummary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.