Index mem#323

Merged
liunyl merged 4 commits into main from index_mem on Dec 26, 2025

@liunyl

@liunyl liunyl commented Dec 25, 2025

Contributor

Here are some reminders before you submit the pull request:

  • Add tests for the change
  • Document changes
  • Reference the related issue using fixes eloqdb/tx_service#issue_id
  • Reference the RFC link, if one exists
  • Pass ./mtr --suite=mono_main,mono_multi,mono_basic

Summary by CodeRabbit

  • Chores
    • Improved memory cleanup in batch operations and reduced retained capacity after clears.
    • Tightened memory/quota calculations for data synchronization by including additional in-memory footprints.
    • Broadened lock-handling during transaction recovery to treat more non-writable locks uniformly.
    • Increased the default load-factor used for range partitioning decisions.


@CLAassistant


CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers has signed the CLA.

✅ liunyl
❌ Ubuntu


Ubuntu does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.

@coderabbitai

coderabbitai Bot commented Dec 25, 2025


Walkthrough

Adds vector capacity shrinking and explicit partition-map clearing, increases a range load-factor constant, broadens non-write orphan-lock detection, and includes per-core in-memory vector sizes in data-sync flush memory accounting for both range and hash partitions.

Changes

  • Memory cleanup in store handler (store_handler/data_store_service_client.cpp): DataStoreServiceClient::PutAll now calls clear() on hash_partitions_map and range_partitions_map after their processing loops to release retained memory.
  • Partition batch capacity shrink (store_handler/data_store_service_client_closure.h): PartitionBatchRequest::Clear() now calls shrink_to_fit() on key_parts, record_parts, records_ts, records_ttl, record_tmp_mem_area, and op_types after clearing to reduce retained capacity.
  • Range tuning constant (tx_service/include/cc/range_slice.h): StoreRange::new_range_load_factor changed from 0.1 to 0.6.
  • Orphan-lock detection (tx_service/src/cc/cc_shard.cpp): In CheckRecoverTx, the condition is broadened: any non-WriteLock now triggers the orphan-lock skip/recovery path (previously only ReadLock).
  • Data-sync memory accounting (tx_service/src/cc/local_cc_shards.cpp): DataSyncForRangePartition and DataSyncForHashPartition now include per-core sizes of data_sync_vec, archive_vec, and mv_base_vec in flush_data_size when evaluating/allocating flush memory quota.
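A minimal C++ sketch of the per-core accounting described in the last item, assuming simplified stand-in types: FlushRecord, TxKey, PerCoreScanOutput and AddContainerFootprint are hypothetical and only model the data_sync_vec / archive_vec / mv_base_vec containers, not the real tx_service definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the real tx_service types (layout is illustrative).
struct FlushRecord {
    std::string key;
    std::uint64_t commit_ts;
};
using TxKey = std::string;

// Hypothetical per-core scan output, modeled on the DataSyncVec(i) /
// ArchiveVec(i) / MoveBaseIdxVec(i) accessors mentioned in the review.
struct PerCoreScanOutput {
    std::vector<FlushRecord> data_sync_vec;
    std::vector<FlushRecord> archive_vec;
    std::vector<std::pair<TxKey, std::int32_t>> mv_base_vec;
};

// Adds the element footprint of every per-core container to flush_data_size
// using size() * sizeof(T), i.e. only the elements currently stored.
std::size_t AddContainerFootprint(std::size_t flush_data_size,
                                  const std::vector<PerCoreScanOutput> &cores) {
    for (const PerCoreScanOutput &core : cores) {
        flush_data_size +=
            core.data_sync_vec.size() * sizeof(FlushRecord) +
            core.archive_vec.size() * sizeof(FlushRecord) +
            core.mv_base_vec.size() * sizeof(std::pair<TxKey, std::int32_t>);
    }
    return flush_data_size;
}
```

Because this charge uses size() * sizeof(T), it counts only the stored elements; unused vector capacity and heap memory owned by the elements themselves are not included.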

Sequence Diagram(s)

(Skipped — changes are local bookkeeping and conditional logic adjustments that do not introduce a new multi-component sequential flow.)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

🐰 I hopped through code with gentle paws,
Cleared the maps beneath the gauze,
Shrunk the bins that bloomed too wide,
Load-factor rose, locks learned to bide,
Counting bytes so flushes glide.

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings, 1 inconclusive)
  • Description check (⚠️ Warning): The PR description is incomplete and only contains the unchecked contributor checklist template without any substantive explanation of the changes, objectives, or technical details. Resolution: Add a detailed description explaining the changes made, including the rationale for memory optimization changes, the load factor update, and how the lock type handling was modified. Reference the related issue and RFC if applicable.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 36.36%, which is insufficient; the required threshold is 80.00%. Resolution: You can run @coderabbitai generate docstrings to improve docstring coverage.
  • Title check (❓ Inconclusive): The title 'Index mem' is vague and does not clearly convey the specific changes made; it lacks descriptive detail about memory optimization or the affected components. Resolution: Consider using a more descriptive title that reflects the main changes, such as 'Optimize memory allocation and improve partition load factor handling' or 'Refactor memory management for hash and range partitions.'

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6edf9f7 and b6b35d7.

📒 Files selected for processing (1)
  • tx_service/src/cc/local_cc_shards.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
  • tx_service/src/cc/local_cc_shards.cpp


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tx_service/src/cc/cc_shard.cpp (1)

1011-1020: Update the log message to reflect the broadened condition.

The condition now triggers for any lock type that is not WriteLock (line 1011), but the log message on line 1018 specifically says "read lock detected". This is misleading if other non-write lock types exist and trigger this path.

🔎 Suggested fix
-                    DLOG(INFO) << "read lock detected in tx "
-                               << lock_holding_txn << ", skip recovery";
+                    DLOG(INFO) << "non-write lock detected in tx "
+                               << lock_holding_txn << " (lock type: " 
+                               << static_cast<int>(lk_type) << "), skip recovery";

Additionally, consider adding a comment explaining why the broadened check is appropriate:

// Check if this is a non-write lock. According to the logic above (lines 1013-1017),
// only write locks should appear as orphan locks on local nodes. Any other lock type
// indicates this is not an orphan lock scenario, so skip recovery.
if (lk_type != LockType::WriteLock)
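To illustrate the behavioral difference, here is a sketch comparing the old and new conditions. The LockType enumerators below are assumptions for illustration (the real enum in tx_service may differ), and SkipRecoveryOld / SkipRecoveryNew are hypothetical helpers.

```cpp
#include <cassert>

// Hypothetical lock-type enum; the real LockType may have other enumerators.
enum class LockType { NoLock, ReadIntent, ReadLock, WriteIntent, WriteLock };

// Previous condition: only a read lock made CheckRecoverTx skip recovery.
bool SkipRecoveryOld(LockType lk_type) {
    return lk_type == LockType::ReadLock;
}

// Broadened condition: only write locks can be orphan locks on the local
// node, so any other lock type skips recovery.
bool SkipRecoveryNew(LockType lk_type) {
    return lk_type != LockType::WriteLock;
}
```

Under this model, lock types other than WriteLock and ReadLock (the hypothetical ReadIntent and WriteIntent here) now take the skip path, which is why a log message that says "read lock detected" becomes misleading.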
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4943d80 and dd1e7fd.

📒 Files selected for processing (5)
  • store_handler/data_store_service_client.cpp
  • store_handler/data_store_service_client_closure.h
  • tx_service/include/cc/range_slice.h
  • tx_service/src/cc/cc_shard.cpp
  • tx_service/src/cc/local_cc_shards.cpp
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-11-11T07:10:40.346Z
Learnt from: lzxddz
Repo: eloqdata/tx_service PR: 199
File: include/cc/local_cc_shards.h:233-234
Timestamp: 2025-11-11T07:10:40.346Z
Learning: In the LocalCcShards class in include/cc/local_cc_shards.h, the EnqueueCcRequest methods use `shard_code & 0x3FF` followed by `% cc_shards_.size()` to distribute work across processor cores for load balancing. This is intentional and separate from partition ID calculation. The 0x3FF mask creates a consistent distribution range (0-1023) before modulo by actual core count.

Applied to files:

  • tx_service/src/cc/local_cc_shards.cpp
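A sketch of the distribution rule in this learning; CoreIndexForShardCode is a hypothetical free function standing in for the logic inside EnqueueCcRequest.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Maps a shard code to a processor core: first mask into the fixed range
// 0..1023, then reduce modulo the actual number of cores.
std::size_t CoreIndexForShardCode(std::uint32_t shard_code, std::size_t core_cnt) {
    return (shard_code & 0x3FF) % core_cnt;
}
```

For example, with 4 cores, shard codes 1 and 1025 both map to core 1, since 1025 & 0x3FF == 1.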
📚 Learning: 2025-10-09T03:56:58.811Z
Learnt from: thweetkomputer
Repo: eloqdata/tx_service PR: 150
File: include/cc/local_cc_shards.h:626-631
Timestamp: 2025-10-09T03:56:58.811Z
Learning: For the LocalCcShards class in include/cc/local_cc_shards.h: Writer locks (unique_lock) should continue using the original meta_data_mux_ (std::shared_mutex) rather than fast_meta_data_mux_ (FastMetaDataMutex) at this stage. Only reader locks may use the FastMetaDataMutex wrapper.

Applied to files:

  • tx_service/src/cc/local_cc_shards.cpp
  • tx_service/src/cc/cc_shard.cpp
📚 Learning: 2025-12-02T10:43:27.431Z
Learnt from: lokax
Repo: eloqdata/tx_service PR: 254
File: tx_service/src/cc/local_cc_shards.cpp:2949-3188
Timestamp: 2025-12-02T10:43:27.431Z
Learning: In tx_service/src/cc/local_cc_shards.cpp, whenever TryPinNodeGroupData is used, only call Sharder::Instance().UnpinNodeGroupData(node_group) if the recorded term is >= 0 (i.e., pin succeeded). Example: LocalCcShards::PostProcessFlushTaskEntries guards the unpin with `if (term >= 0)`.

Applied to files:

  • tx_service/src/cc/local_cc_shards.cpp
  • tx_service/src/cc/cc_shard.cpp
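The pin/unpin guard from this learning can be sketched as follows. FakeSharder and ProcessWithPin are hypothetical stand-ins for Sharder::Instance() and its callers; only the contract (a negative term means the pin failed, and each successful pin needs exactly one unpin) is taken from the learning.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the Sharder pin API.
struct FakeSharder {
    std::int64_t leader_term = 5;
    bool pin_allowed = true;
    int pin_count = 0;

    // Returns the term (>= 0) on success, or a negative value on failure.
    std::int64_t TryPinNodeGroupData(std::uint32_t /*node_group*/) {
        if (!pin_allowed) return -1;
        ++pin_count;
        return leader_term;
    }
    void UnpinNodeGroupData(std::uint32_t /*node_group*/) { --pin_count; }
};

// Unpins only when the pin actually succeeded, as in
// PostProcessFlushTaskEntries' `if (term >= 0)` guard.
void ProcessWithPin(FakeSharder &sharder, std::uint32_t node_group) {
    std::int64_t term = sharder.TryPinNodeGroupData(node_group);
    // ... work that may proceed with or without the pin ...
    if (term >= 0) {
        sharder.UnpinNodeGroupData(node_group);
    }
}
```

Without the guard, a failed pin would still trigger an unpin and drive the pin count negative.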
📚 Learning: 2025-10-20T04:30:07.884Z
Learnt from: liunyl
Repo: eloqdata/tx_service PR: 149
File: include/cc/cc_request.h:1876-1927
Timestamp: 2025-10-20T04:30:07.884Z
Learning: ScanNextBatchCc in include/cc/cc_request.h is used only for hash-partition scans; range-partition scans are handled by ScanSliceCc.

Applied to files:

  • tx_service/src/cc/local_cc_shards.cpp
🔇 Additional comments (4)
tx_service/include/cc/range_slice.h (1)

590-606: Behavioral change: larger target subrange size due to higher load factor

new_range_load_factor going from 0.1 to 0.6 directly increases the target subrange size in CalculateRangeSplitKeys (fewer, larger subranges per post_ckpt_size). The math and usage are consistent; this is a pure tuning change. Please just double‑check this higher load factor matches the intended balance between range size, memory usage, and split frequency in your deployment.
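A rough model of why a higher load factor yields fewer, larger subranges. This assumes, purely for illustration, that each subrange targets range_max_size * load_factor bytes; the actual formula in CalculateRangeSplitKeys is not shown in this review and may differ. SubrangeCount is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: each new subrange targets range_max_size * load_factor
// bytes, and the subrange count is ceil(post_ckpt_size / target).
std::uint64_t SubrangeCount(std::uint64_t post_ckpt_size,
                            std::uint64_t range_max_size, double load_factor) {
    std::uint64_t target =
        static_cast<std::uint64_t>(range_max_size * load_factor);
    if (target == 0) target = 1;  // avoid division by zero for tiny inputs
    return (post_ckpt_size + target - 1) / target;  // ceiling division
}
```

Under this model, 200 MiB of post-checkpoint data with a 64 MiB maximum range size splits into 32 subranges at a load factor of 0.1 but only 6 at 0.6.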

store_handler/data_store_service_client.cpp (2)

361-387: Good: early cleanup of hash_partitions_map to reduce peak memory

Clearing hash_partitions_map right after preparing all PartitionFlushState batches is safe (no further uses) and helps release potentially large index vectors while partition RPCs are in flight.


387-409: Good: symmetric cleanup of range_partitions_map

Similarly, clearing range_partitions_map once all range partition batches are built is correct and reduces transient memory usage for large flushes, without affecting control flow or error handling.
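A sketch of the clear-after-build pattern, assuming a hypothetical map from partition ID to record-index vectors and a hypothetical PartitionBatch type; the real PutAll builds PartitionFlushState batches and issues RPCs instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical batch descriptor; in the PR each batch becomes an in-flight RPC.
struct PartitionBatch {
    std::uint32_t partition_id;
    std::size_t record_cnt;
};

// Builds one batch per partition, then clears the map. clear() destroys every
// element, freeing each index vector's buffer; the bucket array may be kept.
std::vector<PartitionBatch> BuildBatchesAndRelease(
    std::unordered_map<std::uint32_t, std::vector<std::size_t>> &partitions_map) {
    std::vector<PartitionBatch> batches;
    batches.reserve(partitions_map.size());
    for (const auto &[partition_id, record_idxs] : partitions_map) {
        batches.push_back({partition_id, record_idxs.size()});
    }
    // partitions_map is not read after this point, so clearing it is safe.
    partitions_map.clear();
    return batches;
}
```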

tx_service/src/cc/local_cc_shards.cpp (1)

5001-5004: Hash-partition mem accounting correctly adds FlushRecord/container overhead

The added terms based on scan_cc.DataSyncVec(), scan_cc.ArchiveVec(), and scan_cc.MoveBaseIdxVec() expand flush_data_size to include the in-memory FlushRecord and mv-base container cost, which aligns with the quota controller’s intent and mirrors the range-partition path (once fixed).

No further changes needed here.

Comment on lines 343 to 359
void Clear()
{
    key_parts.clear();
    key_parts.shrink_to_fit();
    record_parts.clear();
    record_parts.shrink_to_fit();
    records_ts.clear();
    records_ts.shrink_to_fit();
    records_ttl.clear();
    records_ttl.shrink_to_fit();
    record_tmp_mem_area.clear();
    record_tmp_mem_area.shrink_to_fit();
    op_types.clear();
    op_types.shrink_to_fit();
    parts_cnt_per_key = 1;
    parts_cnt_per_record = 1;
}


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

git ls-files | grep -E "data_store_service_client_closure\.(h|cpp)" | head -5

Repository: eloqdata/tx_service

Length of output: 164


🏁 Script executed:

cat -n store_handler/data_store_service_client_closure.h | sed -n '340,380p'

Repository: eloqdata/tx_service

Length of output: 1534


🏁 Script executed:

# Search for PartitionBatchRequest usage and reuse patterns
rg -n "PartitionBatchRequest" --type=cpp -A 3 -B 3 | head -100

Repository: eloqdata/tx_service

Length of output: 7863


🏁 Script executed:

# Look for Clear() and Reset() calls in context
rg -n "\.Clear\(\)|\.Reset\(" --type=cpp store_handler/data_store_service_client_closure.h -B 2 -A 2

Repository: eloqdata/tx_service

Length of output: 9265


Consider the reallocation cost vs. memory benefit trade-off in the hot path.

The shrink_to_fit() calls after clear() will release excess capacity, reducing memory footprint. However, since Reset() (lines 361-374) immediately reserves new capacity before reuse, this will cause unnecessary deallocation/reallocation cycles if PartitionBatchRequest objects are frequently reused—as they are in batch processing pipelines (see inflight_batch reuse patterns and GetNextBatch() method).

Consider profiling to verify this optimization doesn't introduce performance regression in high-throughput scenarios. Alternatively, consider the pattern used elsewhere (e.g., lines 1632-1636), which swaps with empty objects instead of shrinking and re-reserving.
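To make the trade-off concrete, the sketch below counts heap allocations across repeated Clear()/Reset()-style cycles using a hypothetical counting allocator. It releases the buffer with the swap-with-empty idiom, since shrink_to_fit() is only a non-binding request; the element type and cycle counts are illustrative, not taken from PartitionBatchRequest.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical allocator that counts how often the vector allocates.
template <typename T>
struct CountingAllocator {
    using value_type = T;
    static inline std::size_t allocations = 0;
    CountingAllocator() = default;
    template <typename U>
    CountingAllocator(const CountingAllocator<U> &) {}
    T *allocate(std::size_t n) {
        ++allocations;
        return static_cast<T *>(::operator new(n * sizeof(T)));
    }
    void deallocate(T *p, std::size_t) { ::operator delete(p); }
};
template <typename T, typename U>
bool operator==(const CountingAllocator<T> &, const CountingAllocator<U> &) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T> &, const CountingAllocator<U> &) { return false; }

using CountedVec = std::vector<int, CountingAllocator<int>>;

// Runs `cycles` rounds of reserve (Reset), fill, clear (Clear) and optionally
// releases the buffer; returns the number of heap allocations performed.
std::size_t AllocationsForReuse(std::size_t cycles, std::size_t reserve_cnt,
                                bool release_capacity) {
    CountingAllocator<int>::allocations = 0;
    CountedVec v;
    for (std::size_t i = 0; i < cycles; ++i) {
        v.reserve(reserve_cnt);    // Reset(): reserve before reuse
        v.assign(reserve_cnt, 1);  // fill the batch within reserved capacity
        v.clear();                 // Clear(): drop elements, keep capacity
        if (release_capacity) {
            CountedVec().swap(v);  // drop the buffer entirely
        }
    }
    return CountingAllocator<int>::allocations;
}
```

With capacity retained, ten cycles perform a single allocation; releasing the buffer each time costs one allocation per cycle.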

🤖 Prompt for AI Agents
In store_handler/data_store_service_client_closure.h around lines 343 to 359,
the Clear() method calls shrink_to_fit() after clear() on multiple vectors which
forces deallocation and will cause unnecessary reallocation when these objects
are immediately Reset() with reserved capacity; remove the shrink_to_fit() calls
(or replace each shrink_to_fit() pattern with swap-with-empty-object idiom used
elsewhere) so Clear() only clears the vectors without freeing capacity, matching
the Reset() reservation behavior and avoiding hot-path reallocations in frequent
reuse scenarios.

Comment thread tx_service/src/cc/local_cc_shards.cpp
@xiexiaoy

Collaborator

Computing memory usage with sizeof is incorrect. It is better to call size_t malloc_usable_size(void *ptr);.

TODO: verify the size of PartitionBatchRequest. Maybe we should only
free memory if the length exceeds a threshold.
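One way to act on this TODO is to release a buffer only when its retained capacity exceeds a threshold. This is a sketch; ClearWithThreshold and the threshold value are hypothetical, and the threshold is counted in elements rather than bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Clears the vector and releases its buffer only if the retained capacity
// exceeds max_retained_cnt elements. Small buffers stay allocated for reuse;
// unusually large ones (e.g. left behind by a huge batch) are freed.
template <typename T>
void ClearWithThreshold(std::vector<T> &v, std::size_t max_retained_cnt) {
    v.clear();
    if (v.capacity() > max_retained_cnt) {
        std::vector<T>().swap(v);  // reliably drops the buffer
    }
}
```

This keeps the common case allocation-free while bounding how much idle capacity a reused request can hold.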

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
tx_service/src/cc/local_cc_shards.cpp (1)

4097-4105: Memory quota should consider vector capacity, not just size() * sizeof(T)

Using scan_cc.DataSyncVec*().size() * sizeof(FlushRecord) (and same for archive/mv-base) underestimates actual heap usage, since std::vector typically holds extra reserved capacity. For the quota controller this can materially under-count memory, especially under growth, even though switching to the scan_cc vectors here is otherwise correct.

Consider basing the additional charge on capacity() (or allocator‑reported usable size) instead of size() for both range and hash paths:

Suggested change (range & hash paths)
-            // The cost of FlushRecord also needs to be considered.
-            for (size_t i = 0; i < cc_shards_.size(); ++i)
-            {
-                flush_data_size +=
-                    (scan_cc.DataSyncVec(i).size() * sizeof(FlushRecord) +
-                     scan_cc.ArchiveVec(i).size() * sizeof(FlushRecord) +
-                     scan_cc.MoveBaseIdxVec(i).size() *
-                         sizeof(std::pair<TxKey, int32_t>));
-            }
+            // Also account for the in‑memory containers that will hold this batch.
+            for (size_t i = 0; i < cc_shards_.size(); ++i)
+            {
+                const auto &data_vec = scan_cc.DataSyncVec(i);
+                const auto &archive_vec = scan_cc.ArchiveVec(i);
+                const auto &mv_base_vec = scan_cc.MoveBaseIdxVec(i);
+                flush_data_size +=
+                    data_vec.capacity() * sizeof(FlushRecord) +
+                    archive_vec.capacity() * sizeof(FlushRecord) +
+                    mv_base_vec.capacity() *
+                        sizeof(std::pair<TxKey, int32_t>);
+            }
-        // The cost of FlushRecord also needs to be considered.
-        flush_data_size +=
-            (scan_cc.DataSyncVec().size() * sizeof(FlushRecord) +
-             scan_cc.ArchiveVec().size() * sizeof(FlushRecord) +
-             scan_cc.MoveBaseIdxVec().size() *
-                 sizeof(std::pair<TxKey, int32_t>));
+        // Also account for the in‑memory containers that will hold this batch.
+        const auto &data_vec = scan_cc.DataSyncVec();
+        const auto &archive_vec = scan_cc.ArchiveVec();
+        const auto &mv_base_vec = scan_cc.MoveBaseIdxVec();
+        flush_data_size +=
+            data_vec.capacity() * sizeof(FlushRecord) +
+            archive_vec.capacity() * sizeof(FlushRecord) +
+            mv_base_vec.capacity() *
+                sizeof(std::pair<TxKey, int32_t>);

If you want even tighter accounting, you could go further and use allocator‑specific *_usable_size(vector.data()) instead of capacity()*sizeof(T), but the above keeps things portable while avoiding the worst underestimation.

Also applies to: 5002-5007
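A small sketch of the difference between the two charges; ChargedBySize and ChargedByCapacity are hypothetical helpers, not tx_service functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Bytes charged for the elements currently stored.
template <typename T>
std::size_t ChargedBySize(const std::vector<T> &v) {
    return v.size() * sizeof(T);
}

// Bytes reserved by the vector's buffer, including unused capacity.
template <typename T>
std::size_t ChargedByCapacity(const std::vector<T> &v) {
    return v.capacity() * sizeof(T);
}
```

After reserve(1000) and a single push_back, the size-based charge is one element while the capacity-based charge covers at least 1000 elements, which is the underestimation the comment describes.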

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dd1e7fd and 6edf9f7.

📒 Files selected for processing (1)
  • tx_service/src/cc/local_cc_shards.cpp
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-11-11T07:10:40.346Z
Learnt from: lzxddz
Repo: eloqdata/tx_service PR: 199
File: include/cc/local_cc_shards.h:233-234
Timestamp: 2025-11-11T07:10:40.346Z
Learning: In the LocalCcShards class in include/cc/local_cc_shards.h, the EnqueueCcRequest methods use `shard_code & 0x3FF` followed by `% cc_shards_.size()` to distribute work across processor cores for load balancing. This is intentional and separate from partition ID calculation. The 0x3FF mask creates a consistent distribution range (0-1023) before modulo by actual core count.

Applied to files:

  • tx_service/src/cc/local_cc_shards.cpp
📚 Learning: 2025-10-09T03:56:58.811Z
Learnt from: thweetkomputer
Repo: eloqdata/tx_service PR: 150
File: include/cc/local_cc_shards.h:626-631
Timestamp: 2025-10-09T03:56:58.811Z
Learning: For the LocalCcShards class in include/cc/local_cc_shards.h: Writer locks (unique_lock) should continue using the original meta_data_mux_ (std::shared_mutex) rather than fast_meta_data_mux_ (FastMetaDataMutex) at this stage. Only reader locks may use the FastMetaDataMutex wrapper.

Applied to files:

  • tx_service/src/cc/local_cc_shards.cpp
📚 Learning: 2025-12-02T10:43:27.431Z
Learnt from: lokax
Repo: eloqdata/tx_service PR: 254
File: tx_service/src/cc/local_cc_shards.cpp:2949-3188
Timestamp: 2025-12-02T10:43:27.431Z
Learning: In tx_service/src/cc/local_cc_shards.cpp, whenever TryPinNodeGroupData is used, only call Sharder::Instance().UnpinNodeGroupData(node_group) if the recorded term is >= 0 (i.e., pin succeeded). Example: LocalCcShards::PostProcessFlushTaskEntries guards the unpin with `if (term >= 0)`.

Applied to files:

  • tx_service/src/cc/local_cc_shards.cpp
📚 Learning: 2025-10-20T04:30:07.884Z
Learnt from: liunyl
Repo: eloqdata/tx_service PR: 149
File: include/cc/cc_request.h:1876-1927
Timestamp: 2025-10-20T04:30:07.884Z
Learning: ScanNextBatchCc in include/cc/cc_request.h is used only for hash-partition scans; range-partition scans are handled by ScanSliceCc.

Applied to files:

  • tx_service/src/cc/local_cc_shards.cpp
📚 Learning: 2025-10-21T06:46:53.700Z
Learnt from: lokax
Repo: eloqdata/tx_service PR: 149
File: src/remote/cc_stream_receiver.cpp:1066-1075
Timestamp: 2025-10-21T06:46:53.700Z
Learning: In src/remote/cc_stream_receiver.cpp, for ScanNextRequest handling, BucketIds() on RemoteScanNextBatch should never be empty—this is an expected invariant of the scan protocol.

Applied to files:

  • tx_service/src/cc/local_cc_shards.cpp

size_t flush_record_size = mi_malloc_usable_size(&flush_record);
#ifdef WITH_JEMALLOC
flush_record_size = sizeof(FlushRecord);
#endif

Collaborator

Please take a look at the figure I pasted in eloqdata/project_tracker#73.

The std::vector containers data_sync_vecs, archive_vecs and mv_base_vecs may allocate more memory than the calculated value.

I suggest using mi_malloc_usable_size to get the memory usage of those vectors' buffers, rather than of the variable flush_record.
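A sketch of measuring a vector's real heap footprint, as suggested above. It uses glibc's malloc_usable_size (from <malloc.h>, Linux-specific) as a stand-in for mimalloc's mi_malloc_usable_size, and a hypothetical MallocAllocator so that the vector's buffer is guaranteed to come from malloc; with the default std::allocator that is only an implementation detail. It also shows why sizeof misses the buffer: sizeof(v) covers only the vector object itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <malloc.h>  // glibc/Linux: malloc_usable_size
#include <new>
#include <vector>

// Hypothetical allocator backed by malloc/free, so malloc_usable_size() is
// valid on the vector's buffer.
template <typename T>
struct MallocAllocator {
    using value_type = T;
    MallocAllocator() = default;
    template <typename U>
    MallocAllocator(const MallocAllocator<U> &) {}
    T *allocate(std::size_t n) {
        void *p = std::malloc(n * sizeof(T));
        if (p == nullptr) throw std::bad_alloc();
        return static_cast<T *>(p);
    }
    void deallocate(T *p, std::size_t) { std::free(p); }
};
template <typename T, typename U>
bool operator==(const MallocAllocator<T> &, const MallocAllocator<U> &) { return true; }
template <typename T, typename U>
bool operator!=(const MallocAllocator<T> &, const MallocAllocator<U> &) { return false; }

template <typename T>
using MallocVec = std::vector<T, MallocAllocator<T>>;

// Heap bytes actually held by the vector's buffer as reported by the
// allocator (may exceed capacity() * sizeof(T)); 0 if nothing is allocated.
template <typename T>
std::size_t HeapBytes(const MallocVec<T> &v) {
    if (v.capacity() == 0) return 0;
    return malloc_usable_size(const_cast<T *>(v.data()));
}
```

Summing HeapBytes over the per-core vectors would then charge what the allocator really handed out, rather than a sizeof-based estimate.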

@liunyl liunyl merged commit c5f9174 into main Dec 26, 2025
2 of 4 checks passed
@liunyl liunyl deleted the index_mem branch December 26, 2025 07:23
@liunyl liunyl restored the index_mem branch December 26, 2025 07:23
liunyl added a commit that referenced this pull request Dec 26, 2025
address comment in previous pr #323
This was referenced Jan 18, 2026
@coderabbitai coderabbitai Bot mentioned this pull request Feb 2, 2026
5 tasks

Labels

None yet

Projects

None yet

3 participants