Skip to content

Make hash partition checkpoint workers use a single heap#185

Merged
thweetkomputer merged 1 commit into
mainfrom
ckpt_shared_heap_zc
Nov 5, 2025
Merged

Make hash partition checkpoint workers use a single heap#185
thweetkomputer merged 1 commit into
mainfrom
ckpt_shared_heap_zc

Conversation

@thweetkomputer

@thweetkomputer thweetkomputer commented Nov 3, 2025

Copy link
Copy Markdown
Collaborator

Here are some reminders before you submit the pull request

  • Add tests for the change
  • Document changes
  • Reference the link of issue using fixes eloqdb/tx_service#issue_id
  • Reference the link of RFC if exists
  • Pass ./mtr --suite=mono_main,mono_multi,mono_basic

Summary by CodeRabbit

Release Notes

  • New Features

    • Added heap usage monitoring and reporting for checkpoint operations. A new configuration flag enables detailed memory metrics during checkpoint processing to track resource utilization.
  • Chores

    • Minor code formatting improvements.

@coderabbitai

coderabbitai Bot commented Nov 3, 2025

Copy link
Copy Markdown

Walkthrough

The changes add checkpoint heap management functionality to LocalCcShards with initialization during construction, a reporting method, synchronization primitives, and conditional heap usage reporting during data synchronization operations controlled by a runtime flag.

Changes

Cohort / File(s) Summary
LocalCcShards heap infrastructure
include/cc/local_cc_shards.h
Adds InitializeHashPartitionCkptHeap() and ReportHashPartitionCkptHeapUsage() methods; introduces three new data members: hash_partition_ckpt_heap_ (heap pointer), hash_partition_main_thread_id_ (thread ID), and hash_partition_ckpt_heap_mux_ (shared mutex)
LocalCcShards implementation
src/cc/local_cc_shards.cpp
Declares global report_ckpt flag; implements ReportHashPartitionCkptHeapUsage() method; calls initialization in constructor; conditionally invokes reporting in data synchronization paths with thread/heap context management
Formatting adjustment
include/store/data_store_handler.h
Adds space before braces in ScheduleTimerTasks() method definition

Sequence Diagram

sequenceDiagram
    participant Init as LocalCcShards<br/>Constructor
    participant Heap as Heap<br/>Management
    participant Sync as DataSyncForHashPartition
    participant Report as ReportHashPartitionCkptHeapUsage
    
    Init->>Heap: InitializeHashPartitionCkptHeap()
    Heap->>Heap: Create hash_partition_ckpt_heap
    Heap->>Heap: Capture thread_id
    
    rect rgb(240, 248, 255)
    Note over Sync,Report: During data synchronization
    Sync->>Sync: Perform sync operations
    alt FLAGS_report_ckpt == true
        Sync->>Report: ReportHashPartitionCkptHeapUsage()
        Report->>Report: Set thread/heap context
        Report->>Report: Emit heap usage metrics
        Report->>Report: Restore context
    end
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–30 minutes

  • Thread safety validation: Verify hash_partition_ckpt_heap_mux_ is correctly acquired/released in all access paths
  • Heap lifecycle management: Confirm proper initialization in constructor and cleanup (if applicable) for hash_partition_ckpt_heap_
  • Context switching: Validate thread/heap context switching logic in ReportHashPartitionCkptHeapUsage() maintains correctness
  • Flag integration: Ensure FLAGS_report_ckpt conditional logic doesn't introduce unintended performance or behavioral changes

Poem

🐰 A heap springs forth, neat and clean,
With checkpoint tales to be seen,
A thread keeps guard, a mutex stands tall,
Reporting usage through it all! ✨📊

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description check ⚠️ Warning The pull request description contains only the template checklist with no substantive content; all required items remain unchecked. Fill in the description with actual PR context: explain the changes, check completed items, add issue/RFC references if applicable, and document the testing status.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The pull request title accurately summarizes the main objective: introducing hash partition checkpoint workers to use a single heap.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch ckpt_shared_heap_zc

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@thweetkomputer thweetkomputer changed the title hash partition checkpoint uses the single heap Make hash partition checkpoint workers use a single heap Nov 4, 2025
@thweetkomputer thweetkomputer force-pushed the ckpt_shared_heap_zc branch 2 times, most recently from f6e0e01 to 1144b58 Compare November 4, 2025 06:46
@thweetkomputer thweetkomputer marked this pull request as ready for review November 5, 2025 04:04
@liunyl

liunyl commented Nov 5, 2025

Copy link
Copy Markdown
Contributor

@CodeRabbit review

@coderabbitai

coderabbitai Bot commented Nov 5, 2025

Copy link
Copy Markdown
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@thweetkomputer thweetkomputer merged commit 876d8b9 into main Nov 5, 2025
4 checks passed
@thweetkomputer thweetkomputer deleted the ckpt_shared_heap_zc branch November 5, 2025 08:24
@coderabbitai

coderabbitai Bot commented Nov 5, 2025

Copy link
Copy Markdown
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/cc/local_cc_shards.cpp (1)

4660-4673: Restore order and override handling are wrong; can corrupt default heaps.

Inside DataSyncForHashPartition you:

  • Restore thread id before restoring the default heap (should be reversed).
  • Don’t guard for “no prior override” (need mi_is_override_thread/mi_restore_default_thread_id pattern used elsewhere in this file).
  • Optional: reserve archive_vec/mv_base_vec under the shared heap too.

Apply:

-std::unique_lock<std::shared_mutex> heap_lk(
-    hash_partition_ckpt_heap_mux_);
-mi_threadid_t prev_thd =
-    mi_override_thread(hash_partition_main_thread_id_);
-mi_heap_t *prev_heap =
-    mi_heap_set_default(hash_partition_ckpt_heap_);
+std::unique_lock<std::shared_mutex> heap_lk(hash_partition_ckpt_heap_mux_);
+bool is_override_thd = mi_is_override_thread();
+mi_threadid_t prev_thd = mi_override_thread(hash_partition_main_thread_id_);
+mi_heap_t* prev_heap = mi_heap_set_default(hash_partition_ckpt_heap_);
 
 data_sync_vec->reserve(scan_cc.accumulated_scan_cnt_);
+archive_vec->reserve(scan_cc.ArchiveVec().size());
+mv_base_vec->reserve(scan_cc.MoveBaseIdxVec().size());
@@
-mi_override_thread(prev_thd);
-mi_heap_set_default(prev_heap);
-heap_lk.unlock();
+mi_heap_set_default(prev_heap);
+if (is_override_thd) {
+  mi_override_thread(prev_thd);
+} else {
+  mi_restore_default_thread_id();
+}
+heap_lk.unlock();

Also applies to: 4721-4723

🧹 Nitpick comments (1)
src/cc/local_cc_shards.cpp (1)

2896-2900: Reporting placement is fine; consider throttling.

If FLAGS_report_ckpt is used for periodic diagnostics, add simple rate limiting (e.g., once per second) to avoid log floods under high scan concurrency.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5cd5be3 and 1a38d6c.

📒 Files selected for processing (3)
  • include/cc/local_cc_shards.h (3 hunks)
  • include/store/data_store_handler.h (1 hunks)
  • src/cc/local_cc_shards.cpp (6 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: thweetkomputer
Repo: eloqdata/tx_service PR: 150
File: include/cc/local_cc_shards.h:626-631
Timestamp: 2025-10-09T03:56:58.811Z
Learning: For the LocalCcShards class in include/cc/local_cc_shards.h: Writer locks (unique_lock) should continue using the original meta_data_mux_ (std::shared_mutex) rather than fast_meta_data_mux_ (FastMetaDataMutex) at this stage. Only reader locks may use the FastMetaDataMutex wrapper.
📚 Learning: 2025-10-09T03:56:58.811Z
Learnt from: thweetkomputer
Repo: eloqdata/tx_service PR: 150
File: include/cc/local_cc_shards.h:626-631
Timestamp: 2025-10-09T03:56:58.811Z
Learning: For the LocalCcShards class in include/cc/local_cc_shards.h: Writer locks (unique_lock) should continue using the original meta_data_mux_ (std::shared_mutex) rather than fast_meta_data_mux_ (FastMetaDataMutex) at this stage. Only reader locks may use the FastMetaDataMutex wrapper.

Applied to files:

  • include/cc/local_cc_shards.h
  • src/cc/local_cc_shards.cpp
📚 Learning: 2025-10-20T04:30:07.884Z
Learnt from: liunyl
Repo: eloqdata/tx_service PR: 149
File: include/cc/cc_request.h:1876-1927
Timestamp: 2025-10-20T04:30:07.884Z
Learning: ScanNextBatchCc in include/cc/cc_request.h is used only for hash-partition scans; range-partition scans are handled by ScanSliceCc.

Applied to files:

  • src/cc/local_cc_shards.cpp
🔇 Additional comments (1)
include/cc/local_cc_shards.h (1)

2427-2429: LGTM!

The member variable declarations follow the established pattern for table_ranges_heap_ and use appropriate types for heap management and synchronization.

Comment on lines +419 to +423
void InitializeHashPartitionCkptHeap()
{
hash_partition_ckpt_heap_ = mi_heap_new();
hash_partition_main_thread_id_ = mi_thread_id();
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Add synchronization to prevent race conditions.

Unlike InitializeTableRangesHeap() (lines 409-417), this method doesn't protect the initialization with hash_partition_ckpt_heap_mux_. Concurrent calls could lead to heap leaks or corrupted state.

Apply this diff to add proper synchronization:

 void InitializeHashPartitionCkptHeap()
 {
+    std::unique_lock<std::shared_mutex> lk(hash_partition_ckpt_heap_mux_);
+    if (!hash_partition_ckpt_heap_)
+    {
         hash_partition_ckpt_heap_ = mi_heap_new();
         hash_partition_main_thread_id_ = mi_thread_id();
+    }
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
void InitializeHashPartitionCkptHeap()
{
hash_partition_ckpt_heap_ = mi_heap_new();
hash_partition_main_thread_id_ = mi_thread_id();
}
void InitializeHashPartitionCkptHeap()
{
std::unique_lock<std::shared_mutex> lk(hash_partition_ckpt_heap_mux_);
if (!hash_partition_ckpt_heap_)
{
hash_partition_ckpt_heap_ = mi_heap_new();
hash_partition_main_thread_id_ = mi_thread_id();
}
}
🤖 Prompt for AI Agents
In include/cc/local_cc_shards.h around lines 419 to 423,
InitializeHashPartitionCkptHeap() lacks synchronization and can race; wrap the
initialization with hash_partition_ckpt_heap_mux_ (lock at start, unlock at
end), check if hash_partition_ckpt_heap_ is already set before creating a new
heap to avoid leaks, set hash_partition_main_thread_id_ only after successful
initialization, and release the mutex in all paths.

Comment on lines +66 to +67
DECLARE_bool(report_ckpt);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Incorrect gflags declaration scope (+ missing include).

DECLARE_bool(report_ckpt) is placed inside namespace txservice and this file doesn’t include gflags headers. This will likely create a mismatched symbol vs. the global DEFINE_bool and/or fail to compile.

Fix:

  • Include gflags header in this TU.
  • Declare the flag at global scope (or stop declaring here and just include the header).

Apply these changes:

Outside the changed hunk (file top, after other includes):

+#include <gflags/gflags.h>
+#include <iomanip>  // for std::setprecision (also used below)
+
+DECLARE_bool(report_ckpt);  // global scope

And remove the in-namespace declaration:

-DECLARE_bool(report_ckpt);
+// use global ::FLAGS_report_ckpt

Run to confirm the flag is defined somewhere:


🏁 Script executed:

#!/bin/bash
rg -nP 'DEFINE_bool\(\s*report_ckpt\b' -C2

Length of output: 429



---

🏁 Script executed:

```shell
#!/bin/bash
# Check context around line 66 in local_cc_shards.cpp to see namespace scope
sed -n '50,85p' src/cc/local_cc_shards.cpp | cat -n

Length of output: 1385


🏁 Script executed:

#!/bin/bash
# Check for gflags includes in local_cc_shards.cpp
rg -n '#include.*gflags' src/cc/local_cc_shards.cpp

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check for setprecision usage in local_cc_shards.cpp
rg -n 'setprecision' src/cc/local_cc_shards.cpp

Length of output: 46


🏁 Script executed:

#!/bin/bash
# Check for iomanip include in local_cc_shards.cpp
rg -n '#include.*iomanip' src/cc/local_cc_shards.cpp

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check how FLAGS_report_ckpt is used in local_cc_shards.cpp
rg -n 'FLAGS_report_ckpt|::report_ckpt' src/cc/local_cc_shards.cpp -C2

Length of output: 229


🏁 Script executed:

#!/bin/bash
# Check if there are other global-scope DECLARE_bool patterns in this file
rg -n 'DECLARE_bool|DECLARE_int' src/cc/local_cc_shards.cpp

Length of output: 29


🏁 Script executed:

#!/bin/bash
# Check all includes at top of local_cc_shards.cpp to see what's included
head -n 60 src/cc/local_cc_shards.cpp

Length of output: 1800


Fix incorrect gflags declaration scope and missing headers.

  • DECLARE_bool(report_ckpt) at line 66 is inside namespace txservice, but DEFINE_bool(report_ckpt) is defined globally in checkpointer.cpp. This namespace mismatch causes the declaration and definition to refer to different symbols. Move DECLARE_bool to global scope.
  • Add missing #include <gflags/gflags.h> (for DECLARE_bool).
  • Add missing #include (setprecision is used at line 6146).

Fix at file top after existing includes:

+#include <gflags/gflags.h>
+#include <iomanip>
+
+DECLARE_bool(report_ckpt);

And remove the in-namespace declaration at line 66:

-DECLARE_bool(report_ckpt);

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +169 to +170
InitializeHashPartitionCkptHeap();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Ctor: initialize shared hash-ckpt heap — verify definition and error handling.

Ensure InitializeHashPartitionCkptHeap() is defined in this PR (same TU or another) and handles failures idempotently (mi_heap_new, thread-id capture, etc.). If it can fail, log and fail-fast here.


🏁 Script executed:

#!/bin/bash
# Check presence/definition
rg -nP '\bInitializeHashPartitionCkptHeap\s*\(' -C2

Length of output: 537


🏁 Script executed:

#!/bin/bash
# Check how mi_heap_new() is used/handled elsewhere in the codebase
rg -nP '\bmi_heap_new\s*\(' -C3 -A1

Length of output: 381


🏁 Script executed:

#!/bin/bash
# Check if hash_partition_ckpt_heap_ is ever checked for null/validity
rg -nP '\bhash_partition_ckpt_heap_' -C2

Length of output: 2055


🏁 Script executed:

#!/bin/bash
# Look for error handling patterns in constructors
rg -nP 'LOG_ERROR|LOG_FATAL|DCHECK|CHECK' include/cc/local_cc_shards.h src/cc/local_cc_shards.cpp -A1 -B1 | head -40

Length of output: 225


🏁 Script executed:

#!/bin/bash
# Check what mi_heap_new returns and its definition
rg -nP 'mi_heap_new' --type-list | grep -E '\.(h|hpp)$' | head -5

Length of output: 389


🏁 Script executed:

#!/bin/bash
# Look for the mimalloc header or documentation
fd -e h -e hpp | xargs grep -l 'mi_heap_new' | head -5

Length of output: 28


🏁 Script executed:

#!/bin/bash
# Check constructor initialization order and patterns
ast-grep --pattern $'class $_ {
  $$$
  $_::$_() {
    $$$
    InitializeHashPartitionCkptHeap();
    $$$
  }
  $$$
}'

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Look for how InitializeTableRangesHeap (previous call) is implemented
rg -nP '\bInitializeTableRangesHeap\s*\(' -A5 -B2

Length of output: 1004


🌐 Web query:

What is mi_heap_new() and can it fail? (mimalloc library)

💡 Result:

mi_heap_new() creates a new first-class mimalloc heap (returns mi_heap_t* which you can use with mi_heap_malloc / mi_heap_free and later mi_heap_delete or mi_heap_destroy). It can fail: on failure it returns NULL (e.g., out-of-memory or unable to allocate the heap resources). [1][2]

Sources:
[1] mimalloc API — Heap Allocation.
[2] mimalloc API — Extended functions (mi_heap_new_in_arena / return value note).


🌐 Web query:

mimalloc mi_heap_new return value error handling

💡 Result:

Short answer: mi_heap_new (and mi_heap_new_ex / mi_heap_new_in_arena) returns a mi_heap_t* on success and NULL on failure — you must check for NULL and handle it (fallback to default heap, abort, throw, etc.). Use mi_new / C++ wrappers if you want exceptions on OOM. [1][2]

Recommended patterns:

  • C:
    • Check if mi_heap_new() == NULL; if NULL, either continue using the global/default heap or handle out-of-memory (log + abort / return error). Example:
      mi_heap_t h = mi_heap_new();
      if (h == NULL) { /
      handle OOM: fallback/cleanup/abort */ }
  • C++:
    • Use mi_new / mi_new_aligned (they invoke std::get_new_handler and throw std::bad_alloc on failure) if you want exception semantics. [3]

References:

  • mi_heap_new_ex / mi_heap_new_in_arena docs (returns NULL on failure). [1]
  • Heap API overview (mi_heap_new / mi_heap_destroy). [2]
  • C++ wrappers (mi_new throws bad_alloc). [3]

[1] mi-malloc extended/heap docs — mi_heap_new_ex / in-arena (microsoft.github.io/mimalloc).
[2] mi-malloc heap allocation docs (microsoft.github.io/mimalloc).
[3] mi-malloc C++ wrappers (microsoft.github.io/mimalloc).


🌐 Web query:

mimalloc heap allocation failure

💡 Result:

Short checklist to diagnose and fix mimalloc "heap allocation" / "unable to allocate memory" failures:

  1. Confirm it’s actually OOM vs allocator config

    • Turn on verbose errors: MIMALLOC_VERBOSE=1 (or use mi_options) to see whether mimalloc reports "unable to allocate memory" or "unable to allocate aligned OS memory". [1][2]
  2. Common causes

    • Real out‑of‑memory or OS refusal to map pages (errno 12). Mimalloc will print the requested size and whether it’s an OS (segment/page) allocation failure. [3]
    • Mimalloc runtime limits / reserves (MIMALLOC_RESERVE_OS_MEMORY, MIMALLOC_LIMIT_OS_ALLOC, arena_reserve, etc.) can cause earlier/failures even when system memory exists. Check and adjust these env vars/options. [1][4]
    • Aligned allocation failures (mmap/MAP_FIXED or huge-page alignment) cause mimalloc to fall back or warn and sometimes fail for large allocations. Kernel/security/ASLR differences may trigger this. [2]
  3. Quick fixes to try

    • Re-run with MIMALLOC_VERBOSE=1 and MIMALLOC_SHOW_STATS=1 to see where allocation fails and current commit/RSS/peak. [1]
    • Remove/relax limits: unset MIMALLOC_LIMIT_OS_ALLOC or lower MIMALLOC_RESERVE_OS_MEMORY. [1][4]
    • Upgrade mimalloc to a recent release (known fixes for page-map / aligned allocation issues). [2]
    • If failure is for huge single allocations, try using smaller allocations or ensure allowed large pages / alignment are supported by the OS (or disable allow_large_os_pages). [1][2]
    • If your program is hitting system OOM, check cgroup/container memory limits and system free memory; add swap or increase limits.
  4. Debugging steps

    • Capture mimalloc verbose output and the exact error line (it prints requested bytes, errno, address). Share that when asking for help. [3]
    • Use mi_heap_collect(heap, true) before retry (mimalloc itself sometimes calls collect once before giving up). The code path shows it retries once then errors. [3]
    • If statically linked or using LD_PRELOAD/DYLD_INSERT_LIBRARIES, test with/without mimalloc to confirm allocator is the cause. [2][4]

If you paste the exact mimalloc error lines (verbose output) and relevant env vars (MIMALLOC_*), I can point to the precise cause and recommended change.

Sources: mimalloc docs (heap & runtime options), GitHub issues on aligned/OS allocation, and mimalloc source error path. [1][2][3][4]

References
[1] mimalloc documentation (modules & runtime options).
[2] microsoft/mimalloc GitHub issue: failing to allocate aligned OS memory.
[3] mimalloc source (page.c) showing retry + error message path.
[4] user reports / examples showing MIMALLOC_RESERVE_OS_MEMORY and MIMALLOC_LIMIT_OS_ALLOC usage.


🌐 Web query:

Does mi_heap_new return null on failure?

💡 Result:

Yes — mi_heap_new returns NULL on failure. [1][2]

References:

  • mimalloc official docs (mi_heap_new: "A new heap or NULL on failure"). [1]
  • mimalloc extended docs (mi_heap_new_ex: "Returns ... NULL on failure"). [2]

InitializeHashPartitionCkptHeap() lacks null-check and error handling for mi_heap_new().

mi_heap_new() can fail and return NULL (e.g., on out-of-memory). The current implementation silently assigns NULL without validation, risking crashes when the heap is used later (lines 4665, 6139). Add a null-check and fail-fast (e.g., LOG_FATAL or return false). See InitializeTableRangesHeap() (lines 412–414) for a consistent defensive pattern used elsewhere in the codebase.

🤖 Prompt for AI Agents
In src/cc/local_cc_shards.cpp around lines 169-170,
InitializeHashPartitionCkptHeap() currently assigns the result of mi_heap_new()
without checking for NULL; add a defensive null-check immediately after the
mi_heap_new() call and handle failure the same way as
InitializeTableRangesHeap(): if mi_heap_new() returns NULL log a fatal error (or
return false) with context and abort initialization so callers cannot use a NULL
heap. Ensure the log message includes the function name and reason
(out-of-memory) and that the function returns/propagates failure consistent with
surrounding code.

Comment on lines +6135 to +6153
void LocalCcShards::ReportHashPartitionCkptHeapUsage()
{
std::shared_lock<std::shared_mutex> heap_lk(hash_partition_ckpt_heap_mux_);
mi_threadid_t prev_thd = mi_override_thread(hash_partition_main_thread_id_);
mi_heap_t *prev_heap = mi_heap_set_default(hash_partition_ckpt_heap_);
int64_t allocated, committed;
mi_thread_stats(&allocated, &committed);
if (committed != 0)
{
LOG(INFO) << "ckpt hash partition heap memory usage report, committed "
<< committed << ", allocated " << allocated << ", frag ratio "
<< std::setprecision(2)
<< 100 * (static_cast<float>(committed - allocated) /
static_cast<float>(committed));
}
mi_override_thread(prev_thd);
mi_heap_set_default(prev_heap);
heap_lk.unlock();
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

ReportHashPartitionCkptHeapUsage(): shared lock + override is unsafe; missing headers/types.

  • Takes shared_lock but mutates thread-id/default-heap; must be exclusive.
  • Missing include for std::setprecision (compile error).
  • Use the same override/restore pattern as elsewhere.
  • Prefer size_t for mi_thread_stats outputs; verify API availability.

Apply:

-void LocalCcShards::ReportHashPartitionCkptHeapUsage()
-{
-    std::shared_lock<std::shared_mutex> heap_lk(hash_partition_ckpt_heap_mux_);
-    mi_threadid_t prev_thd = mi_override_thread(hash_partition_main_thread_id_);
-    mi_heap_t *prev_heap = mi_heap_set_default(hash_partition_ckpt_heap_);
-    int64_t allocated, committed;
-    mi_thread_stats(&allocated, &committed);
+void LocalCcShards::ReportHashPartitionCkptHeapUsage()
+{
+    std::unique_lock<std::shared_mutex> heap_lk(hash_partition_ckpt_heap_mux_);
+    bool is_override_thd = mi_is_override_thread();
+    mi_threadid_t prev_thd = mi_override_thread(hash_partition_main_thread_id_);
+    mi_heap_t* prev_heap = mi_heap_set_default(hash_partition_ckpt_heap_);
+    size_t allocated = 0, committed = 0;
+    mi_thread_stats(&allocated, &committed);
     if (committed != 0)
     {
-        LOG(INFO) << "ckpt hash partition heap memory usage report, committed "
-                  << committed << ", allocated " << allocated << ", frag ratio "
-                  << std::setprecision(2)
+        LOG(INFO) << "ckpt hash partition heap memory, committed="
+                  << committed << " allocated=" << allocated << " frag_ratio="
+                  << std::fixed << std::setprecision(2)
                   << 100 * (static_cast<float>(committed - allocated) /
                             static_cast<float>(committed));
     }
-    mi_override_thread(prev_thd);
-    mi_heap_set_default(prev_heap);
-    heap_lk.unlock();
+    mi_heap_set_default(prev_heap);
+    if (is_override_thd) {
+        mi_override_thread(prev_thd);
+    } else {
+        mi_restore_default_thread_id();
+    }
+    heap_lk.unlock();
 }

Confirm your mimalloc version exposes mi_thread_stats(size_t*, size_t*). If not, switch to a supported API (e.g., mi_stats_print or heap-specific stats).


🌐 Web query:

Does mimalloc expose a function mi_thread_stats returning committed and allocated for the current thread, and what is its signature in the version you use?

💡 Result:

Short answer: No — there is no exported function named mi_thread_stats. Instead:

  • For printing the current thread's stats: mi_thread_stats_print_out(mi_output_fun *out, void *arg). [1]
  • For programmatic stats you can use mi_stats_get(size_t stats_size, mi_stats_t *stats) or mi_stats_get_json(...), which return overall (and per-thread merged) statistics via the mi_stats_t structure / JSON. [2]

Sources: mimalloc docs (extended API / stats). [1] https://microsoft.github.io/mimalloc/group__extended.html [2] https://microsoft.github.io/mimalloc/using.html



---

🏁 Script executed:

```shell
#!/bin/bash
# Find mimalloc version specification
echo "=== Looking for mimalloc version ==="
fd -t f '(CMakeLists|package\.json|vcpkg\.json|conanfile)' | head -20 | xargs grep -l mimalloc 2>/dev/null || true
find . -name "*.cmake" -o -name "*mimalloc*" 2>/dev/null | head -30 || true

Length of output: 51


🏁 Script executed:

#!/bin/bash
# Search for mi_thread_stats definition and usage in codebase
echo "=== Searching for mi_thread_stats in codebase ==="
rg -n 'mi_thread_stats' --type=cpp --type=h -C 3

Length of output: 5105


🏁 Script executed:

#!/bin/bash
# Check what other mi_* functions are used
echo "=== Other mimalloc function calls in codebase ==="
rg -n '\bmi_[a-z_]+\s*\(' --type=cpp --type=h -o | sort | uniq -c | sort -rn | head -30

Length of output: 1884


🏁 Script executed:

#!/bin/bash
# Look for mimalloc includes and version checks
echo "=== Mimalloc includes and version ==="
rg -n '#include.*mimalloc' --type=cpp --type=h -C 2

Length of output: 1716



---

🏁 Script executed:

```shell
#!/bin/bash
# Check CMakeLists.txt for mimalloc configuration
echo "=== Mimalloc in CMakeLists.txt ==="
grep -n -i 'mimalloc\|find_package\|add_subdirectory' CMakeLists.txt | head -50

Length of output: 434


🏁 Script executed:

#!/bin/bash
# Search for mi_thread_stats definition (not just calls)
echo "=== Searching for mi_thread_stats definition ==="
rg -n '(^|\s)mi_thread_stats\s*=' --type=cpp --type=h -C 3
rg -n '#define.*mi_thread_stats' --type=cpp --type=h -C 2

Length of output: 48


🏁 Script executed:

#!/bin/bash
# Check if there's a mimalloc wrapper or custom include
echo "=== Looking for custom mimalloc headers/wrappers ==="
fd -t f 'mimalloc' 2>/dev/null | grep -v node_modules
find . -type f -name '*mimalloc*' 2>/dev/null | head -20

Length of output: 52


🏁 Script executed:

#!/bin/bash
# Look for extern declarations or forward declarations of mi_thread_stats
echo "=== Extern/forward declarations of mi_thread_stats ==="
rg -n '(extern|void).*mi_thread_stats' --type=cpp --type=h -C 2

Length of output: 54



---

🏁 Script executed:

```shell
#!/bin/bash
# Check if iomanip is already included in local_cc_shards.cpp
echo "=== Checking includes in local_cc_shards.cpp ==="
head -100 src/cc/local_cc_shards.cpp | grep -n '#include'

Length of output: 1094


🏁 Script executed:

#!/bin/bash
# Verify the full context around line 6135-6153 for lock type requirements
echo "=== Context around ReportHashPartitionCkptHeapUsage ==="
sed -n '6130,6160p' src/cc/local_cc_shards.cpp

Length of output: 1246


🏁 Script executed:

#!/bin/bash
# Check similar patterns in the codebase to see lock types used with mi_override_thread
echo "=== Lock types used with mi_override_thread ==="
rg -B5 'mi_override_thread' --type=cpp | grep -E '(unique_lock|shared_lock|std::lock)' -A2 -B2

Length of output: 4398



---

🏁 Script executed:

```shell
#!/bin/bash
# Check if iomanip is included anywhere in local_cc_shards.cpp
echo "=== Full iomanip search in local_cc_shards.cpp ==="
grep -n 'iomanip' src/cc/local_cc_shards.cpp
echo "=== Setprecision without iomanip included: ==="
grep -n 'setprecision' src/cc/local_cc_shards.cpp

Length of output: 144


🏁 Script executed:

#!/bin/bash
# Verify line 4721 context for comparison (same heap_mux, different function)
echo "=== Line 4721 context (using same mutex with unique_lock) ==="
sed -n '4715,4730p' src/cc/local_cc_shards.cpp

Length of output: 718


ReportHashPartitionCkptHeapUsage(): shared lock with write operations is unsafe; missing header.

  • Takes shared_lock (read-only) but calls mi_override_thread() and mi_heap_set_default() (write ops); must use unique_lock. Compare line 4721 which uses the same mutex correctly.
  • Missing <iomanip> include required for std::setprecision; compile error.
  • Missing mi_is_override_thread() check before override; add for consistency with codebase pattern (see lines 2041, 1313, range_slice.cpp:513).
  • Use std::fixed with std::setprecision(2) for proper float formatting.

Apply:

void LocalCcShards::ReportHashPartitionCkptHeapUsage()
{
-    std::shared_lock<std::shared_mutex> heap_lk(hash_partition_ckpt_heap_mux_);
+    std::unique_lock<std::shared_mutex> heap_lk(hash_partition_ckpt_heap_mux_);
+    bool is_override_thd = mi_is_override_thread();
     mi_threadid_t prev_thd = mi_override_thread(hash_partition_main_thread_id_);
     mi_heap_t *prev_heap = mi_heap_set_default(hash_partition_ckpt_heap_);
     int64_t allocated, committed;
     mi_thread_stats(&allocated, &committed);
     if (committed != 0)
     {
         LOG(INFO) << "ckpt hash partition heap memory usage report, committed "
                   << committed << ", allocated " << allocated << ", frag ratio "
-                  << std::setprecision(2)
+                  << std::fixed << std::setprecision(2)
                   << 100 * (static_cast<float>(committed - allocated) /
                             static_cast<float>(committed));
     }
-    mi_override_thread(prev_thd);
     mi_heap_set_default(prev_heap);
+    if (is_override_thd) {
+        mi_override_thread(prev_thd);
+    } else {
+        mi_restore_default_thread_id();
+    }
     heap_lk.unlock();
 }

Add #include <iomanip> to the header section of src/cc/local_cc_shards.cpp.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/cc/local_cc_shards.cpp around lines 6135 to 6153, the function
ReportHashPartitionCkptHeapUsage uses a shared_lock but performs thread/heap
overrides (write ops), omits the <iomanip> include, and lacks a
mi_is_override_thread() guard and std::fixed formatting; change
std::shared_lock<std::shared_mutex> to std::unique_lock<std::shared_mutex> for
hash_partition_ckpt_heap_mux_, add #include <iomanip> to the file headers, wrap
the mi_override_thread(hash_partition_main_thread_id_) and
mi_heap_set_default(hash_partition_ckpt_heap_) calls in a conditional that first
checks mi_is_override_thread() (only override if not already overridden)
consistent with other usages, and use std::fixed << std::setprecision(2) when
formatting the frag ratio; keep restoring prev_thd and prev_heap as before and
remove any unnecessary explicit unlock since unique_lock will manage it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants