feat: automatically adjust cpu when eloqstore is cloud mode#403
feat: automatically adjust cpu when eloqstore is cloud mode#403thweetkomputer wants to merge 3 commits into
Conversation
WalkthroughAdded a boolean flag to record when core count was auto-detected and threaded that value through EloqStoreConfig; EloqStoreConfig now adjusts cloud request thread defaults and may reduce usable core count, and storage init updates related gflags when core count changes. Changes
Sequence Diagram(s)sequenceDiagram
participant HW as Hardware
participant Core as DataSubstrate
participant Init as storage_init
participant Config as EloqStoreConfig
participant Store as Eloq Store Service
HW->>Core: provide hardware concurrency
Core-->>Init: core_num + core_num_auto_config_=true
Init->>Config: EloqStoreConfig(config, path, mem, core_num, core_num_auto_config)
Config->>Config: determine cloud_request_threads (config or fallback)
alt core_num_auto_config == true
Config->>Config: reserve CPUs (ceil(cloud_request_threads/3)), reduce core_number
Config->>Init: return adjusted core_number
Init->>Store: Set gflags (bthread_concurrency, eloq_store_worker_num)
end
Store-->>Config: initialized
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@core/src/storage_init.cpp`:
- Line 30: Remove the unused and invalid include directive "#include
\"tasks/task.h\"" from the top of the file; locate the include statement in the
file header (the one referencing tasks/task.h), delete it, and ensure no other
code depends on symbols from that header (if any symbols are used, replace with
the correct header instead).
In `@store_handler/eloq_data_store_service/eloq_store_config.cpp`:
- Around line 624-634: The code currently computes reserved_cpu as
eloqstore_configs_.cloud_request_threads / 3 which yields 0 when
cloud_request_threads is 1 or 2; change the calculation so at least one CPU is
reserved when cloud_request_threads > 0 (e.g. replace reserved_cpu =
eloqstore_configs_.cloud_request_threads / 3 with a min/max style calculation
such as reserved_cpu = std::max<size_t>(1,
eloqstore_configs_.cloud_request_threads / 3) or similar) and keep the existing
guard that ensures core_number stays >1 before subtracting; update references in
the core_number_auto_config block and the variables reserved_cpu, core_number,
and eloqstore_configs_.cloud_request_threads accordingly.
- Around line 602-635: The reduction of core_number in InitializeStorageHandler
(where eloqstore_configs_.cloud_request_threads and reserved_cpu are applied)
can leave bthread_concurrency set to the original core_config_.core_num (set
during Init() via LoadCoreAndNetworkConfig), causing oversubscription; update
the bthread_concurrency flag after you reduce core_number or move the core
reduction to occur before LoadCoreAndNetworkConfig() runs. Concretely, after the
block that modifies core_number (references: InitializeStorageHandler,
eloqstore_configs_.cloud_request_threads, core_number, reserved_cpu), set
FLAGS_bthread_concurrency (or call the function that applies
bthread_concurrency) to the new core_number so the bthread runtime and TxService
see the same reduced core count.
In `@store_handler/eloq_data_store_service/main.cpp`:
- Line 328: core_number is computed from num_vcpu without applying the 90%
reservation used elsewhere (data_substrate.cpp's core_num = max(1, (NUM_VCPU *
9) / 10)), causing inconsistent cloud_request_threads sizing; update the
computation in main.cpp to apply the same 90% factor (or explicitly document why
standalone differs) by changing the formula that sets core_number (referencing
the variables core_number and num_vcpu) to mirror the DataSubstrate approach
(use max(1, static_cast<uint32_t>((num_vcpu * 9) / 10)) before dividing for
cloud_request_threads) so both paths reserve ~10% of vCPUs, or add a clear
comment above core_number explaining the intentional deviation.
🧹 Nitpick comments (1)
core/include/data_substrate.h (1)
128-141: Inconsistent naming: trailing underscore oncore_num_auto_config_.All other fields in
CoreConfig(e.g.,core_num,enable_heap_defragment,enable_wal) do not use a trailing underscore. This field should follow the same convention for consistency.Suggested fix
- bool core_num_auto_config_{false}; + bool core_num_auto_config{false};This would also require updating all references (
core_config_.core_num_auto_config_→core_config_.core_num_auto_config) indata_substrate.cppandstorage_init.cpp.
177a72f to
e7fbbf8
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@core/src/storage_init.cpp`:
- Around line 292-306: The EloqStoreConfig is constructed using the original
core_config_.core_num so changing the gflag afterward does nothing; fix by
ensuring the effective thread count is applied before or inside EloqStoreConfig
creation—either move the GFLAGS_NAMESPACE::SetCommandLineOption calls for
"bthread_concurrency" and "eloq_store_worker_num" to before constructing
EloqStoreConfig (so EloqStoreConfig picks up the adjusted core_num), or
immediately after construction update
eloq_store_config.eloqstore_configs_.num_threads = core_config_.core_num and
then invoke the same internal recalculation logic that computes
buffer_pool_size, io_queue_size and max_inflight_write (the same computations
done in EloqStoreConfig constructor) so the values passed to ds_factory reflect
the adjusted core_num.
In `@store_handler/eloq_data_store_service/main.cpp`:
- Line 33: Remove the unused C++ header inclusion: delete the line that reads
"#include <thread>" from main.cpp since std::thread is not used; verify there
are no references to std::thread in functions within this file (e.g., main or
any helper functions) and leave bthread or configuration-related includes
intact.
🧹 Nitpick comments (2)
store_handler/eloq_data_store_service/main.cpp (1)
327-329: Standalone mode:unused_core_number = 0withcore_number_auto_config = falseis safe but worth a brief comment.Since
core_number_auto_configisfalse, the reservation logic inEloqStoreConfigwon't trigger. However,core_numberbeing 0 is passed by reference — if someone later changescore_number_auto_configtotrue, thecore_number / 4calculation on Line 617 of eloq_store_config.cpp would yield 0 forcloud_request_threads, and thecore_number > 1guard on Line 631 would prevent underflow. So this is safe, but a comment clarifying why 0 is intentional would help future readers.💡 Suggestion
- uint32_t unused_core_number = 0; + // Core number auto-config is disabled in standalone mode; value is unused. + uint32_t unused_core_number = 0;store_handler/eloq_data_store_service/eloq_store_config.cpp (1)
270-272:core_numbertaken by non-const reference — document the output semantic.The constructor modifies
core_numberas an output parameter, which is unusual for a constructor. Consider adding a brief doc comment on the parameter to indicate it may be reduced whencore_number_auto_configis true.
| auto original_core_num = core_config_.core_num; | ||
| EloqDS::EloqStoreConfig eloq_store_config( | ||
| config_reader, eloq_dss_data_path, core_config_.node_memory_limit_mb); | ||
| config_reader, | ||
| eloq_dss_data_path, | ||
| core_config_.node_memory_limit_mb, | ||
| core_config_.core_num, | ||
| core_config_.core_num_auto_config_); | ||
| if (core_config_.core_num != original_core_num) | ||
| { | ||
| auto adjusted_core_num = std::to_string(core_config_.core_num); | ||
| GFLAGS_NAMESPACE::SetCommandLineOption("bthread_concurrency", | ||
| adjusted_core_num.c_str()); | ||
| GFLAGS_NAMESPACE::SetCommandLineOption("eloq_store_worker_num", | ||
| adjusted_core_num.c_str()); | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's find the relevant files and understand the structure
find . -name "*.cpp" -o -name "*.h" | grep -E "(storage_init|eloq_store_config)" | head -20Repository: eloqdata/tx_service
Length of output: 212
🏁 Script executed:
# Check where num_threads is used and when eloq_store_worker_num is read
rg -n "num_threads|eloq_store_worker_num" --type=cpp -B2 -A2 | head -100Repository: eloqdata/tx_service
Length of output: 6856
🏁 Script executed:
# Look specifically at EloqStoreConfig constructor behavior and num_threads initialization
rg -n "class EloqStoreConfig|eloqstore_configs_|num_threads" --type=cpp -A1 | head -80Repository: eloqdata/tx_service
Length of output: 6221
Setting eloq_store_worker_num gflag after EloqStoreConfig construction is ineffective.
eloqstore_configs_.num_threads is captured at the start of the EloqStoreConfig constructor (eloq_store_config.cpp:274–281) and used immediately to calculate dependent config values (buffer_pool_size, io_queue_size, max_inflight_write divisions at lines 447, 473, 480). Updating the gflag on Line 304 after construction has no effect on the config object already in memory. The config is then passed to ds_factory, so eloqstore will use the original num_threads value, not the adjusted one.
To use the adjusted core_num for eloqstore worker threads, either set the gflag before EloqStoreConfig construction or directly modify the config object and recalculate dependent values:
eloq_store_config.eloqstore_configs_.num_threads = core_config_.core_num;
// Recalculate dependent values that were already divided by the old num_threads🤖 Prompt for AI Agents
In `@core/src/storage_init.cpp` around lines 292 - 306, The EloqStoreConfig is
constructed using the original core_config_.core_num so changing the gflag
afterward does nothing; fix by ensuring the effective thread count is applied
before or inside EloqStoreConfig creation—either move the
GFLAGS_NAMESPACE::SetCommandLineOption calls for "bthread_concurrency" and
"eloq_store_worker_num" to before constructing EloqStoreConfig (so
EloqStoreConfig picks up the adjusted core_num), or immediately after
construction update eloq_store_config.eloqstore_configs_.num_threads =
core_config_.core_num and then invoke the same internal recalculation logic that
computes buffer_pool_size, io_queue_size and max_inflight_write (the same
computations done in EloqStoreConfig constructor) so the values passed to
ds_factory reflect the adjusted core_num.
| #include <cassert> | ||
| #include <cstdint> | ||
| #include <filesystem> | ||
| #include <thread> |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if <thread> symbols are used in main.cpp
echo "=== Searching for std::thread usage in main.cpp ==="
rg -n "std::thread|std::this_thread|hardware_concurrency" store_handler/eloq_data_store_service/main.cpp
echo ""
echo "=== File overview to check context ==="
wc -l store_handler/eloq_data_store_service/main.cppRepository: eloqdata/tx_service
Length of output: 205
🏁 Script executed:
#!/bin/bash
# Look at includes and check context around line 33
echo "=== First 50 lines of main.cpp showing includes ==="
head -50 store_handler/eloq_data_store_service/main.cpp | cat -n
echo ""
echo "=== Check for any thread-related code (different patterns) ==="
rg -i "thread" store_handler/eloq_data_store_service/main.cpp
echo ""
echo "=== Check if std:: is used at all (to understand coding style) ==="
rg "std::" store_handler/eloq_data_store_service/main.cpp | head -5Repository: eloqdata/tx_service
Length of output: 2458
Remove unused #include <thread> from line 33.
The <thread> header is not used anywhere in this file. While the codebase references threading concepts (via bthread and configuration parameters), the std::thread library is never invoked.
🤖 Prompt for AI Agents
In `@store_handler/eloq_data_store_service/main.cpp` at line 33, Remove the unused
C++ header inclusion: delete the line that reads "#include <thread>" from
main.cpp since std::thread is not used; verify there are no references to
std::thread in functions within this file (e.g., main or any helper functions)
and leave bthread or configuration-related includes intact.
Here are some reminders before you submit the pull request
fixes eloqdb/tx_service#issue_id./mtr --suite=mono_main,mono_multi,mono_basicSummary by CodeRabbit
New Features
Chores