chore: auto budget node memory across tx service and EloqStore components#339
Conversation
|
Warning Rate limit exceeded@thweetkomputer has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 13 minutes and 41 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (3)
WalkthroughAdds node memory tracking to DataSubstrate, probes system RAM at startup, sets remaining_node_memory_mb_, and propagates that budget into storage (EloqStoreConfig) and tx_service to drive memory-aware configuration and validations. Changes
Sequence DiagramsequenceDiagram
participant Client as Process (startup)
participant DS as DataSubstrate
participant Sys as sysinfo
participant SInit as storage_init
participant ESC as EloqStoreConfig
participant TS as tx_service_init
Client->>DS: Start()
DS->>Sys: sysinfo() (probe RAM)
Sys-->>DS: total RAM
DS->>DS: remaining_node_memory_mb_ = 80% of total
Client->>SInit: Initialize storage
SInit->>ESC: EloqStoreConfig(config, path, remaining_node_memory_mb_, standalone)
ESC->>ESC: Compute index_buffer_pool_size (memory-aware)
ESC->>ESC: node_memory_mb -= allocated_pool_mb
ESC-->>SInit: configured store
Client->>TS: Initialize tx service
TS->>TS: Determine node_memory_limit_mb (use remaining_node_memory_mb_ if unset)
TS->>TS: Validate node_memory_limit_mb <= remaining_node_memory_mb_
TS->>TS: remaining_node_memory_mb_ -= node_memory_limit_mb
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
Pre-merge checks and finishing touches❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Fix all issues with AI Agents
In @core/src/tx_service_init.cpp:
- Around line 116-120: The log is misleading because it prints
remaining_node_memory_mb_ but labels it "total memory"; update the LOG(INFO)
call in tx_service_init.cpp where node_memory_limit_mb,
remaining_node_memory_mb_, and field_mem are used so the message accurately
reflects values: either change the label from "total memory=" to "remaining
memory=" when printing remaining_node_memory_mb_, or, if an actual total memory
variable exists (e.g., total_node_memory_mb_ or similar elsewhere), print that
variable instead; ensure the log references field_mem, node_memory_limit_mb, and
remaining_node_memory_mb_ consistently so operators see correct semantics.
In @store_handler/eloq_data_store_service/eloq_store_config.cpp:
- Around line 352-375: The auto-calculation for index_buffer_pool_size can yield
"0MB" when node_memory_mb is very small (e.g., 1), so update the branch that
sets index_buffer_pool_size (the code using node_memory_mb / 2) to enforce a
minimum buffer size (e.g., at least 1MB or another configurable minimum) before
converting to string; use node_memory_mb and
FLAGS_eloq_store_index_buffer_pool_size/CheckCommandLineFlagIsDefault/config_reader.GetString
to locate the logic, apply max(node_memory_mb/2, minimum) or similar, and update
the LOG(INFO) message to reflect the enforced minimum when it is applied.
- Around line 38-40: The PR changes the default flag eloq_store_open_files_limit
to 400000 but deployment/container docs and resource limits aren’t updated;
update operational documentation and container manifests to require/describe
setting ulimit -n (or equivalent) to >=400000, mention the flag
eloq_store_open_files_limit and that data_substrate.cpp calls
setrlimit(RLIMIT_NOFILE, ...), and update any Docker/Kubernetes configs
(securityContext, init scripts, systemd drop-ins, or startup scripts) and
README/deployment guides to explicitly instruct operators how to configure host
and container file-descriptor limits to satisfy this new default.
- Line 123: The new command-line flag DEFINE_bool(eloq_store_data_append_mode,
true, ...) is missing from the README configuration reference; add an entry for
--eloq_store_data_append_mode in the Command Line Flags table (and optionally
the Configuration File section) documenting its purpose (controls whether
EloqStore writes use append mode), its default value (true), accepted values
(true/false), and a short usage example; reference the flag name
"eloq_store_data_append_mode" and ensure formatting matches existing entries
such as --enable_cache_replacement.
🧹 Nitpick comments (2)
core/src/data_substrate.cpp (1)
141-149: Add validation for edge cases in memory calculation.The memory calculation logic is sound, but consider adding checks for edge cases:
- Verify
mem_mibis non-zero before using it- Consider checking for potential overflow in
meminfo.totalram * meminfo.mem_unit🔎 Suggested validation improvements
struct sysinfo meminfo; if (sysinfo(&meminfo)) { LOG(ERROR) << "Failed to get system memory info: " << strerror(errno) << " when node_memory_limit_mb is not set"; return false; } uint32_t mem_mib = meminfo.totalram * meminfo.mem_unit / (1024 * 1024); +if (mem_mib == 0) +{ + LOG(ERROR) << "Invalid system memory detected: 0 MiB"; + return false; +} instance.remaining_node_memory_mb_ = mem_mib * 4 / 5; +LOG(INFO) << "Total system memory: " << mem_mib + << " MiB, allocated for node: " + << instance.remaining_node_memory_mb_ << " MiB (80%)";store_handler/eloq_data_store_service/eloq_store_config.cpp (1)
376-382: Refactor to eliminate repeated conversion and improve clarity.The expression
buffer_pool_size / (1024 * 1024)is computed twice (lines 377 and 382), which duplicates logic and reduces readability. Additionally, extracting this conversion makes the intent clearer.🔎 Proposed refactor to eliminate duplication
uint64_t buffer_pool_size = parse_size(index_buffer_pool_size); - if (buffer_pool_size / (1024 * 1024) >= node_memory_mb) + uint32_t buffer_pool_size_mb = buffer_pool_size / (1024 * 1024); + if (buffer_pool_size_mb >= node_memory_mb) { LOG(FATAL) << "buffer pool size (" << index_buffer_pool_size << ") exceeds node memory mb"; } - node_memory_mb -= buffer_pool_size / (1024 * 1024); + node_memory_mb -= buffer_pool_size_mb;
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
core/include/data_substrate.hcore/src/data_substrate.cppcore/src/storage_init.cppcore/src/tx_service_init.cppstore_handler/eloq_data_store_service/eloq_store_config.cppstore_handler/eloq_data_store_service/eloq_store_config.h
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-16T09:47:09.981Z
Learnt from: githubzilla
Repo: eloqdata/tx_service PR: 288
File: store_handler/eloq_data_store_service/rocksdb_cloud_data_store_factory.h:54-76
Timestamp: 2025-12-16T09:47:09.981Z
Learning: In RocksDBCloudDataStoreFactory::CreateDataStore (store_handler/eloq_data_store_service/rocksdb_cloud_data_store_factory.h), when oss_url is configured, the factory always appends /ds_<shard_id> to the oss_url. Users should not include the shard path in the oss_url themselves.
Applied to files:
core/src/storage_init.cpp
🧬 Code graph analysis (2)
store_handler/eloq_data_store_service/eloq_store_config.cpp (2)
core/include/data_substrate.h (1)
CheckCommandLineFlagIsDefault(313-324)store_handler/eloq_data_store_service/main.cpp (1)
config_reader(162-162)
core/src/tx_service_init.cpp (1)
core/include/data_substrate.h (1)
remaining_node_memory_mb_(264-264)
🔇 Additional comments (8)
core/src/data_substrate.cpp (2)
25-25: LGTM - sysinfo include supports memory probing.The addition of
<sys/sysinfo.h>is necessary for the system memory probing logic added at lines 141-149.
72-72: Verify the 50x increase in maxclients is intentional and properly planned.The default
maxclientsvalue has been increased to 500,000—an extraordinarily high value for a data substrate system. This requires careful verification:
- OS-level tuning required: This directly sets RLIMIT_NOFILE (file descriptors) and will fail silently if OS limits aren't increased first.
- Production database context: Typical databases use far lower limits (MySQL ~151, PostgreSQL ~100 by default). Even modest production systems rarely exceed thousands of concurrent connections.
- Memory and resource implications: Connection pooling is the standard production approach to handle high client counts without massive per-connection overhead.
Confirm this change is intentional, document the expected workload justifying this value, and verify OS-level file descriptor and memory limits have been adjusted accordingly.
core/src/tx_service_init.cpp (2)
122-128: LGTM - validation prevents memory over-allocation.The validation check ensures
node_memory_limit_mbdoesn't exceed available memory, preventing configuration errors that could lead to resource exhaustion.
130-130: LGTM - deduction implements memory budgeting correctly.Subtracting the allocated memory from
remaining_node_memory_mb_enables downstream components to track and validate their memory budgets.core/src/storage_init.cpp (1)
291-294: LGTM - passes memory budget to storage configuration.The updated constructor call correctly passes
remaining_node_memory_mb_by reference, enablingEloqStoreConfigto allocate and deduct its memory budget. This change integrates with the centralized memory management approach whereDataSubstrate::Start()initializes the memory pool and components sequentially claim their allocations.store_handler/eloq_data_store_service/eloq_store_config.h (1)
35-36: LGTM - reference parameter enables memory budget tracking.The updated constructor signature accepts
node_memory_mbby reference, allowingEloqStoreConfigto read the available memory budget, allocate resources accordingly, and update the remaining budget for downstream components. This aligns with the centralized memory management approach introduced in this PR.core/include/data_substrate.h (1)
264-264: No duplicate declaration exists.The search confirms only a single declaration of
remaining_node_memory_mb_at line 264. No duplicate in public or private sections. The concern is unfounded.Likely an incorrect or invalid review comment.
store_handler/eloq_data_store_service/eloq_store_config.cpp (1)
240-242: LGTM: Memory accounting parameter added.The constructor signature correctly adds a
node_memory_mbreference parameter to enable memory budget tracking. The parameter is appropriately passed by non-const reference since the constructor modifies it to account for allocated memory.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI Agents
In @core/src/storage_init.cpp:
- Line 30: Remove the erroneous include of "log_utils.h" from storage_init.cpp
(the header does not exist) and rely on glog's LOG() instead; specifically
delete the line with #include "log_utils.h" and if storage_init.cpp does not
already include glog, add #include <glog/logging.h> so calls to LOG() resolve.
In @store_handler/eloq_data_store_service/main.cpp:
- Around line 298-308: The error return in main() uses `return false;` which
converts to 0 (success); change it to return a non-zero exit code (e.g., `return
1;`) so failures are signaled correctly. Locate the sysinfo call and the error
branch around `sysinfo(&meminfo)` in main (where `meminfo` and `mem_mib` are
computed) and replace the `return false;` with a proper integer error return;
consider using a named constant or EXIT_FAILURE for clarity.
🧹 Nitpick comments (3)
store_handler/eloq_data_store_service/eloq_store_config.cpp (1)
240-243: Consider documenting the in-place mutation ofnode_memory_mbor using a return value.The constructor mutates the
node_memory_mbreference parameter (line 386), decrementing it by the allocated buffer pool size. While this enables memory tracking across initialization stages, the side effect is implicit and could surprise callers unfamiliar with the implementation.Additionally, the validation at line 381 uses
>rather than>=. Ifbuffer_pool_sizeexactly equalsnode_memory_mb, the check passes but line 386 setsnode_memory_mbto zero, potentially causing issues downstream.💡 Suggestions
Option 1: Document the mutation behavior in a comment above the constructor:
+// Note: node_memory_mb is decremented by the allocated buffer pool size. EloqStoreConfig::EloqStoreConfig(const INIReader &config_reader, const std::string_view base_data_path, uint32_t &node_memory_mb, bool standalone)Option 2: Change validation to
>=if zero remaining memory is problematic:- if (buffer_pool_size / (1024 * 1024) > node_memory_mb) + if (buffer_pool_size / (1024 * 1024) >= node_memory_mb) { - LOG(FATAL) << "buffer pool size (" << index_buffer_pool_size - << ") exceeds node memory mb"; + LOG(FATAL) << "buffer pool size (" << index_buffer_pool_size + << ") exceeds or equals node memory mb, leaving no memory for other components"; }Also applies to: 353-386
store_handler/eloq_data_store_service/main.cpp (1)
305-306: Consider overflow protection in memory calculation.The calculation
meminfo.totalram * meminfo.mem_unit / (1024 * 1024) * 4 / 5could overflow iftotalramandmem_unitare both large. While unlikely on current systems, defensive coding would useuint64_tor reorder operations to minimize overflow risk.💡 Proposed refactor
- uint32_t mem_mib = - meminfo.totalram * meminfo.mem_unit / (1024 * 1024) * 4 / 5; + // Calculate 80% of total RAM in MiB, with overflow protection + uint64_t total_mib = static_cast<uint64_t>(meminfo.totalram) * + meminfo.mem_unit / (1024 * 1024); + uint32_t mem_mib = static_cast<uint32_t>(total_mib * 4 / 5);store_handler/eloq_data_store_service/eloq_store_config.h (1)
34-37: Add documentation for the constructor's side effect onnode_memory_mb.The constructor modifies
node_memory_mbby reference (decrementing it by the allocated buffer pool size), but this side effect is not documented in the header. Adding a brief comment would improve API clarity for callers.💡 Proposed documentation
+ /// Constructor for EloqStoreConfig + /// @param config_reader Configuration reader for INI settings + /// @param base_data_path Base directory path for data storage + /// @param node_memory_mb Available node memory in MB (modified: decremented by allocated buffer size) + /// @param standalone Whether this is a standalone EloqStore instance (default: false) EloqStoreConfig(const INIReader &config_reader, const std::string_view base_data_path, uint32_t &node_memory_mb, bool standalone = false);
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
core/src/storage_init.cppcore/src/tx_service_init.cppstore_handler/eloq_data_store_service/eloq_store_config.cppstore_handler/eloq_data_store_service/eloq_store_config.hstore_handler/eloq_data_store_service/main.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
- core/src/tx_service_init.cpp
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-16T09:47:09.981Z
Learnt from: githubzilla
Repo: eloqdata/tx_service PR: 288
File: store_handler/eloq_data_store_service/rocksdb_cloud_data_store_factory.h:54-76
Timestamp: 2025-12-16T09:47:09.981Z
Learning: In RocksDBCloudDataStoreFactory::CreateDataStore (store_handler/eloq_data_store_service/rocksdb_cloud_data_store_factory.h), when oss_url is configured, the factory always appends /ds_<shard_id> to the oss_url. Users should not include the shard path in the oss_url themselves.
Applied to files:
core/src/storage_init.cpp
🧬 Code graph analysis (1)
store_handler/eloq_data_store_service/eloq_store_config.cpp (2)
store_handler/eloq_data_store_service/main.cpp (3)
CheckCommandLineFlagIsDefault(89-101)CheckCommandLineFlagIsDefault(89-89)config_reader(163-163)core/include/data_substrate.h (1)
CheckCommandLineFlagIsDefault(313-324)
🔇 Additional comments (1)
core/src/storage_init.cpp (1)
292-293: LGTM! Constructor call correctly passes memory tracking parameter.The updated
EloqStoreConfigconstructor call properly passesremaining_node_memory_mb_by reference, enabling memory-aware buffer pool sizing. The variable is initialized at line 149 ofDataSubstrate::Start()(set to 80% of total system memory) beforeInitializeStorageHandler()is invoked at line 182, ensuring it is available for the constructor call at lines 292-293. Thestandaloneparameter appropriately defaults tofalsefor this tx_service integration context.
Here are some reminders before you submit the pull request
fixes eloqdb/tx_service#issue_id./mtr --suite=mono_main,mono_multi,mono_basicSummary by CodeRabbit
New Features
Improvements
Behavior
✏️ Tip: You can customize this high-level summary in your review settings.