Skip to content

chore: auto budget node memory across tx service and EloqStore components#339

Merged
thweetkomputer merged 7 commits into
mainfrom
clean_zc
Jan 6, 2026
Merged

chore: auto budget node memory across tx service and EloqStore components#339
thweetkomputer merged 7 commits into
mainfrom
clean_zc

Conversation

@thweetkomputer

@thweetkomputer thweetkomputer commented Jan 6, 2026

Copy link
Copy Markdown
Collaborator

Here are some reminders before you submit the pull request

  • Add tests for the change
  • Document changes
  • Reference the link of issue using fixes eloqdb/tx_service#issue_id
  • Reference the link of RFC if exists
  • Pass ./mtr --suite=mono_main,mono_multi,mono_basic

Summary by CodeRabbit

  • New Features

    • System memory auto-detection and centralized remaining-memory tracking
    • Validation to prevent configured memory exceeding available node memory
  • Improvements

    • Increased concurrent connection capacity (10K → 500K)
    • Increased file descriptor limit for data store (1K → 400K)
    • Automatic, memory-aware buffer pool sizing
    • Data append mode enabled by default
  • Behavior

    • New standalone-aware initialization for the data store configuration

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai

coderabbitai Bot commented Jan 6, 2026

Copy link
Copy Markdown

Warning

Rate limit exceeded

@thweetkomputer has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 13 minutes and 41 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 7efeeac and d693071.

📒 Files selected for processing (3)
  • core/src/storage_init.cpp
  • core/src/tx_service_init.cpp
  • store_handler/eloq_data_store_service/main.cpp

Walkthrough

Adds node memory tracking to DataSubstrate, probes system RAM at startup, sets remaining_node_memory_mb_, and propagates that budget into storage (EloqStoreConfig) and tx_service to drive memory-aware configuration and validations.

Changes

Cohort / File(s) Summary
DataSubstrate & startup
core/include/data_substrate.h, core/src/data_substrate.cpp
Adds remaining_node_memory_mb_ member (note: declared in both public and private sections). Probes system memory via sysinfo() in Start(), sets remaining memory to 80% of total, increases maxclients from 10,000→500,000, and aborts Start() on probe failure.
Storage init & EloqStoreConfig usage
core/src/storage_init.cpp, store_handler/eloq_data_store_service/main.cpp
Passes remaining_node_memory_mb_ into EloqStoreConfig construction (constructor call sites updated to include memory and standalone flag). Includes log_utils.h where needed.
Transaction service memory handling
core/src/tx_service_init.cpp
Replaces local sysinfo/mem probing with use of remaining_node_memory_mb_ for defaults, validates configured node memory ≤ remaining, subtracts allocated memory from remaining, and updates related log messages.
Eloq store config & limits
store_handler/eloq_data_store_service/eloq_store_config.h, store_handler/eloq_data_store_service/eloq_store_config.cpp
Constructor signature changed to accept uint32_t &node_memory_mb and bool standalone (defaults allowed). Adds memory-aware index_buffer_pool_size allocation that consumes node memory (and decrements node_memory_mb), raises eloq_store_open_files_limit 1,024→400,000, removes public eloq_store_cloud_worker_count flag, and changes eloq_store_data_append_mode default to true.

Sequence Diagram

sequenceDiagram
    participant Client as Process (startup)
    participant DS as DataSubstrate
    participant Sys as sysinfo
    participant SInit as storage_init
    participant ESC as EloqStoreConfig
    participant TS as tx_service_init

    Client->>DS: Start()
    DS->>Sys: sysinfo() (probe RAM)
    Sys-->>DS: total RAM
    DS->>DS: remaining_node_memory_mb_ = 80% of total

    Client->>SInit: Initialize storage
    SInit->>ESC: EloqStoreConfig(config, path, remaining_node_memory_mb_, standalone)
    ESC->>ESC: Compute index_buffer_pool_size (memory-aware)
    ESC->>ESC: node_memory_mb -= allocated_pool_mb
    ESC-->>SInit: configured store

    Client->>TS: Initialize tx service
    TS->>TS: Determine node_memory_limit_mb (use remaining_node_memory_mb_ if unset)
    TS->>TS: Validate node_memory_limit_mb <= remaining_node_memory_mb_
    TS->>TS: remaining_node_memory_mb_ -= node_memory_limit_mb
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰
I counted carrots in the burrow wide,
Eighty percent I tucked inside,
Each tunnel takes its nimble share,
Balances kept with tender care,
Hoppity hops—no bytes run dry! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description only contains a checklist template with unchecked items and lacks substantive details about the changes, objectives, implementation approach, or testing information. Fill in the checklist items with actual details: describe the changes made, reference any related issues/RFCs, document which tests were added, and confirm test suite results.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately reflects the main objective: automating node memory budgeting across tx service and EloqStore components, which matches the comprehensive memory-aware changes throughout the changeset.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI Agents
In @core/src/tx_service_init.cpp:
- Around line 116-120: The log is misleading because it prints
remaining_node_memory_mb_ but labels it "total memory"; update the LOG(INFO)
call in tx_service_init.cpp where node_memory_limit_mb,
remaining_node_memory_mb_, and field_mem are used so the message accurately
reflects values: either change the label from "total memory=" to "remaining
memory=" when printing remaining_node_memory_mb_, or, if an actual total memory
variable exists (e.g., total_node_memory_mb_ or similar elsewhere), print that
variable instead; ensure the log references field_mem, node_memory_limit_mb, and
remaining_node_memory_mb_ consistently so operators see correct semantics.

In @store_handler/eloq_data_store_service/eloq_store_config.cpp:
- Around line 352-375: The auto-calculation for index_buffer_pool_size can yield
"0MB" when node_memory_mb is very small (e.g., 1), so update the branch that
sets index_buffer_pool_size (the code using node_memory_mb / 2) to enforce a
minimum buffer size (e.g., at least 1MB or another configurable minimum) before
converting to string; use node_memory_mb and
FLAGS_eloq_store_index_buffer_pool_size/CheckCommandLineFlagIsDefault/config_reader.GetString
to locate the logic, apply max(node_memory_mb/2, minimum) or similar, and update
the LOG(INFO) message to reflect the enforced minimum when it is applied.
- Around line 38-40: The PR changes the default flag eloq_store_open_files_limit
to 400000 but deployment/container docs and resource limits aren’t updated;
update operational documentation and container manifests to require/describe
setting ulimit -n (or equivalent) to >=400000, mention the flag
eloq_store_open_files_limit and that data_substrate.cpp calls
setrlimit(RLIMIT_NOFILE, ...), and update any Docker/Kubernetes configs
(securityContext, init scripts, systemd drop-ins, or startup scripts) and
README/deployment guides to explicitly instruct operators how to configure host
and container file-descriptor limits to satisfy this new default.
- Line 123: The new command-line flag DEFINE_bool(eloq_store_data_append_mode,
true, ...) is missing from the README configuration reference; add an entry for
--eloq_store_data_append_mode in the Command Line Flags table (and optionally
the Configuration File section) documenting its purpose (controls whether
EloqStore writes use append mode), its default value (true), accepted values
(true/false), and a short usage example; reference the flag name
"eloq_store_data_append_mode" and ensure formatting matches existing entries
such as --enable_cache_replacement.
🧹 Nitpick comments (2)
core/src/data_substrate.cpp (1)

141-149: Add validation for edge cases in memory calculation.

The memory calculation logic is sound, but consider adding checks for edge cases:

  1. Verify mem_mib is non-zero before using it
  2. Consider checking for potential overflow in meminfo.totalram * meminfo.mem_unit
🔎 Suggested validation improvements
 struct sysinfo meminfo;
 if (sysinfo(&meminfo))
 {
     LOG(ERROR) << "Failed to get system memory info: " << strerror(errno)
                << " when node_memory_limit_mb is not set";
     return false;
 }
 uint32_t mem_mib = meminfo.totalram * meminfo.mem_unit / (1024 * 1024);
+if (mem_mib == 0)
+{
+    LOG(ERROR) << "Invalid system memory detected: 0 MiB";
+    return false;
+}
 instance.remaining_node_memory_mb_ = mem_mib * 4 / 5;
+LOG(INFO) << "Total system memory: " << mem_mib 
+          << " MiB, allocated for node: " 
+          << instance.remaining_node_memory_mb_ << " MiB (80%)";
store_handler/eloq_data_store_service/eloq_store_config.cpp (1)

376-382: Refactor to eliminate repeated conversion and improve clarity.

The expression buffer_pool_size / (1024 * 1024) is computed twice (lines 377 and 382), which duplicates logic and reduces readability. Additionally, extracting this conversion makes the intent clearer.

🔎 Proposed refactor to eliminate duplication
     uint64_t buffer_pool_size = parse_size(index_buffer_pool_size);
-    if (buffer_pool_size / (1024 * 1024) >= node_memory_mb)
+    uint32_t buffer_pool_size_mb = buffer_pool_size / (1024 * 1024);
+    if (buffer_pool_size_mb >= node_memory_mb)
     {
         LOG(FATAL) << "buffer pool size (" << index_buffer_pool_size
                    << ") exceeds node memory mb";
     }
-    node_memory_mb -= buffer_pool_size / (1024 * 1024);
+    node_memory_mb -= buffer_pool_size_mb;
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bc6852e and de5353d.

📒 Files selected for processing (6)
  • core/include/data_substrate.h
  • core/src/data_substrate.cpp
  • core/src/storage_init.cpp
  • core/src/tx_service_init.cpp
  • store_handler/eloq_data_store_service/eloq_store_config.cpp
  • store_handler/eloq_data_store_service/eloq_store_config.h
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-16T09:47:09.981Z
Learnt from: githubzilla
Repo: eloqdata/tx_service PR: 288
File: store_handler/eloq_data_store_service/rocksdb_cloud_data_store_factory.h:54-76
Timestamp: 2025-12-16T09:47:09.981Z
Learning: In RocksDBCloudDataStoreFactory::CreateDataStore (store_handler/eloq_data_store_service/rocksdb_cloud_data_store_factory.h), when oss_url is configured, the factory always appends /ds_<shard_id> to the oss_url. Users should not include the shard path in the oss_url themselves.

Applied to files:

  • core/src/storage_init.cpp
🧬 Code graph analysis (2)
store_handler/eloq_data_store_service/eloq_store_config.cpp (2)
core/include/data_substrate.h (1)
  • CheckCommandLineFlagIsDefault (313-324)
store_handler/eloq_data_store_service/main.cpp (1)
  • config_reader (162-162)
core/src/tx_service_init.cpp (1)
core/include/data_substrate.h (1)
  • remaining_node_memory_mb_ (264-264)
🔇 Additional comments (8)
core/src/data_substrate.cpp (2)

25-25: LGTM - sysinfo include supports memory probing.

The addition of <sys/sysinfo.h> is necessary for the system memory probing logic added at lines 141-149.


72-72: Verify the 50x increase in maxclients is intentional and properly planned.

The default maxclients value has been increased to 500,000—an extraordinarily high value for a data substrate system. This requires careful verification:

  • OS-level tuning required: This directly sets RLIMIT_NOFILE (file descriptors) and will fail silently if OS limits aren't increased first.
  • Production database context: Typical databases use far lower limits (MySQL ~151, PostgreSQL ~100 by default). Even modest production systems rarely exceed thousands of concurrent connections.
  • Memory and resource implications: Connection pooling is the standard production approach to handle high client counts without massive per-connection overhead.

Confirm this change is intentional, document the expected workload justifying this value, and verify OS-level file descriptor and memory limits have been adjusted accordingly.

core/src/tx_service_init.cpp (2)

122-128: LGTM - validation prevents memory over-allocation.

The validation check ensures node_memory_limit_mb doesn't exceed available memory, preventing configuration errors that could lead to resource exhaustion.


130-130: LGTM - deduction implements memory budgeting correctly.

Subtracting the allocated memory from remaining_node_memory_mb_ enables downstream components to track and validate their memory budgets.

core/src/storage_init.cpp (1)

291-294: LGTM - passes memory budget to storage configuration.

The updated constructor call correctly passes remaining_node_memory_mb_ by reference, enabling EloqStoreConfig to allocate and deduct its memory budget. This change integrates with the centralized memory management approach where DataSubstrate::Start() initializes the memory pool and components sequentially claim their allocations.

store_handler/eloq_data_store_service/eloq_store_config.h (1)

35-36: LGTM - reference parameter enables memory budget tracking.

The updated constructor signature accepts node_memory_mb by reference, allowing EloqStoreConfig to read the available memory budget, allocate resources accordingly, and update the remaining budget for downstream components. This aligns with the centralized memory management approach introduced in this PR.

core/include/data_substrate.h (1)

264-264: No duplicate declaration exists.

The search confirms only a single declaration of remaining_node_memory_mb_ at line 264. No duplicate in public or private sections. The concern is unfounded.

Likely an incorrect or invalid review comment.

store_handler/eloq_data_store_service/eloq_store_config.cpp (1)

240-242: LGTM: Memory accounting parameter added.

The constructor signature correctly adds a node_memory_mb reference parameter to enable memory budget tracking. The parameter is appropriately passed by non-const reference since the constructor modifies it to account for allocated memory.

Comment thread core/src/tx_service_init.cpp Outdated
Comment thread store_handler/eloq_data_store_service/eloq_store_config.cpp
Comment thread store_handler/eloq_data_store_service/eloq_store_config.cpp
Comment thread store_handler/eloq_data_store_service/eloq_store_config.cpp

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI Agents
In @core/src/storage_init.cpp:
- Line 30: Remove the erroneous include of "log_utils.h" from storage_init.cpp
(the header does not exist) and rely on glog's LOG() instead; specifically
delete the line with #include "log_utils.h" and if storage_init.cpp does not
already include glog, add #include <glog/logging.h> so calls to LOG() resolve.

In @store_handler/eloq_data_store_service/main.cpp:
- Around line 298-308: The error return in main() uses `return false;` which
converts to 0 (success); change it to return a non-zero exit code (e.g., `return
1;`) so failures are signaled correctly. Locate the sysinfo call and the error
branch around `sysinfo(&meminfo)` in main (where `meminfo` and `mem_mib` are
computed) and replace the `return false;` with a proper integer error return;
consider using a named constant or EXIT_FAILURE for clarity.
🧹 Nitpick comments (3)
store_handler/eloq_data_store_service/eloq_store_config.cpp (1)

240-243: Consider documenting the in-place mutation of node_memory_mb or using a return value.

The constructor mutates the node_memory_mb reference parameter (line 386), decrementing it by the allocated buffer pool size. While this enables memory tracking across initialization stages, the side effect is implicit and could surprise callers unfamiliar with the implementation.

Additionally, the validation at line 381 uses > rather than >=. If buffer_pool_size exactly equals node_memory_mb, the check passes but line 386 sets node_memory_mb to zero, potentially causing issues downstream.

💡 Suggestions

Option 1: Document the mutation behavior in a comment above the constructor:

+// Note: node_memory_mb is decremented by the allocated buffer pool size.
 EloqStoreConfig::EloqStoreConfig(const INIReader &config_reader,
                                  const std::string_view base_data_path,
                                  uint32_t &node_memory_mb,
                                  bool standalone)

Option 2: Change validation to >= if zero remaining memory is problematic:

-    if (buffer_pool_size / (1024 * 1024) > node_memory_mb)
+    if (buffer_pool_size / (1024 * 1024) >= node_memory_mb)
     {
-        LOG(FATAL) << "buffer pool size (" << index_buffer_pool_size
-                   << ") exceeds node memory mb";
+        LOG(FATAL) << "buffer pool size (" << index_buffer_pool_size
+                   << ") exceeds or equals node memory mb, leaving no memory for other components";
     }

Also applies to: 353-386

store_handler/eloq_data_store_service/main.cpp (1)

305-306: Consider overflow protection in memory calculation.

The calculation meminfo.totalram * meminfo.mem_unit / (1024 * 1024) * 4 / 5 could overflow if totalram and mem_unit are both large. While unlikely on current systems, defensive coding would use uint64_t or reorder operations to minimize overflow risk.

💡 Proposed refactor
-    uint32_t mem_mib =
-        meminfo.totalram * meminfo.mem_unit / (1024 * 1024) * 4 / 5;
+    // Calculate 80% of total RAM in MiB, with overflow protection
+    uint64_t total_mib = static_cast<uint64_t>(meminfo.totalram) * 
+                         meminfo.mem_unit / (1024 * 1024);
+    uint32_t mem_mib = static_cast<uint32_t>(total_mib * 4 / 5);
store_handler/eloq_data_store_service/eloq_store_config.h (1)

34-37: Add documentation for the constructor's side effect on node_memory_mb.

The constructor modifies node_memory_mb by reference (decrementing it by the allocated buffer pool size), but this side effect is not documented in the header. Adding a brief comment would improve API clarity for callers.

💡 Proposed documentation
+    /// Constructor for EloqStoreConfig
+    /// @param config_reader Configuration reader for INI settings
+    /// @param base_data_path Base directory path for data storage
+    /// @param node_memory_mb Available node memory in MB (modified: decremented by allocated buffer size)
+    /// @param standalone Whether this is a standalone EloqStore instance (default: false)
     EloqStoreConfig(const INIReader &config_reader,
                     const std::string_view base_data_path,
                     uint32_t &node_memory_mb,
                     bool standalone = false);
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between de5353d and 7efeeac.

📒 Files selected for processing (5)
  • core/src/storage_init.cpp
  • core/src/tx_service_init.cpp
  • store_handler/eloq_data_store_service/eloq_store_config.cpp
  • store_handler/eloq_data_store_service/eloq_store_config.h
  • store_handler/eloq_data_store_service/main.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
  • core/src/tx_service_init.cpp
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-16T09:47:09.981Z
Learnt from: githubzilla
Repo: eloqdata/tx_service PR: 288
File: store_handler/eloq_data_store_service/rocksdb_cloud_data_store_factory.h:54-76
Timestamp: 2025-12-16T09:47:09.981Z
Learning: In RocksDBCloudDataStoreFactory::CreateDataStore (store_handler/eloq_data_store_service/rocksdb_cloud_data_store_factory.h), when oss_url is configured, the factory always appends /ds_<shard_id> to the oss_url. Users should not include the shard path in the oss_url themselves.

Applied to files:

  • core/src/storage_init.cpp
🧬 Code graph analysis (1)
store_handler/eloq_data_store_service/eloq_store_config.cpp (2)
store_handler/eloq_data_store_service/main.cpp (3)
  • CheckCommandLineFlagIsDefault (89-101)
  • CheckCommandLineFlagIsDefault (89-89)
  • config_reader (163-163)
core/include/data_substrate.h (1)
  • CheckCommandLineFlagIsDefault (313-324)
🔇 Additional comments (1)
core/src/storage_init.cpp (1)

292-293: LGTM! Constructor call correctly passes memory tracking parameter.

The updated EloqStoreConfig constructor call properly passes remaining_node_memory_mb_ by reference, enabling memory-aware buffer pool sizing. The variable is initialized at line 149 of DataSubstrate::Start() (set to 80% of total system memory) before InitializeStorageHandler() is invoked at line 182, ensuring it is available for the constructor call at lines 292-293. The standalone parameter appropriately defaults to false for this tx_service integration context.

Comment thread core/src/storage_init.cpp Outdated
Comment thread store_handler/eloq_data_store_service/main.cpp
@thweetkomputer thweetkomputer changed the title chore: add automatical setting of node memory parameters chore: auto budget node memory across tx service and EloqStore components Jan 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants