Skip to content

fix standby crash on mem full#204

Merged
liunyl merged 1 commit into
mainfrom
standby_oom
Nov 12, 2025
Merged

fix standby crash on mem full#204
liunyl merged 1 commit into
mainfrom
standby_oom

Conversation

@liunyl

@liunyl liunyl commented Nov 12, 2025

Copy link
Copy Markdown
Contributor

Here are some reminders before you submit the pull request

  • Add tests for the change
  • Document changes
  • Reference the link of issue using fixes eloqdb/tx_service#issue_id
  • Reference the link of RFC if exists
  • Pass ./mtr --suite=mono_main,mono_multi,mono_basic

Summary by CodeRabbit

  • Bug Fixes
    • Improved system stability and resilience during memory constraints. Request processing is now intelligently deferred and queued when system resources reach maximum capacity, preventing out-of-memory failures that previously caused unexpected service disruptions. This enhancement particularly improves overall system reliability and stability on resource-constrained nodes and deployment environments.

@coderabbitai

coderabbitai Bot commented Nov 12, 2025

Copy link
Copy Markdown

Walkthrough

Modified include/cc/object_cc_map.h to add a nullptr check for looked-up CCE. When not found, requests are enqueued to the memory-full wait list and the function returns false instead of proceeding. This guard applies to both initial request processing and standby-forward paths, preventing OOM scenarios on standby nodes.

Changes

Cohort / File(s) Summary
Null pointer guard for CCE lookup
include/cc/object_cc_map.h
Added nullptr check when CCE is not found during request lookup; enqueues request to memory-full wait list and returns false; prevents OOM on standby nodes; applied to both initial "first time processed" path and standby-forward path

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Verify the nullptr check is applied consistently in both code paths (initial and standby-forward)
  • Confirm the memory-full wait list enqueueing behavior is the correct fallback mechanism
  • Ensure the return value (false) is handled appropriately by callers

Suggested reviewers

  • MrGuin

Poem

🐰 A nullptr guard, so neat and small,
Protects from crashes, saves us all—
When memory's tight and checks run deep,
The standby nodes can safely sleep!
hops with glee

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description check ⚠️ Warning The description only contains the template checklist with all items unchecked, but provides no actual implementation details, testing evidence, or issue references required for a complete PR description. Add implementation details explaining the fix, confirm tests were added/passed, document changes, reference the issue with 'fixes eloqdb/tx_service#issue_id', and verify test suite passed.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title 'fix standby crash on mem full' directly relates to the core change: fixing a standby crash when memory is full by adding a nullptr guard to prevent OOM scenarios.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch standby_oom

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
include/cc/object_cc_map.h (1)

1840-1840: Optional: Consider removing the now-redundant assertion.

After the nullptr check at lines 1835-1839, cce is guaranteed to be non-null at this point, making the assertion always true (dead code). While it doesn't cause any harm, you might consider removing it for clarity.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f4a1afb and 1984407.

📒 Files selected for processing (1)
  • include/cc/object_cc_map.h (1 hunks)
🔇 Additional comments (1)
include/cc/object_cc_map.h (1)

1835-1839: LGTM! Fix prevents standby crash on OOM.

The nullptr check correctly handles the case where FindEmplace fails to allocate a new CCE due to memory exhaustion. By enqueuing the request to the memory-full wait list and returning false, the request will be retried once capacity becomes available. This pattern is consistent with other handlers in this file (e.g., ApplyCc at lines 430-456, ReplayLogCc at lines 2196-2205).

@liunyl liunyl merged commit 5eaf657 into main Nov 12, 2025
4 checks passed
@liunyl liunyl deleted the standby_oom branch November 12, 2025 01:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant