Skip to content

*: enhance mem tracker exclude rss_file & check limit for columnar#10867

Open
yongman wants to merge 9 commits into
pingcap:masterfrom
yongman:fix-mem-tracker-columnar
Open

*: enhance mem tracker exclude rss_file & check limit for columnar#10867
yongman wants to merge 9 commits into
pingcap:masterfrom
yongman:fix-mem-tracker-columnar

Conversation

@yongman
Copy link
Copy Markdown
Member

@yongman yongman commented May 25, 2026

What problem does this PR solve?

Issue Number: ref #10844

Problem Summary:

  1. Exclude rss_file result to query failed with memory exceeded exception.
  2. For columnar read path, the memory usage can not counted for columnar hub in Rust.

What is changed and how it works?

  1. Exclude rss_file part when check the memory limitation. rss_file part can be auto freed by OS.
  2. Actively check the memory limitation after read block from columnar hub in Rust to avoid OOM.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

Summary by CodeRabbit

  • Bug Fixes

    • More accurate memory-limit enforcement by excluding file-backed RSS and using an effective RSS measure.
    • Improved process memory monitoring in background collection to capture additional metrics.
    • Ensured jemalloc stats are refreshed before sampling for more consistent metrics.
  • Chores

    • Added runtime RSS checks in storage read paths to better enforce limits.
    • Updated testing setting description to reference effective RSS.

Review Change Stack

yongman added 5 commits May 25, 2026 14:41
Signed-off-by: yongman <yming0221@gmail.com>
(cherry picked from commit d54c26a4afa31b362c538394190f29299b94970c)
Signed-off-by: yongman <yming0221@gmail.com>
(cherry picked from commit ec4bf8011fa46f8c9750594a01f7303f7bd96689)
Signed-off-by: yongman <yming0221@gmail.com>
(cherry picked from commit aeb3bf1e0300d92c6fa12ef21505b052a5fd6b41)
Signed-off-by: yongman <yming0221@gmail.com>
(cherry picked from commit d946399942e70bf0d8aa1b7567ad5e498108f2e2)
Signed-off-by: yongman <yming0221@gmail.com>
(cherry picked from commit 66c3e5a377acb05cbb9cd819e06b0fc94d8ada9a)
@ti-chi-bot
Copy link
Copy Markdown
Contributor

ti-chi-bot Bot commented May 25, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot Bot added do-not-merge/needs-linked-issue do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. labels May 25, 2026
@ti-chi-bot
Copy link
Copy Markdown
Contributor

ti-chi-bot Bot commented May 25, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign ichn-hu, likidu for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 25, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f28c017e-0109-4395-9537-33f9c9dedae9

📥 Commits

Reviewing files that changed from the base of the PR and between 6b4336a and dd94f94.

📒 Files selected for processing (2)
  • dbms/src/Common/MemoryTracker.cpp
  • dbms/src/Storages/StorageDisaggregatedColumnar.cpp

📝 Walkthrough

Walkthrough

The PR separates file-backed RSS from anonymous RSS, records rss_file via process metrics, computes effective RSS = max(real_rss - real_rss_file, 0), centralizes RSS-limit checks, integrates the check into allocation and storage read paths, and advances jemalloc epoch before metrics collection.

Changes

RSS-aware memory tracking and limit enforcement

Layer / File(s) Summary
RSS file-backed metrics data model
dbms/src/Common/MemoryTracker.h, dbms/src/Common/MemoryTracker.cpp
Global atomic real_rss_file is added alongside real_rss to separately track RSS from memory-mapped files.
Process metrics collection from background task
dbms/src/Common/BackgroundTask.cpp
Background memory collection loop now calls both get_process_mem_usage() for RSS and get_process_metrics() for file-backed RSS metrics, updating real_rss and real_rss_file atomics.
Allocation path: rollback, effective_rss, fault checks
dbms/src/Common/MemoryTracker.cpp
Allocation-time logic refactored to use a rollback lambda; fault-injection accuracy checks compute effective_rss = max(real_rss - real_rss_file, 0) and report rss_file/total rss.
RSS limit checking methods and declarations
dbms/src/Common/MemoryTracker.h, dbms/src/Common/MemoryTracker.cpp
New private checkRssLimitImpl(...) centralizes RSS validation and throwing; public MemoryTracker::checkRssLimit() and CurrentMemoryTracker::checkRssLimit() forward the check to the current tracker.
RSS validation in storage read operations
dbms/src/Storages/StorageDisaggregatedColumnar.cpp
Storage layer adds MemoryTracker include and calls CurrentMemoryTracker::checkRssLimit() immediately after proxy block read to enforce RSS limits in I/O paths.
Metrics collection and test settings updates
dbms/src/Interpreters/AsynchronousMetrics.cpp, dbms/src/Interpreters/Settings.h
JEMalloc stats epoch is advanced before metric collection; memory_tracker_accuracy_diff_for_test setting description now references effective RSS (excluding file-backed pages).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

approved, lgtm

Suggested reviewers

  • CalvinNeo
  • JinheLin

Poem

🐰 I hop where pages come and go,
Counting bytes both high and low,
I split the RSS into two,
Track the file-backed and the true,
So memory limits stay in tow!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 17.65% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main changes: excluding rss_file from memory tracking and adding memory limit checks for columnar operations.
Description check ✅ Passed The description includes the required sections with specific problem statements and implementation details, though the commit message section is empty.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Signed-off-by: yongman <yming0221@gmail.com>
@yongman yongman marked this pull request as ready for review May 25, 2026 08:04
@ti-chi-bot ti-chi-bot Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 25, 2026
@pantheon-ai
Copy link
Copy Markdown

pantheon-ai Bot commented May 25, 2026

@yongman I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details.

⏳ This process typically takes 10-30 minutes depending on the complexity of the changes.

ℹ️ Learn more details on Pantheon AI.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dbms/src/Common/MemoryTracker.cpp`:
- Line 190: The call to checkRssLimitImpl(...) (e.g., checkRssLimitImpl(/*
require_tracked_growth */ true, will_be, size)) can throw after tracked bytes
were already incremented, so you must roll back the tracked increase on
exception; wrap the checkRssLimitImpl call in a try/catch and on catch subtract
the previously added amount (e.g., call the corresponding decrease/undo method
or subtract from tracked_bytes by size) before rethrowing; apply the same
rollback pattern to the other allocation paths referenced (the block around
lines 235–290) so any early abort always undoes the tracked increment.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: fd99563b-e41a-4731-b09c-3a308a1afd56

📥 Commits

Reviewing files that changed from the base of the PR and between f0766c2 and a60789f.

📒 Files selected for processing (6)
  • dbms/src/Common/BackgroundTask.cpp
  • dbms/src/Common/MemoryTracker.cpp
  • dbms/src/Common/MemoryTracker.h
  • dbms/src/Interpreters/AsynchronousMetrics.cpp
  • dbms/src/Interpreters/Settings.h
  • dbms/src/Storages/StorageDisaggregatedColumnar.cpp

Comment thread dbms/src/Common/MemoryTracker.cpp Outdated
yongman added 3 commits May 25, 2026 16:40
Signed-off-by: yongman <yming0221@gmail.com>
Signed-off-by: yongman <yming0221@gmail.com>
@ti-chi-bot
Copy link
Copy Markdown
Contributor

ti-chi-bot Bot commented May 25, 2026

@yongman: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-sanitizer-asan dd94f94 link false /test pull-sanitizer-asan

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant