Skip to content

bugfix: extend block pool runtime#2965

Merged
wwbmmm merged 1 commit into
apache:masterfrom
yanglimingcn:bugfix/rdma_block_pool
Jul 23, 2025
Merged

bugfix: extend block pool runtime#2965
wwbmmm merged 1 commit into
apache:masterfrom
yanglimingcn:bugfix/rdma_block_pool

Conversation

@yanglimingcn

@yanglimingcn yanglimingcn commented May 7, 2025

Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Issue Number:

Problem Summary:

rdma block pool runtime ExtendBlockPool has some error:

  1. a region may has many buckets, extend a region not hold all bucket locks, will case race condition, disable runtime extend a region when buckets greater than 1.
  2. when new region created, link the region to idle_list case link list break.
  3. add feature, in scenarios where users need to manually specify memory regions (e.g., using hugepages or custom memory pools), when FLAGS_rdma_memory_pool_user_specified_memory is true, user is responsibility of extending memory blocks , this ensuring flexibility for advanced use cases.

What is changed and the side effects?

Changed:

Side effects:

  • Performance effects:

  • Breaking backward compatibility:


Check List:

@yanglimingcn yanglimingcn changed the title bugfix: block pool dynamic extend bugfix: extend block pool runtime May 7, 2025
@yanglimingcn yanglimingcn force-pushed the bugfix/rdma_block_pool branch 4 times, most recently from 3e1063e to 98de528 Compare May 7, 2025 11:03
@yanglimingcn yanglimingcn requested a review from Copilot May 7, 2025 11:30

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR addresses a bug in the rdma block pool runtime by fixing a race condition during region extension and correcting the idle list linkage.

  • Update for proper linking of new idle nodes in ExtendBlockPool.
  • Addition of locks for all bucket indices during region extension to prevent race conditions.
Comments suppressed due to low confidence (2)

src/brpc/rdma/block_pool.cpp:181

  • Ensure that linking the new idle node via g_info->idle_list correctly preserves the expected ordering for the recycle process; verify that this insertion pattern does not introduce unintended list cycles or ordering issues.
node[i]->next = g_info->idle_list[block_type][i];

src/brpc/rdma/block_pool.cpp:300

  • Review the lock acquisition order for the additional bucket locks to ensure no deadlock may occur when multiple threads execute AllocBlockFrom concurrently; consider documenting the lock ordering to aid future maintenance.
other_locks.emplace_back(*g_info->lock[block_type][i]);

@yanglimingcn yanglimingcn force-pushed the bugfix/rdma_block_pool branch from 98de528 to ae47e7e Compare May 8, 2025 01:12
@yanglimingcn yanglimingcn requested a review from Copilot May 8, 2025 05:14

This comment was marked as outdated.

@yanglimingcn yanglimingcn force-pushed the bugfix/rdma_block_pool branch 5 times, most recently from 1afbf11 to e44b288 Compare May 14, 2025 03:39
@yanglimingcn yanglimingcn force-pushed the bugfix/rdma_block_pool branch from e44b288 to 8b945c3 Compare May 18, 2025 01:44
@yanglimingcn yanglimingcn force-pushed the bugfix/rdma_block_pool branch from 8b945c3 to 04cb111 Compare June 20, 2025 06:38
@yanglimingcn yanglimingcn force-pushed the bugfix/rdma_block_pool branch from 04cb111 to f47e289 Compare June 24, 2025 06:14
@yanglimingcn

Copy link
Copy Markdown
Contributor Author

@wwbmmm @chenBright please review this bugfix

@wwbmmm

wwbmmm commented Jul 18, 2025

Copy link
Copy Markdown
Contributor

LGTM


void* ExtendBlockPoolByUser(void* region_base, size_t region_size,
int block_type) {
if (FLAGS_rdma_memory_pool_user_specified_memory == false) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!FLAGS_rdma_memory_pool_user_specified_memory

@yanglimingcn yanglimingcn Jul 21, 2025

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FLAGS_rdma_memory_pool_user_specified_memory == false, means not allowed to ExtentBlockPoolByUser

@chenBright chenBright requested a review from Copilot July 19, 2025 06:34

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR addresses several bugs in the RDMA block pool runtime extension functionality and introduces a new user-specified memory management feature. The fixes resolve race conditions during region extension, linked list corruption, and add constraints to prevent unsafe multi-bucket scenarios.

  • Fixes race condition and linked list corruption in ExtendBlockPool runtime
  • Adds user-specified memory pool management with FLAGS_rdma_memory_pool_user_specified_memory
  • Changes InitBlockPool return type from void* to bool for cleaner API design

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
src/brpc/rdma/block_pool.h Updates InitBlockPool signature and adds ExtendBlockPoolByUser declaration with documentation
src/brpc/rdma/block_pool.cpp Core implementation fixes for thread safety, linked list management, and user memory pool support
src/brpc/rdma/rdma_helper.cpp Adds UserExtendBlockPool wrapper function
test/brpc_block_pool_unittest.cpp Updates test assertions to match new InitBlockPool boolean return type

Comment thread src/brpc/rdma/block_pool.h
Comment thread src/brpc/rdma/block_pool.cpp
Comment thread src/brpc/rdma/block_pool.cpp
Comment thread src/brpc/rdma/block_pool.cpp
@wwbmmm wwbmmm merged commit 95b7de3 into apache:master Jul 23, 2025
15 checks passed
@AnDiXL

AnDiXL commented Feb 10, 2026

Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Issue Number:

Problem Summary:

rdma block pool runtime ExtendBlockPool has some error:

  1. a region may has many buckets, extend a region not hold all bucket locks, will case race condition, disable runtime extend a region when buckets greater than 1.
  2. when new region created, link the region to idle_list case link list break.
  3. add feature, in scenarios where users need to manually specify memory regions (e.g., using hugepages or custom memory pools), when FLAGS_rdma_memory_pool_user_specified_memory is true, user is responsibility of extending memory blocks , this ensuring flexibility for advanced use cases.

What is changed and the side effects?

Changed:

Side effects:

  • Performance effects:
  • Breaking backward compatibility:

Check List:

“a region may has many buckets, extend a region not hold all bucket locks, will case race condition, disable runtime extend a region when buckets greater than 1.”

Why not modify it so that a lock is added to the bucket when expanding the region? In this way, there would be no need to limit the number of buckets.
For example, in the following form:

    for (size_t i = 0; i < g_buckets; ++i) {
        node[i]->start = (void*)(region->start + i * (region_size / g_buckets));
        node[i]->len = region_size / g_buckets;

        BAIDU_SCOPED_LOCK(*g_info->lock[block_type][i]);
        node[i]->next = g_info->idle_list[block_type][i];
        g_info->idle_list[block_type][i] = node[i];
        g_info->idle_size[block_type][i] += node[i]->len;
    }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants