
Client occasionally crashes and frequently gets stuck timing out when health check is not enabled #2998

Description

@live4thee

Describe the bug
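After health checking is disabled on the client and the server is restarted, the client:

  1. Occasionally crashes
  2. Very easily hits RPC timeouts (newly created channels also keep timing out very easily)

Suspected to have been introduced by #2574. With the following patch applied, behavior returns to normal (it may leak fds; detailed testing is still pending):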

diff --git a/src/brpc/socket.cpp b/src/brpc/socket.cpp
index 405004dc..1114969c 100644
--- a/src/brpc/socket.cpp
+++ b/src/brpc/socket.cpp
@@ -976,15 +976,15 @@ std::string Socket::OnDescription() const {
 void Socket::HoldHCRelatedRef() {
     if (_health_check_interval_s > 0) {
         _is_hc_related_ref_held = true;
-        AddReference();
     }
+    AddReference();
 }
 
 void Socket::ReleaseHCRelatedReference() {
     if (_health_check_interval_s > 0) {
         _is_hc_related_ref_held = false;
-        Dereference();
     }
+    Dereference();
 }
 
 int Socket::WaitAndReset(int32_t expected_nref) {
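
To make the effect of the patch concrete, here is a toy model of the two functions it touches. This is a sketch, not brpc code: `ToySocket`, its plain `int nref`, the `patched` switch, and the helper `RefsAfterHold` are all hypothetical names introduced for illustration, and the real `Socket` manages references very differently. The model only shows what the diff itself shows: after the patch the extra reference is taken and released regardless of the health-check interval, while the flag is still set only when the interval is positive.

```cpp
#include <cassert>

// Toy stand-in for brpc::Socket (hypothetical, NOT the real class).
// Only the two functions changed by the patch are modelled.
struct ToySocket {
    int health_check_interval_s;          // <= 0 means health check disabled
    bool patched;                         // false: original code, true: patched code
    int nref = 1;                         // simplified reference count
    bool is_hc_related_ref_held = false;  // mirrors _is_hc_related_ref_held

    ToySocket(int interval_s, bool is_patched)
        : health_check_interval_s(interval_s), patched(is_patched) {}

    void AddReference() { ++nref; }
    void Dereference() {
        assert(nref > 0);  // a release without a matching hold would trip this
        --nref;
    }

    void HoldHCRelatedRef() {
        if (health_check_interval_s > 0) {
            is_hc_related_ref_held = true;
            if (!patched) AddReference();  // original: only when HC is enabled
        }
        if (patched) AddReference();       // patched: unconditional
    }

    void ReleaseHCRelatedReference() {
        if (health_check_interval_s > 0) {
            is_hc_related_ref_held = false;
            if (!patched) Dereference();   // original: only when HC is enabled
        }
        if (patched) Dereference();        // patched: unconditional
    }
};

// Reference count right after HoldHCRelatedRef().
int RefsAfterHold(int interval_s, bool patched) {
    ToySocket s(interval_s, patched);
    s.HoldHCRelatedRef();
    return s.nref;
}

// Reference count after a matched hold/release pair.
int RefsAfterHoldAndRelease(int interval_s, bool patched) {
    ToySocket s(interval_s, patched);
    s.HoldHCRelatedRef();
    s.ReleaseHCRelatedReference();
    return s.nref;
}
```

In this model, with the interval set to 0 the original variant holds no extra reference while the patched variant does; with a positive interval the two variants behave the same. Note that in the patched variant with interval 0 the flag stays `false` even though the reference is held, which may be worth checking against any code that consults the flag.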

To Reproduce
Run the client and the server, have the client (with health checking disabled) send RPCs continuously, then restart the server.

Expected behavior

RPCs return to normal after the server is restarted.

Versions
OS: Debian12/Rocky 8.x
Compiler: g++ 8.5
brpc: 1.10+
protobuf: any

Additional context/screenshots

[Two screenshots attached to the original issue]
