[fix](load_stream) Fix use-after-free in TabletStream async lambdas#60148
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
There was a problem hiding this comment.
Pull request overview
This PR fixes a use-after-free bug in TabletStream where async lambdas captured raw this pointers that could become dangling when the TabletStream object is destroyed before async tasks complete (e.g., when thrift connection is broken).
Changes:
- Made TabletStream inherit from
std::enable_shared_from_this<TabletStream>to enable safe shared_ptr capture - Updated async lambdas in
append_data(),add_segment(), and_run_in_heavy_work_pool()to captureshared_from_this()instead of rawthis - Replaced all member access in async lambdas from direct access to
self->member access pattern
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| be/src/runtime/load_stream.h | Added std::enable_shared_from_this<TabletStream> inheritance to TabletStream class |
| be/src/runtime/load_stream.cpp | Updated three async lambda captures in TabletStream methods to use shared_from_this() instead of raw this pointer, ensuring object lifetime safety |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
TPC-H: Total hot run time: 31003 ms |
TPC-DS: Total hot run time: 173013 ms |
The async lambdas in TabletStream captured raw 'this' pointer, which could become dangling when the TabletStream object is destroyed before the async task completes (e.g., when thrift connection is broken). Fix by using shared_from_this() to capture shared_ptr instead of raw pointer, ensuring the object stays alive until all async tasks complete.
ClickBench: Total hot run time: 26.72 s |
d4cc736 to
1cdf449
Compare
|
run buildall |
TPC-H: Total hot run time: 31159 ms |
TPC-DS: Total hot run time: 172530 ms |
ClickBench: Total hot run time: 26.95 s |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
…60148) The async lambdas in TabletStream captured raw 'this' pointer, which could become dangling when the TabletStream object is destroyed before the async task completes (e.g., when thrift connection is broken). Fix by using shared_from_this() to capture shared_ptr instead of raw pointer, ensuring the object stays alive until all async tasks complete.
…h_tasks is called before destruction (#60284) When TabletStream is destroyed without pre_close() being called (e.g., on_idle_timeout scenario), the _flush_token destructor calls shutdown() which triggers deadlock detection if called from the pool thread. Root cause: - on_idle_timeout() directly calls brpc::StreamClose() without calling LoadStream::close() - This triggers the destruction chain without calling pre_close() on TabletStreams - If flush tasks are still running, TabletStream may be destroyed in pool thread Solution: - Add IndexStream::~IndexStream() to ensure wait_for_flush_tasks() is called on all TabletStreams - Add TabletStream::wait_for_flush_tasks() to wait for all flush tasks to complete - This ensures _flush_token is properly handled before TabletStream destruction - Revert #60148 (shared_from_this) as it is no longer needed
…nc lambdas apache#60148 (apache#60177) Cherry-picked from apache#60148 Co-authored-by: Xin Liao <liaoxin@selectdb.com>
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
The async lambdas in TabletStream captured raw 'this' pointer, which could become dangling when the TabletStream object is destroyed before the async task completes (e.g., when thrift connection is broken).
Fix by using shared_from_this() to capture shared_ptr instead of raw pointer, ensuring the object stays alive until all async tasks complete.
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)