[Enhancement] CN bottleneck in runtime_filter during high concurrency #71986

@kyle-goodale-klaviyo

Enhancement

Under high concurrency, the runtime filter worker queue backs up considerably. We have seen runtime_filter_event_queue_len exceed 1.1M events and runtime_filter_bytes_in_queue exceed 60 GiB on a single CN, and the queue keeps growing without bound until we intervene.

The drain is single-threaded (RuntimeFilterWorker::execute, one std::thread), and the hot event type, RECEIVE_TOTAL_RF, is dominated by synchronous RPC fan-out. _receive_total_runtime_filter (be/src/runtime/runtime_filter_worker.cpp:957) issues forward RPCs for the broadcast tree, then the BatchClosuresJoinAndClean destructor (line 533) joins every closure via bthread_id_join before the function returns, so the drain thread cannot pull the next event until every forward completes. Stack sample from a stalled worker:

#1  bthread::wait_pthread
#2  bthread::butex_wait
#3  bthread_id_join
#4  BatchClosuresJoinAndClean::~BatchClosuresJoinAndClean
#5  RuntimeFilterWorker::_receive_total_runtime_filter
#6  RuntimeFilterWorker::execute()

The thread was in S (sleeping) state 85% of the time, with 23.9M voluntary context switches, while 60+ GiB of events sat queued. transmit_runtime_filter_concurrency was 0 during the stall: outbound RPCs complete quickly (p99 = 2.6 ms), but the worker spends its time parked in bthread_id_join between events rather than draining the queue.
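To make the head-of-line blocking concrete, here is a minimal standard-library model of the pattern described above. It is not the StarRocks code: JoinAllOnExit, fake_forward_rpc, handle_event_blocking and drain_blocking_ms are hypothetical names, std::async stands in for brpc closures, and a sleep stands in for the forward RPC. The only point is that a join-in-destructor guard makes each event cost at least one full RPC round trip on the drain thread.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <future>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a join-on-destruction guard: its destructor
// blocks until every outstanding simulated RPC has finished.
class JoinAllOnExit {
public:
    void add(std::future<void> f) { _pending.push_back(std::move(f)); }
    ~JoinAllOnExit() {
        for (auto& f : _pending) f.wait();  // the calling (drain) thread parks here
    }

private:
    std::vector<std::future<void>> _pending;
};

// Simulated forward RPC: just sleeps for `latency`.
inline void fake_forward_rpc(std::chrono::milliseconds latency) {
    std::this_thread::sleep_for(latency);
}

// Handle one event: fan out `fanout` simulated RPCs in parallel, then join
// all of them before returning (via the guard's destructor).
inline void handle_event_blocking(std::size_t fanout, std::chrono::milliseconds latency) {
    JoinAllOnExit guard;
    for (std::size_t i = 0; i < fanout; ++i) {
        guard.add(std::async(std::launch::async, fake_forward_rpc, latency));
    }
}  // guard is destroyed here, so the function cannot return early

// Drain `events` events on the calling thread; return elapsed wall time in ms.
inline long long drain_blocking_ms(std::size_t events, std::size_t fanout,
                                   std::chrono::milliseconds latency) {
    assert(fanout > 0);
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t e = 0; e < events; ++e) {
        handle_event_blocking(fanout, latency);
    }
    const auto elapsed = std::chrono::steady_clock::now() - start;
    return std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count();
}
```

In this model, draining N events takes at least N times the RPC latency no matter how wide the fan-out is, because the fan-out runs in parallel but the events do not. As a rough illustration only, and assuming each event cost about one 2.6 ms round trip, a serial drain would top out near 385 events per second, so a 1.1M-event backlog would need roughly 48 minutes to clear even with no new arrivals.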

There is no lever to parallelize the drain or to make the forward step asynchronous. runtime_filter_queue_limit caps queue size but does not address throughput. Our only effective mitigation has been running set global enable_global_runtime_filter = false, which we would like to avoid.
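One possible shape for an asynchronous forward step, sketched with the standard library only. This is an assumption about what such a change could look like, not a proposal for the actual patch: InFlightForwards, DrainResult and drain_without_joining are hypothetical names, and std::async again stands in for brpc closures. The simulated RPCs are held open behind a gate so the result is deterministic: the drain loop gets through every event while all forwards are still in flight, and the join happens later, off the per-event path.

```cpp
#include <atomic>
#include <cstddef>
#include <future>
#include <utility>
#include <vector>

// Hypothetical owner of in-flight forwards, so the per-event handler does
// not have to join them before returning.
class InFlightForwards {
public:
    void add(std::future<void> f) { _pending.push_back(std::move(f)); }
    void join_all() {
        for (auto& f : _pending) f.wait();
        _pending.clear();
    }

private:
    std::vector<std::future<void>> _pending;
};

struct DrainResult {
    std::size_t events_drained;          // events handled by the drain loop
    std::size_t completed_during_drain;  // forwards finished before the loop ended
    std::size_t completed_total;         // forwards finished after the final join
};

// Drain `events` events without joining inside the handler. Every simulated
// RPC waits on a gate that is opened only after the loop ends, so none can
// complete while the loop is running.
inline DrainResult drain_without_joining(std::size_t events, std::size_t fanout) {
    std::promise<void> gate;
    std::shared_future<void> released = gate.get_future().share();
    std::atomic<std::size_t> completed{0};
    InFlightForwards in_flight;

    std::size_t drained = 0;
    for (std::size_t e = 0; e < events; ++e) {
        for (std::size_t i = 0; i < fanout; ++i) {
            in_flight.add(std::async(std::launch::async, [released, &completed] {
                released.wait();         // simulated RPC stays in flight
                completed.fetch_add(1);  // completion bookkeeping
            }));
        }
        ++drained;  // handler returns immediately; nothing is joined here
    }

    const std::size_t during = completed.load();
    gate.set_value();      // let the simulated RPCs finish
    in_flight.join_all();  // cleanup happens off the per-event path
    return DrainResult{drained, during, completed.load()};
}
```

With the blocking handler this setup would never finish its first event, since the join would wait on forwards that cannot complete. The trade-off is that in-flight work is no longer bounded by the drain thread, so a real implementation would presumably need its own cap or backpressure on outstanding forwards.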
