Skip to content

[SPARK-57664][CONNECT] Make dropped pipeline events observable in PipelineEventSender#56742

Open
LuciferYang wants to merge 2 commits into
apache:masterfrom
LuciferYang:sdp-event-drop-observability
Open

[SPARK-57664][CONNECT] Make dropped pipeline events observable in PipelineEventSender#56742
LuciferYang wants to merge 2 commits into
apache:masterfrom
LuciferYang:sdp-event-drop-observability

Conversation

@LuciferYang

@LuciferYang LuciferYang commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

PipelineEventSender streams pipeline events to a Connect client through a single background thread backed by a bounded queue (sized by the event-queue capacity conf). When that queue is full it intentionally drops non-terminal FlowProgress and other non-RunProgress events to avoid blocking execution; terminal FlowProgress events and all RunProgress events are always enqueued. Until now those drops were completely silent.

This PR makes them observable:

  • a running count of dropped events, exposed as numDroppedEvents;
  • a warning logged when an event is dropped, throttled to at most once per minute (the first drop always logs) so a persistently full queue does not flood the logs;
  • a summary warning at shutdown reporting the total, so drops that were suppressed by the throttle window are still surfaced.

The throttling follows the approach AsyncEventQueue already uses for the same "queue full, drop events" situation.

Why are the changes needed?

A dropped event is lost progress reporting to the client. Previously a drop produced no log line and no counter, so an operator who noticed gaps in pipeline progress had no signal that events were being discarded, let alone how many.

Does this PR introduce any user-facing change?

No. The counter is internal; the only outward change is the additional WARN log lines emitted when events are dropped.

How was this patch tested?

New unit tests in PipelineEventSenderSuite:

  • a sender that never overflows its queue reports zero dropped events;
  • events dropped at capacity are counted (numDroppedEvents);
  • the per-drop warning is throttled (logged once for two in-window drops) and the shutdown summary is emitted;
  • the warning is re-logged once the throttle interval has elapsed.

The drop scenarios use a CountDownLatch handshake to park the worker deterministically rather than relying on timing. The existing capacity test continues to cover the enqueue/drop routing.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

…elineEventSender

PipelineEventSender drops non-terminal events when its bounded queue is full, previously with no log or counter. Add a dropped-event counter, a throttled warning on drop, and a summary warning at shutdown so the loss is observable.

@uros-b uros-b left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments, but otherwise looks good - thank you @LuciferYang!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants