Decommission compactors gracefully #6543

Merged
nadav-govari merged 2 commits into nadav/merge-feature-split-merges from nadav/decommission
Jun 25, 2026
@nadav-govari

Collaborator

Description

Currently, split compactors terminate immediately when they receive a shutdown signal. This PR gives them a 5-minute grace period, like ingesters, to finish their ongoing merges before terminating.

Once switched to Decommissioning, the compactor hardcodes its available slots to 0, finishes its ongoing merges, and then goes silently into the night.
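The drain behavior described above can be sketched as a small state machine. This is a minimal illustration, not the code in this PR: the `Compactor` struct, its fields, and the methods `try_start_merge`, `decommission`, and `on_merge_finished` are hypothetical, and the final state uses the `Decommissioned` name suggested in review.

```rust
/// Hypothetical lifecycle states; the names follow the PR discussion.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CompactorStatus {
    Ready,
    /// Draining: no new tasks are accepted while in-flight merges finish.
    Decommissioning,
    /// All in-flight merges have completed.
    Decommissioned,
}

/// Hypothetical, simplified compactor that only tracks slot accounting.
pub struct Compactor {
    status: CompactorStatus,
    max_slots: usize,
    in_flight: usize,
}

impl Compactor {
    pub fn new(max_slots: usize) -> Self {
        Compactor { status: CompactorStatus::Ready, max_slots, in_flight: 0 }
    }

    pub fn status(&self) -> CompactorStatus {
        self.status
    }

    /// Reports zero slots as soon as the compactor is no longer `Ready`.
    pub fn available_slots(&self) -> usize {
        match self.status {
            CompactorStatus::Ready => self.max_slots.saturating_sub(self.in_flight),
            _ => 0,
        }
    }

    /// Accepts a merge only if a slot is available.
    pub fn try_start_merge(&mut self) -> bool {
        if self.available_slots() == 0 {
            return false;
        }
        self.in_flight += 1;
        true
    }

    /// Starts draining; finishes immediately if nothing is in flight.
    pub fn decommission(&mut self) {
        if self.status == CompactorStatus::Ready {
            self.status = CompactorStatus::Decommissioning;
            self.finish_if_drained();
        }
    }

    /// Called when an in-flight merge completes.
    pub fn on_merge_finished(&mut self) {
        self.in_flight = self.in_flight.saturating_sub(1);
        self.finish_if_drained();
    }

    fn finish_if_drained(&mut self) {
        if self.status == CompactorStatus::Decommissioning && self.in_flight == 0 {
            self.status = CompactorStatus::Decommissioned;
        }
    }
}
```

In this sketch the planner never needs to know about the shutdown explicitly: a compactor that reports zero available slots simply stops receiving work.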

How was this PR tested?

Unit tests. Cluster tests.

@nadav-govari nadav-govari requested a review from a team as a code owner June 24, 2026 18:42
/// Draining: rejects new tasks (reports zero available slots) while in-flight merges finish.
Decommissioning,
/// All in-flight merges have completed; the supervisor can be torn down.
Finished,

Collaborator Author

Decommissioned

Comment thread quickwit/quickwit-serve/src/lib.rs Outdated
error!("failed to decommission ingester gracefully: {:?}", error);
}
// Let the compactor finish the merges already in flight before tearing down its pipelines.
if let Some(compactor_supervisor) = &compactor_supervisor_opt

Member

For the ingester and compactor, wait_for_x_decommission sends the decommissioning notification and then waits for it to happen. Since we now have two things to wait for, we need to do:

ingester.notify_decommission().await;
compactor.notify_decommission().await;
ingester.wait_for_decommission().await;
compactor.wait_for_decommission().await;

That way the two can drain in parallel. This is useful when both services run on a single node.
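To make the notify-then-wait ordering concrete, here is a minimal sketch using only the standard library. It is not Quickwit's API: `DrainingService`, its fields, and `decommission_all` are hypothetical, and a background thread that sleeps stands in for the in-flight work being drained.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a service that drains in the background.
pub struct DrainingService {
    name: &'static str,
    drain_time: Duration,
    done_rx: Option<Receiver<&'static str>>,
}

impl DrainingService {
    pub fn new(name: &'static str, drain_time: Duration) -> Self {
        DrainingService { name, drain_time, done_rx: None }
    }

    /// Phase 1: starts draining and returns immediately.
    pub fn notify_decommission(&mut self) {
        if self.done_rx.is_some() {
            return; // already draining
        }
        let (tx, rx) = channel();
        let (name, drain_time) = (self.name, self.drain_time);
        thread::spawn(move || {
            thread::sleep(drain_time); // simulates finishing in-flight work
            let _ = tx.send(name);
        });
        self.done_rx = Some(rx);
    }

    /// Phase 2: blocks until the drain completes.
    /// Returns false if `notify_decommission` was never called.
    pub fn wait_for_decommission(&mut self) -> bool {
        match self.done_rx.take() {
            Some(rx) => rx.recv().is_ok(),
            None => false,
        }
    }
}

/// Notifies both services first, then waits, so both drains overlap.
pub fn decommission_all(
    ingester: &mut DrainingService,
    compactor: &mut DrainingService,
) -> bool {
    ingester.notify_decommission();
    compactor.notify_decommission();
    let ingester_done = ingester.wait_for_decommission();
    let compactor_done = compactor.wait_for_decommission();
    ingester_done && compactor_done
}
```

With this ordering the total wait is roughly the longer of the two drains. If each service were notified and awaited in turn, the compactor would not start draining until the ingester had finished.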

pub struct CompactorSupervisor {
node_id: NodeId,
planner_client: CompactionPlannerServiceClient,
status: CompactorStatus,

Member

nit: I think status and status_tx can be merged. I'm going to do that refactor as well in the ingester.

@nadav-govari nadav-govari merged commit d6a3fed into nadav/merge-feature-split-merges Jun 25, 2026
5 checks passed
@nadav-govari nadav-govari deleted the nadav/decommission branch June 25, 2026 17:39