Skip to content

[BUG] Ingester: TSDB appender leaked on early-return paths in Push (out-of-order label sets etc.) #7529

@sandy2008

Description

@sandy2008

Describe the bug

Ingester.Push (pkg/ingester/ingester.go:1438) acquires a TSDB appender via app := db.Appender(ctx).(extendedAppender) and then performs request validation that can return without ever reaching app.Commit() or app.Rollback(). The clearest leak path is the out-of-order-label-set check at pkg/ingester/ingester.go:1454-1456:

if i.isLabelSetOutOfOrder(tsLabels) {
    i.metrics.oooLabelsTotal.WithLabelValues(userID).Inc()
    return nil, wrapWithUser(errors.Errorf("out-of-order label set found when push: %s", tsLabels), userID)
}

When this branch fires, the appender's TSDB head series references, mmap'd chunks and pending state are never released. Each such request leaks one active appender. Observable via the cortex_ingester_tsdb_head_active_appenders gauge growing monotonically.

The two existing explicit app.Rollback() blocks inside the per-sample / per-histogram failure paths only cover specific append-error returns; they don't protect the OOO check above them, and a future return path added between them and Commit() would have the same problem.

To Reproduce

  1. Run the ingester with at least one tenant that has an active TSDB head.
  2. Send a Push request whose Timeseries[0].Labels are not sorted (the ingester normally relies on the distributor to sort labels, but the check exists as a guard).
  3. Observe cortex_ingester_tsdb_head_active_appenders for that tenant — it increments by 1 per Push and never drops.

Regression test (added in #7528):

go test -tags \"netgo slicelabels\" -run TestIngester_Push_OutOfOrderLabels_AppenderNotLeaked ./pkg/ingester/...

Without the fix, the test pushes 5 OOO-label-set requests and cortex_ingester_tsdb_head_active_appenders reads 5 instead of 0.

Expected behavior

The appender is released (via Rollback()) on every path that returns from Push before Commit() succeeds. The active-appenders gauge returns to 0 after a request is rejected.

Environment

  • Cortex master at 29b6167d6 (and reproducible on later commits).
  • Found via a static audit of CPU / memory / goroutine leaks across pkg/.

Additional context

Fix is being submitted as #7528: defer Rollback() immediately after db.Appender(ctx), flip a committed boolean to true right before app.Commit() (Prometheus's headAppender.Commit closes the appender even on failure, so a second Rollback from the defer must be skipped). The two redundant in-loop Rollback() calls are consolidated under the same defer. Regression test asserts cortex_ingester_tsdb_head_active_appenders == 0 after repeated OOO pushes.

🤖 Reported with help from Claude Code

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions