
Flaky test: TestDistributorQuerier_QueryIngestersWithinBoundary due to wall-clock race condition #7415

@CharlieTLe


Problem

TestDistributorQuerier_QueryIngestersWithinBoundary/maxT_well_after_lookback_boundary is flaky, particularly on slow ARM CI runners.

Example failure: https://github.com/cortexproject/cortex/actions/runs/24256703215/job/70829934477

--- FAIL: TestDistributorQuerier_QueryIngestersWithinBoundary (0.00s)
    --- FAIL: TestDistributorQuerier_QueryIngestersWithinBoundary/maxT_well_after_lookback_boundary (0.00s)
        distributor_queryable_test.go:638: 
            Error Trace:	distributor_queryable_test.go:638
            Error:      	"[]" should have 1 item(s), but has 0
            Test:       	TestDistributorQuerier_QueryIngestersWithinBoundary/maxT_well_after_lookback_boundary
            Messages:   	should manipulate when maxT is well after boundary

Root Cause

The test captures time.Now() at setup and uses it to compute query boundaries relative to a 1-hour lookback window. However, distributorQuerier.Select() calls time.Now() again internally to compute the ingester query boundary (pkg/querier/distributor_queryable.go:120).

The failing subtest "maxT well after lookback boundary" sets queryMaxT = testNow - lookback + 10s, i.e. 10 seconds after the 1-hour lookback boundary. Inside Select, the boundary is instead computed as realNow - 1h. If realNow has drifted more than 10 seconds past testNow (for example, because of slow test execution on ARM runners), the boundary lands after queryMaxT, so the clamped minT exceeds maxT, the query short-circuits with an empty result, and no distributor call is made.

The 10-second margin in the test case is too tight for slow CI environments.

This test was introduced in #7323.

Possible Solutions

  1. Inject a clock: have distributorQuerier accept a now function (defaulting to time.Now) so that tests can control time.
  2. Increase the margin: change -lookback + 10*time.Second to a larger value, such as -lookback + 5*time.Minute, to tolerate clock drift.

Option 1 is the more robust fix because it removes the test's dependency on wall-clock timing. Option 2 is a quick mitigation that only widens the tolerated drift.

Affected Files

  • pkg/querier/distributor_queryable_test.go (test setup at lines 606-612)
  • pkg/querier/distributor_queryable.go (wall-clock call at line 120)
