
Streaming queries are very inefficient #1195

@bboreham

I noticed high resource usage in the ruler and traced it back to a change where I turned on:

   - -querier.batch-iterators=true
   - -querier.ingester-streaming=true

Upon reverting this change, CPU went down to a third of what it was, memory down to a quarter and network traffic to a fifth.

Profiling suggests vast amounts of memory are being allocated here:

github.com/cortexproject/cortex/pkg/querier/batch.newMergeIterator
/go/src/github.com/cortexproject/cortex/pkg/querier/batch/merge.go
  Total:      6.78TB     7.29TB (flat, cum) 38.52%
     20            .          .            
     21            .          .           	currErr error 
     22            .          .           } 
     23            .          .            
     24            .          .           func newMergeIterator(cs []chunk.Chunk) *mergeIterator { 
     25            .   151.63GB           	css := partitionChunks(cs) 
     26       5.39GB     5.39GB           	its := make([]*nonOverlappingIterator, 0, len(css)) 
     27            .          .           	for _, cs := range css { 
     28            .   365.93GB           		its = append(its, newNonOverlappingIterator(cs)) 
     29            .          .           	} 
     30            .          .            
     31            .          .           	c := &mergeIterator{ 
     32            .          .           		its:        its, 
     33      10.82GB    10.82GB           		h:          make(iteratorHeap, 0, len(its)), 
     34       3.36TB     3.36TB           		batches:    make(batchStream, 0, len(its)*2*promchunk.BatchSize), 
     35       3.40TB     3.40TB           		batchesBuf: make(batchStream, 0, len(its)*2*promchunk.BatchSize), 
     36            .          .           	} 
     37            .          .            
     38            .          .           	for _, iter := range c.its { 
     39            .          .           		if iter.Next(1) { 
     40            .          .           			c.h = append(c.h, iter) 

I am unclear why those capacities are multiplied by promchunk.BatchSize: they allocate slices of Batch, and each Batch is already sized to hold BatchSize samples.
