
Querier error "expanding series: consistency check failed because some blocks were not queried" #4431

@saidben0

We see the following error in our querier service every time we try to visualize our Cortex metrics in Grafana; Grafana displays the same error:

"expanding series: consistency check failed because some blocks were not queried"

Grafana seems unable to query the metrics data that Cortex pushed to blocks storage on Azure. The blocks that the querier and Grafana complain about do exist in our Azure storage account.

We are deploying Cortex 0.6.0 using the Helm chart. Our answers.yaml is below:

store_gateway:
  replicas: 1
  extraArgs:
    log.level: debug
alertmanager:
  replicas: 1
  extraArgs:
    log.level: debug
distributor:
  extraArgs:
    log.level: debug
  resources:
    limits:
      cpu: 1
      memory: 1Gi
    requests:
      cpu: 100m
      memory: 512Mi
tags:
  blocks-storage-memcached: true
ingress:
  enabled: true
  annotations:
    kubernetes.io/ingress.class: nginx
  hosts:
    - host: mycortex.com
      paths:
        - /

nginx:
  config:
    auth_orgs:
      - my-org
    client_max_body_size: 10M

query_frontend:
  config:
    max_send_msg_size: 36777216

config:
  alertmanager:
    external_url: /api/prom/alertmanager
  api:
    prometheus_http_prefix: /prometheus
  auth_enabled: true
  ruler_storage:
    backend: azure
    azure:
      account_key: REDACTED
      account_name: mystorageacc
      container_name: mycontainer
  alertmanager_storage:
    backend: azure
    azure:
      account_key: REDACTED
      account_name: mystorageacc
      container_name: mycontainer
  blocks_storage:
    backend: azure
    tsdb:
      dir: /data/tsdb
    bucket_store:
      sync_dir: /data/tsdb-sync
    azure:
      account_key: REDACTED
      account_name: mystorageacc
      container_name: mycontainer
  chunk_store:
    chunk_cache_config:
      memcached:
        expiration: 1h
      memcached_client:
        timeout: 1s
  distributor:
    pool:
      health_check_ingesters: true
    shard_by_all_labels: true
  frontend:
    log_queries_longer_than: 10s
  ingester:
    lifecycler:
      final_sleep: 0s
      join_after: 0s
      num_tokens: 512
      ring:
        kvstore:
          consul:
            consistent_reads: true
            host: consul-cortex-headless:8500
            http_client_timeout: 20s
          prefix: collectors/
          store: consul
        replication_factor: 3
    max_transfer_retries: 0
  ingester_client:
    grpc_client_config:
      max_recv_msg_size: 104857600
      max_send_msg_size: 104857600
  limits:
    max_series_per_metric: 200000
    enforce_metric_name: false
    reject_old_samples: true
    reject_old_samples_max_age: 168h
  memberlist:
    join_members: []
  querier:
    active_query_tracker_dir: /data/cortex/querier
    query_ingesters_within: 12h
    store_gateway_addresses: cortex-store-gateway-headless:9095
  query_range:
    align_queries_with_step: true
    cache_results: true
    results_cache:
      cache:
        memcached:
          expiration: 1h
        memcached_client:
          timeout: 1s
    split_queries_by_interval: 24h
  ruler:
    enable_alertmanager_discovery: false
  schema:
    configs: []
  server:
    grpc_listen_port: 9095
    grpc_server_max_concurrent_streams: 1000
    grpc_server_max_recv_msg_size: 104857600
    grpc_server_max_send_msg_size: 104857600
    http_listen_port: 8080
  storage:
    engine: blocks
  table_manager:
    retention_deletes_enabled: false
    retention_period: 0s
memcached:
  enabled: true
memcached-index-read:
  enabled: true
memcached-index-write:
  enabled: true
memcached-frontend:
  enabled: true

Querier logs:

level=warn ts=2021-08-18T17:11:39.988658423Z caller=logging.go:71 traceID=259d69c469ec3ec2 msg="GET /api/prom/api/v1/query_range?end=1629306680&query=kube_pod_container_resource_requests_cpu_cores+%7Bprometheus_from%3D%22v2-ch4-non-prod%22%7D&start=1629285080&step=20 (500) 63.563488ms Response: \"{\\\"status\\\":\\\"error\\\",\\\"errorType\\\":\\\"internal\\\",\\\"error\\\":\\\"expanding series: consistency check failed because some blocks were not queried: 01FDCWEWBWQ1YQPZWMPR03BTM3 01FDCWEFND98TSC29JVJTJ8V4H 01FDCNKEQXS77AD4KKQYCKGEW2 01FDCNJRDCEG0JF11R43XHQ339 01FDCNK53WA5VW164XFZ85BH5K\\\"}\" ws: false; X-Scope-Orgid: my-org; uber-trace-id: 259d69c469ec3ec2:1b45f2f3216f3712:1c59b792da14ca61:0; " 
ts=2021-08-18T17:11:39.9941494Z caller=spanlogger.go:79 org_id=my-org traceID=259d69c469ec3ec2 method=blocksStoreQuerier.selectSorted level=warn msg="unable to get store-gateway clients while retrying to fetch missing blocks" err="no store-gateway instance left after filtering out excluded instances for block 01FDCWEWBWQ1YQPZWMPR03BTM3"
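
To compare the blocks named in the error against what is actually in the bucket, it can help to pull the block IDs out of the log line. The following is a minimal sketch, not part of Cortex: the function name `missing_block_ids` and the constants are made up for illustration, and it assumes the IDs are 26-character ULIDs listed after the "some blocks were not queried:" text, as in the log above.

```python
import re
from typing import List

# Assumption: block IDs are ULIDs, i.e. 26 characters of Crockford base32
# (digits and uppercase letters, excluding I, L, O and U).
ULID_RE = re.compile(r"\b[0-9A-HJKMNP-TV-Z]{26}\b")

# Text that precedes the list of block IDs in the querier error.
MARKER = "some blocks were not queried:"


def missing_block_ids(log_line: str) -> List[str]:
    """Return the block IDs listed after MARKER, in order, without duplicates."""
    _, found, tail = log_line.partition(MARKER)
    if not found:
        # The line is not a consistency-check failure.
        return []
    ids: List[str] = []
    for block_id in ULID_RE.findall(tail):
        if block_id not in ids:
            ids.append(block_id)
    return ids
```

Feeding it the querier warning above should yield the five IDs from the error, which can then be looked up one by one in the storage container.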

This looks related to the store-gateway: it fails with "no space left on device" when it tries to write the index header for a block into its sync directory (/data/tsdb-sync):

level=warn ts=2021-08-18T17:17:17.61930065Z caller=bucket.go:553 org_id=my-org msg="loading block failed" elapsed=1.937918531s id=01FDBK8GVWQV6B39G1TGXGQDRY err="create index header reader: write index header: 2 errors: copy symbols: write /data/tsdb-sync/my-org/01FDBK8GVWQV6B39G1TGXGQDRY/index-header.tmp: no space left on device; close binary writer for /data/tsdb-sync/my-org/01FDBK8GVWQV6B39G1TGXGQDRY/index-header.tmp: write /data/tsdb-sync/my-org/01FDBK8GVWQV6B39G1TGXGQDRY/index-header.tmp: no space left on device"
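
A quick way to confirm the disk-space theory is to check the free space on the volume that backs the sync directory. Below is a minimal sketch using only the Python standard library; the helper name `free_space_report` and the threshold parameter are hypothetical, and it would have to run somewhere that can see the store-gateway's volume (running `df -h /data/tsdb-sync` inside the pod gives the same information).

```python
import shutil


def free_space_report(path: str, min_free_bytes: int) -> dict:
    """Report disk usage for the filesystem containing `path`.

    `min_free_bytes` is an arbitrary threshold chosen by the caller;
    it is not a Cortex setting.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return {
        "path": path,
        "total_bytes": usage.total,
        "free_bytes": usage.free,
        "enough_space": usage.free >= min_free_bytes,
    }
```

For example, `free_space_report("/data/tsdb-sync", 1024**3)` would flag the volume if less than 1 GiB is free, where the path is the `sync_dir` from the config above and the 1 GiB figure is only an assumption for illustration.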
