Missing Pod Logs Not Ingested by Alloy #3472

@nickv2002

What's wrong?

We use the k8s-monitoring-helm chart with the config pasted below (chart version 1.6.33). We noticed that some pod logs were missing from our Grafana Cloud instance. We were able to get the logs with kubectl logs as well as by looking at the log file on the host node. Investigation revealed many messages in the Grafana Alloy logs similar to the one below, reporting "skipping update of position for a file which does not currently exist" for pods that no longer existed. We didn't find any messages about processing logs for the pod that was missing from Grafana Cloud (Loki).

ts=2025-04-29T19:47:35.077367632Z level=info msg="skipping update of position for a file which does not currently exist" component_path=/ component_id=loki.source.file.pod_logs component=tailer path=/var/log/pods/b2b_b2b-api-56896ff857-lsvdh_10cd50b7-9f34-44b9-b60c-8ae24816932a/flyway-migrate/0.log
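
To gauge how many files are affected, messages like the one above could be tallied per path. The following is only a minimal sketch, assuming the Alloy logs are available as plain logfmt text lines (for example, saved from kubectl logs); the helper names `parse_logfmt` and `count_skipped_paths` are hypothetical and not part of Alloy.

```python
import re
from collections import Counter

# Message text copied from the Alloy log line in this issue.
SKIP_MSG = "skipping update of position for a file which does not currently exist"

# Simplified logfmt matcher: key=value, where value is quoted or a bare token.
PAIR_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_logfmt(line):
    """Parse one logfmt line into a dict (hypothetical helper)."""
    return {key: value.strip('"') for key, value in PAIR_RE.findall(line)}


def count_skipped_paths(lines):
    """Count 'skipping update of position' messages per file path."""
    counts = Counter()
    for line in lines:
        fields = parse_logfmt(line)
        if fields.get("msg") == SKIP_MSG and "path" in fields:
            counts[fields["path"]] += 1
    return counts
```

Calling `count_skipped_paths` on an open log file (or on `sys.stdin` in a small script) returns a `Counter` keyed by path, so `most_common()` lists the files that are reported most often.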

We updated our Helm chart to point at 1.6.34 and then did a rolling restart of all the Alloy pods (I guess the restart didn't happen automatically because Helm didn't calculate a difference). This fixed the problem and we were able to see our pod logs in Grafana Cloud again.

This kind of shakes our trust in Alloy, since it was failing to send logs but not in a visible/alert-able way. Not all Alloy instances were broken, just the one on one of a dozen or so nodes. We wouldn't have noticed if a dev hadn't been asking where their logs were.

Questions:

  1. Any tips for how to detect/avoid this going forward?
  2. Any configuration changes we could make to Alloy to improve things?

Steps to reproduce

Unknown

System information

EKS Kubernetes running docker.io/grafana/alloy:v1.8.1

Software version

docker.io/grafana/alloy:v1.8.1

Configuration

cluster:
  name: qa
externalServices:
  prometheus:
    host: https://prometheus-prod-13-prod-us-east-0.grafana.net
    basicAuth:
      username: <redacted>
      password: &grafanaAPIpw <redacted>
  loki:
    host: https://logs-prod-006.grafana.net
    basicAuth:
      username: <redacted>
      password: *grafanaAPIpw
  tempo:
    host: https://tempo-prod-04-prod-us-east-0.grafana.net
    basicAuth:
      username: <redacted>
      password: *grafanaAPIpw
kube-state-metrics:
  metricLabelsAllowlist:
    - pods=[*]
    - namespaces=[*]
metrics:
  cost:
    enabled: false
  kube-state-metrics:
    metricsTuning:
      includeMetrics:
        - kube_pod_container_state_started
        - kube_namespace_labels
        - kube_pod_labels
  autoDiscover:
    metricsTuning:
      includeMetrics:
        - ^csp_.*
        - ^promhttp_metric_handler_.*
        - ^b2b.*
        - ^sendgrid_.*
traces:
  enabled: true

Logs

ts=2025-04-29T19:47:35.077367632Z level=info msg="skipping update of position for a file which does not currently exist" component_path=/ component_id=loki.source.file.pod_logs component=tailer path=/var/log/pods/b2b_b2b-api-56896ff857-lsvdh_10cd50b7-9f34-44b9-b60c-8ae24816932a/flyway-migrate/0.log
