Description
What's wrong?
We use the k8s-monitoring-helm chart with the config pasted below. We were on chart version 1.6.33 when we noticed some pod logs were missing from our Grafana Cloud instance. We were able to get the logs with kubectl logs as well as by looking at the log file on the host node. Investigation revealed many messages in the Grafana Alloy logs similar to the one below, reporting "skipping update of position for a file which does not currently exist" for pods that no longer existed. We didn't find any messages about processing logs for the pod that was missing from Grafana Cloud (Loki).
ts=2025-04-29T19:47:35.077367632Z level=info msg="skipping update of position for a file which does not currently exist" component_path=/ component_id=loki.source.file.pod_logs component=tailer path=/var/log/pods/b2b_b2b-api-56896ff857-lsvdh_10cd50b7-9f34-44b9-b60c-8ae24816932a/flyway-migrate/0.log
We updated our Helm chart to point at 1.6.34 and then did a rolling restart of all the Alloy pods (I guess it didn't happen automatically because Helm didn't calculate a difference). This fixed the problem and we were able to see our pod logs in Grafana Cloud again.
This kind of shakes our trust in Alloy, since it was failing to send logs but not in a visible/alert-able way. Not all Alloy instances were broken, just the one on a single node out of a dozen or so. We wouldn't have noticed if a dev hadn't asked where their logs were.
Questions:
- Any tips for how to detect/avoid this going forward?
- Any configuration changes we could make to Alloy to improve things?
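As a stopgap for the detection question, here is a minimal sketch of a script that counts the "skipping update of position" messages per file path in Alloy's logfmt output. It only surfaces the symptom we saw; whether a high count reliably indicates dropped logs is an assumption on our part, not something we have confirmed. The function names `parse_logfmt` and `count_skipped_paths` are hypothetical helpers, not part of Alloy.

```python
import shlex
from collections import Counter

# Message text copied from the Alloy log line quoted in this issue.
SKIP_MSG = "skipping update of position for a file which does not currently exist"


def parse_logfmt(line):
    """Parse one logfmt line into a dict. Tokens without '=' are ignored."""
    fields = {}
    try:
        tokens = shlex.split(line)
    except ValueError:
        # Unbalanced quotes or similar: treat the line as unparseable.
        return fields
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def count_skipped_paths(lines):
    """Count occurrences of the skip message, grouped by the 'path' field."""
    counts = Counter()
    for line in lines:
        fields = parse_logfmt(line)
        if fields.get("msg") == SKIP_MSG:
            counts[fields.get("path", "<unknown>")] += 1
    return counts
```

The output of kubectl logs for an Alloy pod could be passed to `count_skipped_paths` line by line, and the resulting counts compared across nodes to spot an instance that behaves differently from the rest.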
Steps to reproduce
Unknown
System information
EKS Kubernetes running docker.io/grafana/alloy:v1.8.1
Software version
docker.io/grafana/alloy:v1.8.1
Configuration
cluster:
  name: qa
externalServices:
  prometheus:
    host: https://prometheus-prod-13-prod-us-east-0.grafana.net
    basicAuth:
      username: <redacted>
      password: &grafanaAPIpw <redacted>
  loki:
    host: https://logs-prod-006.grafana.net
    basicAuth:
      username: <redacted>
      password: *grafanaAPIpw
  tempo:
    host: https://tempo-prod-04-prod-us-east-0.grafana.net
    basicAuth:
      username: <redacted>
      password: *grafanaAPIpw
kube-state-metrics:
  metricLabelsAllowlist:
    - pods=[*]
    - namespaces=[*]
metrics:
  cost:
    enabled: false
  kube-state-metrics:
    metricsTuning:
      includeMetrics:
        - kube_pod_container_state_started
        - kube_namespace_labels
        - kube_pod_labels
  autoDiscover:
    metricsTuning:
      includeMetrics:
        - ^csp_.*
        - ^promhttp_metric_handler_.*
        - ^b2b.*
        - ^sendgrid_.*
traces:
  enabled: true
Logs
ts=2025-04-29T19:47:35.077367632Z level=info msg="skipping update of position for a file which does not currently exist" component_path=/ component_id=loki.source.file.pod_logs component=tailer path=/var/log/pods/b2b_b2b-api-56896ff857-lsvdh_10cd50b7-9f34-44b9-b60c-8ae24816932a/flyway-migrate/0.log