Open
Description
When the kube-apiserver is unavailable or unstable, the controller fails to list secrets, namespaces, and configmaps and then stops. It neither retries nor fails hard enough for the liveness probe to kick in.
Example log output when it gets into that state:
{"level":"info","timestamp":"2025-02-24T15:53:32Z","msg":"Starting Keess. Running on local cluster: app-beta-gm"}
{"level":"debug","timestamp":"2025-02-24T15:53:32Z","msg":"Namespace polling interval: 60 seconds"}
{"level":"debug","timestamp":"2025-02-24T15:53:32Z","msg":"Polling interval: 60 seconds"}
{"level":"debug","timestamp":"2025-02-24T15:53:32Z","msg":"Housekeeping interval: 60 seconds"}
{"level":"debug","timestamp":"2025-02-24T15:53:32Z","msg":"Log level: debug"}
{"level":"debug","timestamp":"2025-02-24T15:53:32Z","msg":"Kubeconfig path: /root/.kube/config"}
{"level":"info","timestamp":"2025-02-24T15:53:32Z","msg":"Remote clusters: [app-beta-hq app-beta-px app-prod-hq app-prod-gm]"}
{"level":"error","timestamp":"2025-02-24T15:54:02Z","msg":"Failed to list namespaces: Get \"https://10.64.192.1:443/api/v1/namespaces\": dial tcp 10.64.192.1:443: i/o timeout"}
{"level":"error","timestamp":"2025-02-24T15:54:02Z","msg":"Failed to list secrets: Get \"https://10.64.192.1:443/api/v1/secrets?labelSelector=keess.powerhrg.com%2Fmanaged\": dial tcp 10.64.192.1:443: i/o timeout"}
{"level":"error","timestamp":"2025-02-24T15:54:02Z","msg":"Failed to list configMaps: Get \"https://10.64.192.1:443/api/v1/configmaps?labelSelector=keess.powerhrg.com%2Fmanaged\": dial tcp 10.64.192.1:443: i/o timeout"}
{"level":"error","timestamp":"2025-02-24T15:54:02Z","msg":"Failed to list configMaps: Get \"https://10.64.192.1:443/api/v1/configmaps?labelSelector=keess.powerhrg.com%2Fsync\": dial tcp 10.64.192.1:443: i/o timeout"}
{"level":"error","timestamp":"2025-02-24T15:54:02Z","msg":"Failed to list secrets: Get \"https://10.64.192.1:443/api/v1/secrets?labelSelector=keess.powerhrg.com%2Fsync\": dial tcp 10.64.192.1:443: i/o timeout"}
Expected behavior
If the controller can't list the resources, it should retry or restart, so that it keeps trying until the apiserver is available and then continues without manual intervention.
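One possible shape for this, as a rough sketch rather than a proposal for the actual Keess code: wrap each list call in a retry loop with exponential backoff, and exit non-zero once the attempts are exhausted so the container can be restarted under the Pod's restart policy. The names `retry`, `errUnavailable`, and `listNamespaces` are hypothetical and stand in for the real list calls; the attempt count and delays are arbitrary.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"time"
)

// errUnavailable is a hypothetical stand-in for the "i/o timeout" errors in the logs.
var errUnavailable = errors.New("apiserver unavailable")

// retry calls op up to attempts times, doubling the delay after each failure.
// It returns nil on the first success, or the last error (wrapped) once all
// attempts have failed.
func retry(attempts int, baseDelay time.Duration, op func() error) error {
	var err error
	delay := baseDelay
	for i := 1; i <= attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		fmt.Fprintf(os.Stderr, "attempt %d/%d failed: %v\n", i, attempts, err)
		if i < attempts {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	// Hypothetical list call that fails twice and then succeeds.
	listNamespaces := func() error {
		calls++
		if calls < 3 {
			return errUnavailable
		}
		return nil
	}

	if err := retry(5, 10*time.Millisecond, listNamespaces); err != nil {
		// A non-zero exit makes the failure visible to Kubernetes instead of
		// leaving the process running but idle.
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("listed namespaces after %d calls\n", calls)
}
```

Either half of this would address the report on its own: retrying covers short apiserver outages, while exiting after repeated failures hands recovery over to Kubernetes.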