Freezes if kube-apiserver isn't available at startup #35

@iblackman

Description

When the kube-apiserver is unavailable or unstable at startup, the controller fails to list secrets, namespaces, and configmaps and then stops doing any work. It neither fails hard enough for the liveness probe to kick in nor retries.

Example log output when it gets into that state:

{"level":"info","timestamp":"2025-02-24T15:53:32Z","msg":"Starting Keess. Running on local cluster: app-beta-gm"}
{"level":"debug","timestamp":"2025-02-24T15:53:32Z","msg":"Namespace polling interval: 60 seconds"}
{"level":"debug","timestamp":"2025-02-24T15:53:32Z","msg":"Polling interval: 60 seconds"}
{"level":"debug","timestamp":"2025-02-24T15:53:32Z","msg":"Housekeeping interval: 60 seconds"}
{"level":"debug","timestamp":"2025-02-24T15:53:32Z","msg":"Log level: debug"}
{"level":"debug","timestamp":"2025-02-24T15:53:32Z","msg":"Kubeconfig path: /root/.kube/config"}
{"level":"info","timestamp":"2025-02-24T15:53:32Z","msg":"Remote clusters: [app-beta-hq app-beta-px app-prod-hq app-prod-gm]"}
{"level":"error","timestamp":"2025-02-24T15:54:02Z","msg":"Failed to list namespaces: Get \"https://10.64.192.1:443/api/v1/namespaces\": dial tcp 10.64.192.1:443: i/o timeout"}
{"level":"error","timestamp":"2025-02-24T15:54:02Z","msg":"Failed to list secrets: Get \"https://10.64.192.1:443/api/v1/secrets?labelSelector=keess.powerhrg.com%2Fmanaged\": dial tcp 10.64.192.1:443: i/o timeout"}
{"level":"error","timestamp":"2025-02-24T15:54:02Z","msg":"Failed to list configMaps: Get \"https://10.64.192.1:443/api/v1/configmaps?labelSelector=keess.powerhrg.com%2Fmanaged\": dial tcp 10.64.192.1:443: i/o timeout"}
{"level":"error","timestamp":"2025-02-24T15:54:02Z","msg":"Failed to list configMaps: Get \"https://10.64.192.1:443/api/v1/configmaps?labelSelector=keess.powerhrg.com%2Fsync\": dial tcp 10.64.192.1:443: i/o timeout"}
{"level":"error","timestamp":"2025-02-24T15:54:02Z","msg":"Failed to list secrets: Get \"https://10.64.192.1:443/api/v1/secrets?labelSelector=keess.powerhrg.com%2Fsync\": dial tcp 10.64.192.1:443: i/o timeout"}

Expected behavior

The controller should retry or restart when it can't list the resources, so that it keeps trying until the apiserver is available and then continues without manual intervention.
