What happened:
I encountered a reproducible issue. MicroShift AIO in a container (podman) works fine. However, whenever I reboot the host machine, start the MicroShift container with podman start microshift, and run oc get routes after a while, the command fails with: error: the server doesn't have a resource type "route". The router-default container logs also show health check failures.
What you expected to happen:
oc get routes should not fail with an error saying the resource type does not exist.
How to reproduce it (as minimally and precisely as possible):
- Launch MicroShift AIO in a container (podman)
- Run oc get route to verify it is working
- Reboot the host machine running the MicroShift container
- Wait for the host to come up, start MicroShift with podman start microshift, and wait for it to be up
- Run oc get route <--- This throws the error
Anything else we need to know?:
Here are the pod status and the details of the router-default pod, including the logs from its last terminated container:
[root@cd3611b48b4e /]# oc get po -A
NAMESPACE                       NAME                                            READY   STATUS    RESTARTS   AGE
kube-system                     kube-flannel-ds-jbsd2                           1/1     Running   0          57s
kube-system                     openshift-console-deployment-7c8785cc5c-sf6x8   1/1     Running   0          37s
kubevirt-hostpath-provisioner   kubevirt-hostpath-provisioner-884lz             1/1     Running   0          75s
openshift-dns                   dns-default-mxllr                               2/2     Running   0          102s
openshift-dns                   node-resolver-7lbt6                             1/1     Running   0          92s
openshift-ingress               router-default-584549f645-xx6zp                 0/1     Running   1          17s
openshift-service-ca            service-ca-7bffb6f6bf-dwsxw                     1/1     Running   0          2m5s
[root@cd3611b48b4e /]#
[root@cd3611b48b4e /]# oc describe po router-default-584549f645-xx6zp -n openshift-ingress
Name: router-default-584549f645-xx6zp
Namespace: openshift-ingress
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: cd3611b48b4e/10.88.0.2
Start Time: Tue, 12 Apr 2022 09:26:44 +0000
Labels: ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default
pod-template-hash=584549f645
Annotations: target.workload.openshift.io/management: {"effect": "PreferredDuringScheduling"}
unsupported.do-not-use.openshift.io/override-liveness-grace-period-seconds: 10
Status: Running
IP: 10.88.0.2
IPs:
IP: 10.88.0.2
Controlled By: ReplicaSet/router-default-584549f645
Containers:
router:
Container ID: cri-o://5dbb8fe62029a505625b27edb07fbe27de4d114e94fbcda96c67bd34bbf20d63
Image: quay.io/openshift/okd-content@sha256:01cfbbfdc11e2cbb8856f31a65c83acc7cfbd1986c1309f58c255840efcc0b64
Image ID: quay.io/openshift/okd-content@sha256:01cfbbfdc11e2cbb8856f31a65c83acc7cfbd1986c1309f58c255840efcc0b64
Ports: 80/TCP, 443/TCP, 1936/TCP
Host Ports: 80/TCP, 443/TCP, 1936/TCP
State: Running
Started: Tue, 12 Apr 2022 09:26:46 +0000
Last State: Terminated
Reason: Error
Message: metrics "msg"="listening on the metrics port failed" "error"="listen tcp 0.0.0.0:1936: bind: address already in use"
I0412 09:26:44.915365 1 metrics.go:155] metrics "msg"="router health and metrics port listening on HTTP and HTTPS" "address"="0.0.0.0:1936"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x1904632]
goroutine 112 [running]:
github.com/cockroachdb/cmux.(*muxListener).Close(0xc00000f0f8, 0x4723a7, 0x413496)
<autogenerated>:1 +0x32
net/http.(*onceCloseListener).close(...)
/usr/lib/golang/src/net/http/server.go:3395
sync.(*Once).doSlow(0xc00017b450, 0xc0007dbe18)
/usr/lib/golang/src/sync/once.go:68 +0xec
sync.(*Once).Do(...)
/usr/lib/golang/src/sync/once.go:59
net/http.(*onceCloseListener).Close(0xc00017b440, 0xc000336420, 0xc000646000)
/usr/lib/golang/src/net/http/server.go:3391 +0x78
net/http.(*Server).Serve(0xc0002a2380, 0x207d310, 0xc00000f0f8, 0x2042540, 0xc000646000)
/usr/lib/golang/src/net/http/server.go:2981 +0x5f6
github.com/openshift/router/pkg/router/metrics.Listener.Listen.func1(0x2041b60, 0xc0000cdcc0, 0x207d310, 0xc00000f0f8)
/go/src/github.com/openshift/router/pkg/router/metrics/metrics.go:147 +0x72
created by github.com/openshift/router/pkg/router/metrics.Listener.Listen
/go/src/github.com/openshift/router/pkg/router/metrics/metrics.go:143 +0x1b9
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x1902fec]
goroutine 114 [running]:
github.com/cockroachdb/cmux.(*cMux).Serve(0xc0000cdd80, 0x0, 0x0)
/go/src/github.com/openshift/router/vendor/github.com/cockroachdb/cmux/cmux.go:124 +0x8c
github.com/openshift/router/pkg/router/metrics.Listener.Listen.func3(0x206ee50, 0xc0000cdd80)
/go/src/github.com/openshift/router/pkg/router/metrics/metrics.go:172 +0x35
created by github.com/openshift/router/pkg/router/metrics.Listener.Listen
/go/src/github.com/openshift/router/pkg/router/metrics/metrics.go:171 +0x417
Exit Code: 2
Started: Tue, 12 Apr 2022 09:26:44 +0000
Finished: Tue, 12 Apr 2022 09:26:44 +0000
Ready: False
Restart Count: 1
Requests:
cpu: 100m
memory: 256Mi
Liveness: http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://localhost:1936/healthz/ready delay=10s timeout=1s period=10s #success=1 #failure=3
Startup: http-get http://:1936/healthz/ready delay=0s timeout=1s period=1s #success=1 #failure=120
Environment:
STATS_PORT: 1936
ROUTER_SERVICE_NAMESPACE: openshift-ingress
DEFAULT_CERTIFICATE_DIR: /etc/pki/tls/private
DEFAULT_DESTINATION_CA_PATH: /var/run/configmaps/service-ca/service-ca.crt
ROUTER_CIPHERS: TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
ROUTER_DISABLE_HTTP2: true
ROUTER_DISABLE_NAMESPACE_OWNERSHIP_CHECK: false
ROUTER_METRICS_TLS_CERT_FILE: /etc/pki/tls/private/tls.crt
ROUTER_METRICS_TLS_KEY_FILE: /etc/pki/tls/private/tls.key
ROUTER_METRICS_TYPE: haproxy
ROUTER_SERVICE_NAME: default
ROUTER_SET_FORWARDED_HEADERS: append
ROUTER_THREADS: 4
SSL_MIN_VERSION: TLSv1.2
ROUTER_SUBDOMAIN: ${name}-${namespace}.apps.127.0.0.1.nip.io
ROUTER_ALLOW_WILDCARD_ROUTES: true
ROUTER_OVERRIDE_HOSTNAME: true
Mounts:
/etc/pki/tls/private from default-certificate (ro)
/var/run/configmaps/service-ca from service-ca-bundle (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-7jxjf (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-certificate:
Type: Secret (a volume populated by a Secret)
SecretName: router-certs-default
Optional: false
service-ca-bundle:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: service-ca-bundle
Optional: false
kube-api-access-7jxjf:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
ConfigMapName: openshift-service-ca.crt
ConfigMapOptional: <nil>
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 29s default-scheduler Successfully assigned openshift-ingress/router-default-584549f645-xx6zp to cd3611b48b4e
Normal Pulled 28s (x2 over 29s) kubelet Container image "quay.io/openshift/okd-content@sha256:01cfbbfdc11e2cbb8856f31a65c83acc7cfbd1986c1309f58c255840efcc0b64" already present on machine
Normal Created 27s (x2 over 29s) kubelet Created container router
Normal Started 27s (x2 over 29s) kubelet Started container router
Warning Unhealthy 18s (x9 over 26s) kubelet Startup probe failed: HTTP probe failed with statuscode: 500
Warning ProbeError 17s (x10 over 26s) kubelet Startup probe error: HTTP probe failed with statuscode: 500
body: [-]backend-http failed: reason withheld
[-]has-synced failed: reason withheld
[+]process-running ok
healthz check failed
[root@cd3611b48b4e /]#
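The last terminated router container first logs "listen tcp 0.0.0.0:1936: bind: address already in use" and then panics with a nil pointer dereference inside cmux. A stdlib-only Go sketch of the two behaviours involved follows. It is a hypothetical illustration, not the router's actual code: tryBind is a made-up helper, and a random loopback port stands in for port 1936. It shows that a second bind to an address that is already held fails with EADDRINUSE, and that net.Listen returns a nil listener on that path, so closing or wrapping it afterwards would dereference nil. The trace looks consistent with that pattern, but this is an inference from the log only.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"syscall"
)

// tryBind is a hypothetical helper: it tries to listen on addr and reports
// whether the failure was EADDRINUSE. On error, net.Listen returns a nil
// listener, so the caller must not Close it or wrap it in another listener.
func tryBind(addr string) (net.Listener, bool, error) {
	l, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, errors.Is(err, syscall.EADDRINUSE), err
	}
	return l, false, nil
}

func main() {
	// Occupy a free loopback port. It stands in for whatever already held
	// 0.0.0.0:1936 when the router started (assumption: a leftover process).
	holder, _, err := tryBind("127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer holder.Close()

	// A second bind to the same address fails instead of returning a usable listener.
	second, inUse, err := tryBind(holder.Addr().String())
	fmt.Println("second listener is nil:", second == nil)
	fmt.Println("address already in use:", inUse)
	fmt.Println("error:", err)
}
```

On Linux the printed error has the same shape as the router's log line (listen tcp <addr>: bind: address already in use), which suggests checking what else is bound to port 1936 on the host after a reboot.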
Environment:
- MicroShift version (use microshift version): Microshift-AIO:latest
- Hardware configuration:
- OS (e.g. cat /etc/os-release): Fedora
- Kernel (e.g. uname -a): Linux fedora 5.14.10-300.fc35.x86_64 #1 SMP Thu Oct 7 20:48:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
- Others:
Relevant Logs