Allow the OpenshiftControllerManager service to be stopped #582
Signed-off-by: Miguel Angel Ajo <majopela@redhat.com>
/retest
  healthcheckStatus := util.RetryTCPConnection("127.0.0.1", "8445")
  if !healthcheckStatus {
-     klog.Fatalf(s.Name(), fmt.Errorf("healthcheck status"), "%s failed to start")
+     klog.Errorf(s.Name(), fmt.Errorf("healthcheck status"), "%s failed to start")
This log format doesn't look right. Errorf() takes a format string first, which here is passed as the last arg.
oh right, I was just changing severity, but you're right.
      "assets/core/0000_50_cluster-openshift-controller-manager_00_namespace.yaml",
  }, s.kubeconfig); err != nil {
-     klog.Warningf("failed to apply openshift namespaces %v", err)
+     klog.Errorf("failed to apply openshift namespaces %v", err)
Assuming RetryTCPConnection writes an error to the channel, what's the expected sequence of events? It looks as though the error will be placed in the channel buffer, but ApplyNamespaces will still be called. If ApplyNamespaces then errors, the error written to the channel is lost at the return, so the actual reason for the crash will be masked.
Or:
If RetryTCPConnection sends an error to the channel buffer but ApplyNamespaces returns nil, then the second goroutine is run. The select statement will read from the channel buffer without blocking, since there's a value to read, causing the function to return. The concurrent StartControllerManager() call will continue running, however.
The question is: is it safe to assume the StartControllerManager goroutine will die gracefully and quietly (without spamming logs) if RetryTCPConnection returns an error?
To be clear, it'll eventually die when the runtime handles the error and shuts down. I'm interested in how readable and debuggable the shutdown will be. Secondly, resources created or opened by StartControllerManager won't have a chance to be cleaned up gracefully, since the main routine doesn't wait for subroutines to complete.
@copejon please see #637 for more details. I've moved the work here to that PR, and I believe I handled your concerns. We also decided not to change the behaviour of services into soft failures (@fzdarsky had a stronger opinion on this and didn't want that to be modified). So errors will be logged with Fatalf, which forces the process to exit in case of a failure.