Skip to content

Let the OpenshiftControllerManager service to be stopped#582

Closed
mangelajo wants to merge 1 commit into
openshift:mainfrom
mangelajo:ocp-controller-manager-shutdown
Closed

Let the OpenshiftControllerManager service to be stopped#582
mangelajo wants to merge 1 commit into
openshift:mainfrom
mangelajo:ocp-controller-manager-shutdown

Conversation

@mangelajo

Copy link
Copy Markdown
Contributor

Signed-off-by: Miguel Angel Ajo majopela@redhat.com

Signed-off-by: Miguel Angel Ajo <majopela@redhat.com>
@openshift-ci

openshift-ci Bot commented Jan 27, 2022

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please ask for approval from mangelajo after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot requested review from husky-parul and oglok January 27, 2022 17:13
@mangelajo

Copy link
Copy Markdown
Contributor Author

/retest

1 similar comment
@mangelajo

Copy link
Copy Markdown
Contributor Author

/retest

healthcheckStatus := util.RetryTCPConnection("127.0.0.1", "8445")
if !healthcheckStatus {
klog.Fatalf(s.Name(), fmt.Errorf("healthcheck status"), "%s failed to start")
klog.Errorf(s.Name(), fmt.Errorf("healthcheck status"), "%s failed to start")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log format doesn't look right. Errorf() takes a format string first, which here is passed as the last arg.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh right, I was just changing severity, but you're right.

"assets/core/0000_50_cluster-openshift-controller-manager_00_namespace.yaml",
}, s.kubeconfig); err != nil {
klog.Warningf("failed to apply openshift namespaces %v", err)
klog.Errorf("failed to apply openshift namespaces %v", err)

@copejon copejon Feb 4, 2022

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming RetryTCPConnection writes an error to the chan, what's the expected sequence of events? It looks as though the err will be placed in the chan buffer, but ApplyNamespaces will be called. If ApplyNamespaces errors, then the error written to the chan is lost at the return. So the actual reason for the crash will be masked.

Or:
If RetryTCPConnection sends an error to the chan buff, but ApplyNamespaces returns nil, then the 2nd go func is run. The select statement will read from the chan buff and not block since there's a value to read, causing the function to return. The concurrent StartControllerManager() call will continue running however.

Question is: is it safe to assume the StartControllerManager routine will die gracefully and quietly (without spamming logs) if RetryTCPConnection returns an error?

To be clear, it'll eventually die when the runtime handles the error and shuts down. I'm interested in how readable/debugable the shutdown will be. Secondly, resources created/opened by StartControllerManager won't have a chance to be cleaned up gracefully, since the main routine doesn't wait for subroutines to complete.

@mangelajo mangelajo added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 9, 2022
@mangelajo

Copy link
Copy Markdown
Contributor Author

@copejon please see #637 for more details, I've moved work here to that PR and I believe I handled your concerns.

We also decided not to change the behaviour of services into soft-failures (@fzdarsky had a stronger opinion on this and didn't want that to be modified). So errors will be Fatalf which forces the process into exit in case of a failure.

@mangelajo mangelajo closed this Mar 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants