Allow marking releases stuck in a pending state as failed #16
	return rel, nil
}

func (r *Reconciler) handlePending(actionClient helmclient.ActionInterface, rel *release.Release, u *updater.Updater, log logr.Logger) (ctrl.Result, error) {
Could you add a new test case for a reconciliation with a MarkFailedAfter that tests the transition from a pending to a failed release (and maybe even the roll-forward)?
Since these changes affect the core logic of our reconciliations, I think this is important.
	}
	u.UpdateStatus(
		updater.EnsureCondition(conditions.Irreconcilable(corev1.ConditionTrue, conditions.ReasonPendingError, err)))
	return ctrl.Result{}, err
Do you think it would be useful to set the RequeueAfter property to some interval here?
My concern is that reconciliations are very fast in production systems and could put a heavy load on the API server, which in turn would slow it down.
Discussed offline: the operator already uses a built-in rate limiter to prevent overloading the API server. The situation we are handling here should be rare, and we expect the primary cause to be people running manual Helm operations. For these, the pending state will only last for ~10s at most, and we don't want to block the operator for the next 2m if it encounters such a state.
Note: with dependent watches, this might be less of an issue, but we currently don't have those.
	return c.HandleUpgrade()
}

func (c *ActionClient) MarkFailed(rel *release.Release, reason string) error {
I really like the idea of marking the release as failed and letting the existing logic reconcile it. Nice! 💯
Spelled without "e" to not have to reformat surrounding code causing conflicts. Please fix the spelling when upstreaming.
This PR adds an option WithMarkFailedAfter(duration), which allows marking a release seemingly stuck in a pending state (pending-install, pending-upgrade, pending-rollback) as "failed" after a given timeout (measured from the "last deployed" timestamp). The "failed" state will allow the next "upgrade" operation to succeed.