replace FAILED deployments with `helm upgrade --install --force` by bacongobbler · Pull Request #3597 · helm/helm

bacongobbler · 2018-03-02T20:45:50Z

When using helm upgrade --install, if the first release fails, Helm will respond with an error saying that it cannot upgrade from an unknown state.

With this feature, helm upgrade --install --force automates the same process as helm delete && helm install --replace. It will mark the previously FAILED release as DELETED, delete any existing resources inside Kubernetes, then replace it as if it was a fresh install. I did not want to make this the default behaviour of helm upgrade --install because this is a destructive operation that deletes resources in Kubernetes, and the operator should opt into and accept this behaviour.

closes #3353
refs discussion in #3437
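The status bookkeeping described above can be pictured with a small, self-contained Go model. This is a hedged sketch, not Tiller's code. The `Status`, `Release` and `upgrade` names, the error text, and the rule that a normal upgrade fails whenever no revision is DEPLOYED are all simplifying assumptions made for illustration. The new revision is also assumed to deploy successfully, so only the status transitions are modelled.

```go
package main

import (
	"errors"
	"fmt"
)

// Status is a simplified, hypothetical stand-in for Tiller's release status codes.
type Status string

const (
	Deployed   Status = "DEPLOYED"
	Failed     Status = "FAILED"
	Deleted    Status = "DELETED"
	Superseded Status = "SUPERSEDED"
)

// Release is a hypothetical, trimmed-down release revision.
type Release struct {
	Version int
	Status  Status
}

// Illustrative error text; not Helm's exact message.
var errNoDeployed = errors.New("cannot upgrade: release has no deployed revision")

// upgrade models `helm upgrade --install [--force]` against an existing
// history (oldest revision first). Assumption: the new revision always deploys.
func upgrade(history []Release, force bool) ([]Release, error) {
	if len(history) == 0 {
		return nil, errors.New("no release history")
	}
	out := append([]Release(nil), history...) // do not mutate the caller's slice
	next := Release{Version: out[len(out)-1].Version + 1, Status: Deployed}

	deployed := -1
	for i, r := range out {
		if r.Status == Deployed {
			deployed = i
		}
	}
	if deployed >= 0 {
		// Normal upgrade: the currently deployed revision is superseded.
		out[deployed].Status = Superseded
		return append(out, next), nil
	}
	if !force {
		// Without --force, upgrading from an unknown (e.g. FAILED) state errors out.
		return history, errNoDeployed
	}
	// --force: behave like `helm delete` + `helm install --replace`.
	// The previous revision is marked DELETED (its Kubernetes resources
	// would be removed at this point) and the new revision is installed fresh.
	out[len(out)-1].Status = Deleted
	return append(out, next), nil
}

func main() {
	first := []Release{{Version: 1, Status: Failed}}
	if _, err := upgrade(first, false); err != nil {
		fmt.Println("without --force:", err)
	}
	h, _ := upgrade(first, true)
	fmt.Println("with --force:", h)
}
```

The model keeps the destructive branch behind an explicit `force` argument, mirroring the PR's choice to make the replace behaviour opt-in rather than the default.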

bacongobbler · 2018-03-02T20:57:05Z

Forward note: I'm a little iffy on marking the previous release as SUPERSEDED.

helgi · 2018-03-02T23:05:20Z

Can this be the default behaviour if there is only 1 release prior and it failed? Seems to be the most common case so far.

Beyond that, I agree with it not being the default behaviour.

thomastaylor312

One small comment for discussion, otherwise I tested this and it is good to go

thomastaylor312 · 2018-03-08T01:50:34Z

 	currentRelease, updatedRelease, err := s.prepareUpdate(req)
 	if err != nil {
+		if req.Force {
+			// Use the --force, Luke.


thomastaylor312 · 2018-03-08T01:57:50Z

+		}
+	}
+
+	oldRelease.Info.Status.Code = release.Status_SUPERSEDED


I think this may deserve a new status. If I was troubleshooting and saw "superseded" I wouldn't know it was force updated. Maybe REPLACED?

Hmm good point. I also think it might be good to just keep this in the FAILED state so others cannot roll back to this release, which SUPERSEDED or REPLACED would allow. The more I think about this, the more I would prefer to retain the existing FAILED state.

Related comment: #3597 (comment)

@adamreese any opinions on this?

I think FAILED would also be a good idea

thomastaylor312 · 2018-03-08T01:58:26Z

+		Chart: &chart.Chart{
+			Metadata: &chart.Metadata{Name: "hello"},
+			Templates: []*chart.Template{
+				{Name: "templates/something", Data: []byte("hello: world")},


I'm kind of sad this text isn't from Star Wars 😢

That can be arranged...

thomastaylor312 · 2018-03-08T01:59:40Z

+
+	compareStoredAndReturnedRelease(t, *rs, *res)
+
+	edesc := "Upgrade complete"


Nit: I think `expectedDescription` would be clearer here. Not a necessary change by any means, though.

Clearer variable names are always great.

bacongobbler · 2018-03-08T15:50:17Z

> Can this be the default behaviour if there is only 1 release prior and it failed? Seems to be the most common case so far.

That is good feedback, thanks @helgi. I'd be a little concerned about the behaviour being inconsistent for users though. In this case, a user can expect helm upgrade --install to "fix" the first failed release, but it will continue to fail on subsequent releases with no feedback on how to fix that. I'd kinda prefer to make it explicitly opt-in as a feature flag, but I'd love to know whether that's important!

Perhaps we can decide at a later date if we should do that based on others' feedback in 2.8.2. How does that sound?

helgi · 2018-03-08T16:01:16Z

> Perhaps we can decide at a later date if we should do that based on others' feedback in 2.8.2. How does that sound?

Yeah, I considered the inconsistency and cringed as I wrote that message. The scenario I run into a little too often (for comfort) is engineers throwing together a new Helm chart and, rather than trying it in minikube, putting it directly into the dev CI system (copy pasta, basically), which leads to a failed first deployment a lot of the time.

The user generally has no idea why it failed or why their future pushes do not fix the issue, and even when they have applied the fix before (helm delete --purge), they often forget it. Basically, first-time deploy clumsiness UX issues.

I'm happy with deferring the decision, but I did want to bring up the use case.

adamreese · 2018-03-08T18:59:14Z

+		return res, err
+	}
+
+	// pre-upgrade hooks


Should this be pre-install?

adamreese · 2018-03-08T18:59:26Z

+		return res, err
+	}
+
+	// post-upgrade hooks


Should this be post-install?

bacongobbler · 2018-03-08T20:07:41Z

k, addressed all comments. New stuff since the last round of reviews:

- the old release finishes in state DELETED instead of state SUPERSEDED, modelling what helm install --replace would do
- due to the wonky semantics of this feature flag, pre/post delete and install hooks are run, not pre/post upgrade hooks
- m0ar Star Wars references in the tests for @thomastaylor312's enjoyment

Should be good for another round of reviews
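A rough, hedged sketch of which hooks fire on each path, following the note above that the force path runs delete and install hooks rather than upgrade hooks. The phase labels and their exact ordering are assumptions for illustration and are not taken from the merged code; `hookPhases` is a hypothetical helper.

```go
package main

import "fmt"

// hookPhases returns illustrative phase labels (not Tiller identifiers).
// Assumption: in the --force replace path the old release's delete hooks
// wrap the deletion, then the new release's install hooks wrap the install,
// mirroring `helm delete` followed by `helm install --replace`.
func hookPhases(force bool) []string {
	if force {
		return []string{
			"pre-delete", "delete old resources", "post-delete",
			"pre-install", "install new resources", "post-install",
		}
	}
	// A normal upgrade only runs the upgrade hooks around the update.
	return []string{"pre-upgrade", "update resources", "post-upgrade"}
}

func main() {
	for _, force := range []bool{false, true} {
		fmt.Printf("force=%v: %v\n", force, hookPhases(force))
	}
}
```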

thomastaylor312

This looks good, and your Star Wars reference is golden

When using `helm upgrade --install`, if the first release fails, Helm will respond with an error saying that it cannot upgrade from an unknown state. With this feature, `helm upgrade --install --force` automates the same process as `helm delete && helm install --replace`. It will mark the previous release as DELETED, delete any existing resources inside Kubernetes, then replace it as if it was a fresh install. It will then mark the FAILED release as SUPERSEDED.

mcfedr · 2018-03-27T20:41:11Z

Am I right in reading this as --force will only ever cause a delete and redeploy when the first deploy has failed?

So if I have a chart deployed, break it, and then upgrade it, it will be fixed, not recreated?

stealthybox · 2018-04-19T17:50:10Z

@bacongobbler I'm also confused regarding this change.

helm upgrade --install

1. if release v1 failed, release v2 will fail because the operation is considered unsafe
2. if release v4 succeeded and release v5 failed, release v6 will succeed based off of release v4?

helm upgrade --install --force

1. if release v1 failed, release v2 will succeed because release v1 is deleted first
2. if release v4 succeeded and release v5 failed, release v6 will succeed based off of release ...?
   - does it delete the pre-existing release?
   - does it cause downtime?

Is the --force flag safe to use all the time, expecting that it only destroys kubernetes resources when the very first release fails?

bacongobbler · 2018-04-19T17:59:52Z

helm upgrade --install with the --force flag automates what one would do to "fix" a failed upgrade. --force is just a helm delete && helm install --replace, and it only kicks in when the release failed to deploy. It only causes downtime if your application would go into a failed state. There's nothing we can do to fix that. --force just attempts to fix it. If your application would normally upgrade gracefully, there's no downtime.

bacongobbler · 2018-04-19T18:02:16Z

and no, it kicks in any time a release fails to upgrade.

stealthybox · 2018-04-19T18:49:22Z

Is case 2 accurate?
I'm already lost as to whether Helm rolls back release versions on failure.

stealthybox · 2018-04-19T18:54:51Z

We're still on Helm v2.7.0 because the current upgrade-over-failure behavior appears to be safer for our use case than deleting a deployment.

Our releases usually fail due to hooks, but our hooks are idempotent Jobs, so it's usually safe and desirable to upgrade right over them with the pre-existing Kubernetes resources.

If I'm understanding the new behavior correctly, I believe it would be possible for a Deployment to be deleted and for no Pods to be available to serve traffic mid-release if a hook failed on the previous release.

bacongobbler · 2018-04-27T23:55:32Z

To answer case 2, helm upgrade --install --force will upgrade as normal, so it'll use v4 to upgrade. However, should v6 fail, that's when the --force flag kicks in and I'm not 100% sure I can recall exactly what happens. It's been a little while so I'll have to look through the code again to answer your question. Feel like looking at it together at KubeCon? :)

stealthybox · 2018-04-28T23:13:57Z

👍 yep, I'm curious
looking forward to it

stealthybox · 2018-05-19T03:35:54Z

> If I'm understanding the new behavior correctly, I believe it would be possible for a Deployment to be deleted and for no Pods to be available to serve traffic mid-release if a hook failed on the previous release.

It seems from #3208 (comment) that this might be the case when using helm upgrade --install --force.
This flag is pretty dangerous.

k8s-ci-robot added the cncf-cla: yes (the PR's author has signed the CNCF CLA) and size/L (the PR changes 100-499 lines, ignoring generated files) labels Mar 2, 2018

bacongobbler mentioned this pull request Mar 5, 2018

fix upgrade of broken install #3437

Closed

thomastaylor312 reviewed Mar 8, 2018

adamreese reviewed Mar 8, 2018

bacongobbler added this to the 2.8.2 - Bugfix milestone Mar 8, 2018

thomastaylor312 approved these changes Mar 8, 2018

adamreese approved these changes Mar 9, 2018

bacongobbler merged commit abe958e into helm:master Mar 9, 2018

bacongobbler deleted the upgrade-force-replace branch March 9, 2018 19:38

This was referenced Apr 27, 2018

helm upgrade --install doesn't perform an install/upgrade if the first ever install fails #3353

Closed

helm upgrade --install no longer works #3208

Closed

wknapik mentioned this pull request May 11, 2018

Best practice for installing and/or upgrading a deployed and/or failed release #4004

Closed

ysaakpr mentioned this pull request Aug 7, 2018

jx promote continuously fails after first failure jenkins-x/jx#1441

Closed

jkroepke mentioned this pull request Nov 2, 2018

<release> has no deployed releases databus23/helm-diff#108

Closed

splisson pushed a commit to splisson/helm that referenced this pull request Dec 6, 2018

Merge pull request helm#3597 from bacongobbler/upgrade-force-replace

f07c4e0

replace FAILED deployments with `helm upgrade --install --force`

RaphaelVogel added a commit to gardener/cc-utils that referenced this pull request Jan 23, 2019

Add --force flag to helm install command

037c02b

Failed helm deployments deployments cannot be upgraded without the --force flag. See: helm/helm#3597

hickeyma mentioned this pull request Mar 13, 2019

Helm upgrade not patching deployment #5430

Closed

peterholak mentioned this pull request Apr 14, 2020

app-name has no deployed releases #5595

Closed

7 participants