Skip to content
This repository was archived by the owner on Oct 10, 2023. It is now read-only.

MAJOR windows fixes for secretgen and antrea-cleanup#990

Merged
vuil merged 1 commit intomainfrom
topic/hxie/register_antrea_cleanup
Nov 17, 2021
Merged

MAJOR windows fixes for secretgen and antrea-cleanup#990
vuil merged 1 commit intomainfrom
topic/hxie/register_antrea_cleanup

Conversation

@hxietkg
Copy link
Copy Markdown
Contributor

@hxietkg hxietkg commented Oct 27, 2021

several fixes to windows

  • regression for prod clusters fixed by skipping multi AZs
  • fix to antrea start after Windows nodes reboot because OVS bridge and HnsNetwork previously created by Antrea is not cleaned, a cleanup script is provided in the new version of Antrea, this change registers the
    script as a callback on system's shutdown, the cleanup script will be invoked before Windows shutdown.
  • fix to secret-gen does not support Windows cluster, just disable it for Windows

multiple-azs is supported by #1108, but the change breaks Windows cluster, and this feature was not estimated for Windows cluster, so just disable it for Windows .

What this PR does / why we need it

Antrea-agent fails to start after reboot, so the a Windows node will be re-created once it reboots, this is not expected, we need Windows node support reboot.

also, we fix the other issues mentioned above here for expedience

Which issue(s) this PR fixes

Fixes #988

Describe testing done for PR

Create a windows cluster, then reboot Windows node, and verify antrea-agent in the node after reboot.

Release note

windows fixes for reboot, secret gen omission, and multi AZ omission.

PR Checklist

  • Squash the commits into one or a small number of logical commits
  • Use good commit messages
  • Ensure PR contains terms all contributors can understand and links all contributors can access

Additional information

Special notes for your reviewer

@hxietkg hxietkg requested a review from a team as a code owner October 27, 2021 09:23
@hxietkg hxietkg force-pushed the topic/hxie/register_antrea_cleanup branch from f2d9cf5 to a9591c5 Compare October 27, 2021 09:29
@github-actions
Copy link
Copy Markdown

Cluster Generation A/B Results:
https://storage.googleapis.com/tkg-clustergen/990/20211027093614/clustergen.diff.txt
Author/reviewers:
Please review to verify that the effects on the generated cluster configurations are exactly what the PR intended, and give a thumbs-up if so.

@github-actions
Copy link
Copy Markdown

Cluster Generation A/B Results:
https://storage.googleapis.com/tkg-clustergen/990/20211027094147/clustergen.diff.txt
Author/reviewers:
Please review to verify that the effects on the generated cluster configurations are exactly what the PR intended, and give a thumbs-up if so.

@vuil vuil added the ok-to-merge PRs should be labelled with this before merging label Oct 27, 2021
@jayunit100
Copy link
Copy Markdown
Contributor

jayunit100 commented Oct 28, 2021

(deleting earlier comment about ipconfig /renew
seems to work thanks to @stuartpreston for testing for us on his bare metal hardware as well

@hxietkg hxietkg force-pushed the topic/hxie/register_antrea_cleanup branch from a9591c5 to 988ea3a Compare October 30, 2021 00:47
@github-actions
Copy link
Copy Markdown

Cluster Generation A/B Results:
https://storage.googleapis.com/tkg-clustergen/990/20211030005815/clustergen.diff.txt
Author/reviewers:
Please review to verify that the effects on the generated cluster configurations are exactly what the PR intended, and give a thumbs-up if so.

@github-actions
Copy link
Copy Markdown

Cluster Generation A/B Results:
https://storage.googleapis.com/tkg-clustergen/990/20211030030016/clustergen.diff.txt
Author/reviewers:
Please review to verify that the effects on the generated cluster configurations are exactly what the PR intended, and give a thumbs-up if so.

@hxietkg hxietkg force-pushed the topic/hxie/register_antrea_cleanup branch 2 times, most recently from 0c80479 to 2cdcdc5 Compare October 30, 2021 04:13
@github-actions
Copy link
Copy Markdown

Cluster Generation A/B Results:
https://storage.googleapis.com/tkg-clustergen/990/20211030042008/clustergen.diff.txt
Author/reviewers:
Please review to verify that the effects on the generated cluster configurations are exactly what the PR intended, and give a thumbs-up if so.

@github-actions
Copy link
Copy Markdown

Cluster Generation A/B Results:
https://storage.googleapis.com/tkg-clustergen/990/20211030042721/clustergen.diff.txt
Author/reviewers:
Please review to verify that the effects on the generated cluster configurations are exactly what the PR intended, and give a thumbs-up if so.

@hxietkg hxietkg force-pushed the topic/hxie/register_antrea_cleanup branch from 2cdcdc5 to 65c8293 Compare October 30, 2021 05:05
@github-actions
Copy link
Copy Markdown

Cluster Generation A/B Results:
https://storage.googleapis.com/tkg-clustergen/990/20211030051746/clustergen.diff.txt
Author/reviewers:
Please review to verify that the effects on the generated cluster configurations are exactly what the PR intended, and give a thumbs-up if so.

@jayunit100 jayunit100 mentioned this pull request Nov 10, 2021
9 tasks
@jayunit100
Copy link
Copy Markdown
Contributor

before merging lets add the one liner to IS_WINDOWS_WORKLOAD_CLUSTER as well to simplify the merge process /review process.

Lets also rename this to "Minor windows fixes for secretgen and antreacleanup"

@jayunit100
Copy link
Copy Markdown
Contributor

@hxietkg can you rename this to "two minor fixes to windows YTT" and add unit tests to the MR also (if its possible, I think it should be, I think we should be generating the secretgen in the yaml).

@hxietkg hxietkg changed the title Register cleanup for antrea two minor fixes to windows YTT : Register cleanup for antrea & remove secret-gen Nov 10, 2021
@hxietkg hxietkg changed the title two minor fixes to windows YTT : Register cleanup for antrea & remove secret-gen Minor windows fixes for secretgen and antreacleanup Nov 10, 2021
@hxietkg hxietkg changed the title Minor windows fixes for secretgen and antreacleanup Register cleanup for antrea Nov 10, 2021
@hxietkg hxietkg force-pushed the topic/hxie/register_antrea_cleanup branch from 65c8293 to 44698fa Compare November 10, 2021 04:35
@github-actions
Copy link
Copy Markdown

Cluster Generation A/B Results:
https://storage.googleapis.com/tkg-clustergen/990/20211110045110/clustergen.diff.txt
Author/reviewers:
Please review to verify that the effects on the generated cluster configurations are exactly what the PR intended, and give a thumbs-up if so.

@hxietkg
Copy link
Copy Markdown
Contributor Author

hxietkg commented Nov 10, 2021

@hxietkg can you rename this to "two minor fixes to windows YTT" and add unit tests to the MR also (if its possible, I think it should be, I think we should be generating the secretgen in the yaml).

this is not a bug, we may just disable the secret-gen by adding "SECRETGEN_CONTROLLER_ENABLE: false" to config file.

@jayunit100
Copy link
Copy Markdown
Contributor

jayunit100 commented Nov 10, 2021

thats fair - but its still technically a bug on windows side since that SECRETGEN isnt a default disable

... can you add that as a DEFAULT to the windows ytt then ?

@hxietkg hxietkg force-pushed the topic/hxie/register_antrea_cleanup branch from 44698fa to 150047a Compare November 10, 2021 16:19
@github-actions
Copy link
Copy Markdown

Cluster Generation A/B Results:
https://storage.googleapis.com/tkg-clustergen/990/20211110163620/clustergen.diff.txt
Author/reviewers:
Please review to verify that the effects on the generated cluster configurations are exactly what the PR intended, and give a thumbs-up if so.

filepath.Join(yamlRoot, "ytt", "02_addons", "cpi", "cpi_addon_data.lib.yaml"),
filepath.Join(yamlRoot, "ytt", "03_customizations", "02_avi", "ako-deployment.lib.yaml"),
//filepath.Join(YAML_ROOT, "provider-bundle", "providers", "ytt", "02_addons", "cpi", "cpi_addon_data.lib.yaml"),
filepath.Join(yamlRoot, "ytt", "03_customizations", "03_windows"),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yes ! thank you for adding this.

#@ load("@ytt:data", "data")
#@ if data.values.IS_WINDOWS_WORKLOAD_CLUSTER:
#@overlay/match by=overlay.subset({"kind":"KubeadmConfigTemplate"})
#@overlay/match by=overlay.subset({"kind":"KubeadmConfigTemplate", "metadata":{"name": str(data.values.CLUSTER_NAME) + "-md-0-windows-containerd"}})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting . I assume this works, but am curious what changed that now requires us to start doing this?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OH its because we have multiple kubeadmconfig templates ! duh.

@hxietkg hxietkg force-pushed the topic/hxie/register_antrea_cleanup branch from db1e3aa to aef7d20 Compare November 15, 2021 00:17
@github-actions
Copy link
Copy Markdown

Cluster Generation A/B Results:
https://storage.googleapis.com/tkg-clustergen/990/20211115003404/clustergen.diff.txt
Author/reviewers:
Please review to verify that the effects on the generated cluster configurations are exactly what the PR intended, and give a thumbs-up if so.

@hxietkg
Copy link
Copy Markdown
Contributor Author

hxietkg commented Nov 15, 2021

rebased on 15/Nov

@hxietkg hxietkg force-pushed the topic/hxie/register_antrea_cleanup branch from ac0fe9e to ea38e56 Compare November 15, 2021 12:09
@jayunit100 jayunit100 changed the title Minor windows fixes for secretgen and antrea-cleanup MAJOR windows fixes for secretgen and antrea-cleanup Nov 15, 2021
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereMachineTemplate
metadata:
name: '${ CLUSTER_NAME }-worker'
Copy link
Copy Markdown
Contributor

@jayunit100 jayunit100 Nov 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well need to potentially disable VSPHERE_AZ_1/2 stuff for now, just to get this merged and working for release

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

possible solutions:

  • disable VSPHERE_AZs
  • enable AZ1/2 in our windows ytt

going w/ one bc...well... time :)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stuartpreston fYi were disabling this for expedeince

@hxietkg hxietkg force-pushed the topic/hxie/register_antrea_cleanup branch from ea38e56 to 2f778ba Compare November 15, 2021 14:59
@github-actions
Copy link
Copy Markdown

Cluster Generation A/B Results:
https://storage.googleapis.com/tkg-clustergen/990/20211115151504/clustergen.diff.txt
Author/reviewers:
Please review to verify that the effects on the generated cluster configurations are exactly what the PR intended, and give a thumbs-up if so.

@hxietkg hxietkg force-pushed the topic/hxie/register_antrea_cleanup branch from 2f778ba to bc186cb Compare November 16, 2021 02:16
@github-actions
Copy link
Copy Markdown

Cluster Generation A/B Results:
https://storage.googleapis.com/tkg-clustergen/990/20211116022249/clustergen.diff.txt
Author/reviewers:
Please review to verify that the effects on the generated cluster configurations are exactly what the PR intended, and give a thumbs-up if so.

@hxietkg hxietkg force-pushed the topic/hxie/register_antrea_cleanup branch from bc186cb to bf037c2 Compare November 16, 2021 02:34
@github-actions
Copy link
Copy Markdown

Cluster Generation A/B Results:
https://storage.googleapis.com/tkg-clustergen/990/20211116023943/clustergen.diff.txt
Author/reviewers:
Please review to verify that the effects on the generated cluster configurations are exactly what the PR intended, and give a thumbs-up if so.

@hxietkg hxietkg force-pushed the topic/hxie/register_antrea_cleanup branch from bf037c2 to 5670be3 Compare November 16, 2021 03:19
@github-actions
Copy link
Copy Markdown

Cluster Generation A/B Results:
https://storage.googleapis.com/tkg-clustergen/990/20211116032528/clustergen.diff.txt
Author/reviewers:
Please review to verify that the effects on the generated cluster configurations are exactly what the PR intended, and give a thumbs-up if so.

@hxietkg hxietkg force-pushed the topic/hxie/register_antrea_cleanup branch from 5670be3 to e249c88 Compare November 16, 2021 03:34
@github-actions
Copy link
Copy Markdown

Cluster Generation A/B Results:
https://storage.googleapis.com/tkg-clustergen/990/20211116034029/clustergen.diff.txt
Author/reviewers:
Please review to verify that the effects on the generated cluster configurations are exactly what the PR intended, and give a thumbs-up if so.

Antrea fails to start after Windows nodes reboot because OVS bridge and
HnsNetwork previously created by Antrea is not cleaned, a cleanup script
is provided in the new version of Antrea, this change registers the
script as a callback on system's shutdown, the cleanup script will be
invoked before Windows shutdown.

secretgen does not support Windows cluster, just disable it.

Disable vsphere-az1/2 on Windows cluster because this is not tested on
Windows clusters.
@hxietkg hxietkg force-pushed the topic/hxie/register_antrea_cleanup branch from e249c88 to 15f4cb2 Compare November 17, 2021 00:16
@github-actions
Copy link
Copy Markdown

Cluster Generation A/B Results:
https://storage.googleapis.com/tkg-clustergen/990/20211117002910/clustergen.diff.txt
Author/reviewers:
Please review to verify that the effects on the generated cluster configurations are exactly what the PR intended, and give a thumbs-up if so.

@vuil
Copy link
Copy Markdown
Contributor

vuil commented Nov 17, 2021

Please add an entry in the release notes section.
Also please update the PR descriptoin since there is more than just introducing the cleanup script in the change now.

@vuil vuil added the kind/bug PR/Issue related to a bug label Nov 17, 2021
@jayunit100
Copy link
Copy Markdown
Contributor

Lgtm

clustergen tests are correctly reporting removal of the old error

“””Error: : unable to get template: Overlaying (in following order: overlay-windows.yaml, vsphere-overlay.yaml, 01_plans/prod.yaml,”””

@jayunit100
Copy link
Copy Markdown
Contributor

Hongsheng please add release note

Copy link
Copy Markdown
Contributor

@vuil vuil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks!

@vuil vuil merged commit 4e94999 into main Nov 17, 2021
@vuil vuil deleted the topic/hxie/register_antrea_cleanup branch November 17, 2021 02:18
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

cla-not-required kind/bug PR/Issue related to a bug ok-to-merge PRs should be labelled with this before merging

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Windows nodes lose connectivity on reboot

4 participants