server: check if there are active nics before network GC #8204
DaanHoogland merged 3 commits into apache:4.18
@blueorangutan package |
|
@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
Codecov Report
@@ Coverage Diff @@
## 4.18 #8204 +/- ##
============================================
- Coverage 13.10% 13.09% -0.01%
- Complexity 9121 9124 +3
============================================
Files 2720 2720
Lines 257599 257636 +37
Branches 40158 40165 +7
============================================
+ Hits 33747 33750 +3
- Misses 219588 219620 +32
- Partials 4264 4266 +2
... and 11 files with indirect coverage changes
|
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 7692 |
|
Code looks good @weizhouapache. One concern is the situation where a VM is being created but does not have any NICs yet. Is this the situation you describe in #8200? If GC hits at that time, the network might be torn down while the process creating the VM thinks it can continue.
This situation has already been handled.
This PR aims to handle the situation where nics_count is 0 but there are running/migrating/stopping VMs. The nics_count should not be 0 in that case, but we have already seen this issue a few times. The root cause is still unknown. To prevent the network from being shut down and the VR from being stopped, we should account for nics_count being wrongly 0.
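The guard described above can be sketched as follows. This is a minimal illustration, not the code from the PR: the class `NetworkGcGuard`, the `VmState` enum, and the method `canGarbageCollect` are hypothetical names, and the real check runs against the database rather than an in-memory list.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these names come from the CloudStack code base.
final class NetworkGcGuard {
    enum VmState { STARTING, RUNNING, MIGRATING, STOPPING, STOPPED, DESTROYED }

    // The states named in the thread as indicating that a VM still uses the network.
    private static final Set<VmState> ACTIVE =
            EnumSet.of(VmState.RUNNING, VmState.MIGRATING, VmState.STOPPING);

    /**
     * @param nicsCount         the stored NIC counter for the network (may be wrongly 0)
     * @param vmStatesOnNetwork states of the VMs that have a NIC on the network
     * @return true only if the counter is 0 and no VM is in an active state
     */
    static boolean canGarbageCollect(long nicsCount, List<VmState> vmStatesOnNetwork) {
        if (nicsCount > 0) {
            return false; // the counter already says the network is in use
        }
        for (VmState state : vmStatesOnNetwork) {
            if (ACTIVE.contains(state)) {
                return false; // the counter is 0, but a VM is still active
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Counter wrongly 0 while a VM is running: GC must be skipped.
        System.out.println(canGarbageCollect(0, Arrays.asList(VmState.RUNNING)));
        // Counter 0 and only stopped VMs: GC may proceed.
        System.out.println(canGarbageCollect(0, Arrays.asList(VmState.STOPPED)));
    }
}
```

The second check is deliberately redundant with the counter: it only changes the outcome when the counter is wrong.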
How has it?
I understand this part and I approve of this PR because of that. |
|
@blueorangutan test |
|
Is there a reason this needs to stay in draft @weizhouapache ? |
@DaanHoogland
|
Getting into "muggenzift" (nitpicking) area here but,
Discussed with Daan: there is indeed a race condition when a VM is started while the network is being shut down, and there may be an issue with it. It needs testing. We have also found an issue with my PR: running VRs should not be considered, as the ACS thread will shut down the VRs during GC. I will update this PR.
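A small sketch of the point about VRs: the virtual router is itself running on the network until GC stops it, so a check that counted it would always find an active VM and GC would never proceed. The names below (`UserVmGcCheck`, `VmType`, `Vm`, `hasActiveUserVm`) are hypothetical and only illustrate the filtering; they are not the identifiers used in the PR.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; the types below are simplified stand-ins.
final class UserVmGcCheck {
    enum VmType { USER, DOMAIN_ROUTER }
    enum VmState { RUNNING, MIGRATING, STOPPING, STOPPED }

    static final class Vm {
        final VmType type;
        final VmState state;

        Vm(VmType type, VmState state) {
            this.type = type;
            this.state = state;
        }
    }

    private static final Set<VmState> ACTIVE =
            EnumSet.of(VmState.RUNNING, VmState.MIGRATING, VmState.STOPPING);

    // Returns true if any non-router VM on the network is in an active state.
    // Routers are skipped because GC itself is what shuts them down.
    static boolean hasActiveUserVm(List<Vm> vmsOnNetwork) {
        for (Vm vm : vmsOnNetwork) {
            if (vm.type != VmType.DOMAIN_ROUTER && ACTIVE.contains(vm.state)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Vm> onlyRouter = Arrays.asList(new Vm(VmType.DOMAIN_ROUTER, VmState.RUNNING));
        List<Vm> routerAndUser = Arrays.asList(
                new Vm(VmType.DOMAIN_ROUTER, VmState.RUNNING),
                new Vm(VmType.USER, VmState.RUNNING));
        System.out.println(hasActiveUserVm(onlyRouter));    // router alone does not block GC
        System.out.println(hasActiveUserVm(routerAndUser)); // running user VM blocks GC
    }
}
```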
|
@blueorangutan package |
|
@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
|
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 7783 |
|
@blueorangutan test |
|
@weizhouapache a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests |
|
[SF] Trillian test result (tid-8355)
|
|
@harikrishna-patnala can you approve now? |
|
@DaanHoogland @harikrishna-patnala @shwstppr Here are the steps
prior to this PR: network is shut down (unexpected, as there is a running VM)
with this PR: network is not shut down (expected, as there is a running VM)
stop the VM and wait for network GC: network is shut down (expected, as there is no running VM)
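The three observations above can be replayed against a simplified model of the decision. This is an assumption-laden sketch: `GcScenario`, `shutdownBefore`, and `shutdownAfter` are hypothetical names, and the "before" rule is reduced to looking at the counter only, which is how the thread describes the faulty outcome rather than a quote of the original code.

```java
// Hypothetical model of the GC decision before and after the change.
final class GcScenario {
    // Simplified "before" rule: only the stored counter is consulted.
    static boolean shutdownBefore(long nicsCount) {
        return nicsCount == 0;
    }

    // Simplified "after" rule: the counter must be 0 and no active user VM may remain.
    static boolean shutdownAfter(long nicsCount, int activeUserVms) {
        return nicsCount == 0 && activeUserVms == 0;
    }

    public static void main(String[] args) {
        long wrongCount = 0; // faulty state: counter is 0 although a VM is running

        // Prior to the PR: network is shut down despite the running VM.
        System.out.println("before PR, VM running: " + shutdownBefore(wrongCount));
        // With the PR: network is kept while the VM runs.
        System.out.println("with PR, VM running:   " + shutdownAfter(wrongCount, 1));
        // After stopping the VM: network is shut down on the next GC run.
        System.out.println("with PR, VM stopped:   " + shutdownAfter(wrongCount, 0));
    }
}
```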
shwstppr left a comment
Code change looks okay for an improvement.
I'm not sure whether this is a bug or the root cause of an issue, as the only way to get into this state is to update the DB manually.
|
@shwstppr , concerning
see
When a thread starts a VM while another one runs the GC for networks this may happen. We have not reproduced in a lab, but seen it in "the wild". |
@shwstppr @DaanHoogland
|
|
Thanks @weizhouapache @DaanHoogland I've approved the PR. We can merge unless you feel we need more testing. |
|
@blueorangutan package |
|
@shwstppr a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
|
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 7848 |
|
@blueorangutan test |
|
@weizhouapache a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests |
|
[SF] Trillian test result (tid-8406)
|
|
@weizhouapache @DaanHoogland do we need any additional testing here or is it good to merge? |
@shwstppr |
I will reproduce @weizhouapache's last testing description and merge if OK.
I reproduced the faulty state this way
merging |
|
great, thanks a lot for the testing @DaanHoogland ! |
* 4.18:
  server: Initial new vpnuser state (apache#8268)
  UI: Removed redundant IP Address Column when create Port forwarding rules (apache#8275)
  UI: Removed ICMP input fields for protocol number from ACL List rules modal (apache#8253)
  server: check if there are active nics before network GC (apache#8204)
Description
This PR fixes #8200