CLOUDSTACK-9828: GetDomRVersionCmd fails to get the correct versi… #1995
…on as output
The fix returns the output from a single command instead of appending the output of two separate commands.
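A toy model of the failure mode this description refers to may help. The sketch below assumes, purely for illustration, that the old script emitted the answer in two writes (the version followed by `&`, then the signature) and that the caller sometimes sees only the first write. The function names and the `4.10.0`/`abc123` values are hypothetical placeholders, not CloudStack code.

```python
from typing import List


def two_command_output(version: str, signature: str) -> List[str]:
    # Hypothetical old behaviour (assumption): one command writes "version&",
    # and a second command appends the signature in a separate write.
    return [version + "&", signature]


def single_command_output(version: str, signature: str) -> List[str]:
    # Hypothetical fixed behaviour: a single write carries "version&signature".
    return [f"{version}&{signature}"]


def read_first_chunk(chunks: List[str]) -> str:
    # Toy reader that only sees what was flushed before it stopped reading,
    # i.e. the first chunk. Real proxies are less predictable than this.
    return chunks[0] if chunks else ""


if __name__ == "__main__":
    print(read_first_chunk(two_command_output("4.10.0", "abc123")))     # 4.10.0&
    print(read_first_chunk(single_command_output("4.10.0", "abc123")))  # 4.10.0&abc123
```

In this model, a single write means the delimiter and the hash travel together, so a partial read cannot separate them.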
@abhinandanprateek I would say the change is good, but it does not really fix the issue you described.
@ustcweizhou the file exists and the script works on the VR. However, the complete output is not returned via the router_proxy. The signature file is part of the template.
@ustcweizhou another observation is that with the same systemvm.iso and VR templates, get version works on some hosts while consistently failing on others.
LGTM. Got this tested on a Trillian environment. I'll fire one more round.
@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.
LGTM
@abhinandanprateek can you spend some time finding out what caused the issue?
Packaging result: ✔centos6 ✔centos7 ✔debian. JID-580
@ustcweizhou the issue we found is due to a delay in buffered stdout: sometimes the getdomr answer does not contain the 'hash' after the '&' delimiter. This is a very specific case that normally cannot be reproduced. We saw this issue in a production environment, and Abhi recently found it in a Trillian environment.
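Because the failure shows up as an answer with nothing after `&`, the consuming side can detect it explicitly rather than mis-parse it. A minimal Python sketch, assuming the answer has the form `version&signature` as described here; `parse_domr_version` and `DomRVersion` are hypothetical names, and CloudStack's actual Java parsing may differ.

```python
from typing import NamedTuple, Optional


class DomRVersion(NamedTuple):
    version: str
    signature: Optional[str]  # None when the hash after '&' is missing


def parse_domr_version(output: str) -> DomRVersion:
    """Split a 'version&signature' answer on the first '&'.

    Returns signature=None instead of raising when the delimiter or the
    hash after it is absent, i.e. the truncated answer described above.
    """
    version, sep, signature = output.strip().partition("&")
    signature = signature.strip()
    if not sep or not signature:
        return DomRVersion(version.strip(), None)
    return DomRVersion(version.strip(), signature)


if __name__ == "__main__":
    print(parse_domr_version("4.10.0&abc123\n"))  # DomRVersion(version='4.10.0', signature='abc123')
    print(parse_domr_version("4.10.0&"))          # DomRVersion(version='4.10.0', signature=None)
```

Returning `None` for the signature keeps a truncated answer distinguishable from a complete one, so a caller could retry or report it instead of storing a half-read value.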
@rhtyd a Trillian-Jenkins test job (centos7 mgmt + vmware-55u3) has been kicked to run smoke tests
@rhtyd thanks for your explanation.
Trillian test result (tid-941)
|
LGTM
I have seen this issue many times in my setup, but the problem is that I am not able to reproduce it. When I run the script manually in the VR, everything works fine.
@jayapalu it's a timing/buffering issue, and we were able to reproduce it a few times in a nested environment (Trillian).
@DaanHoogland @rhtyd @abhinandanprateek, even with this PR the problem keeps happening :( In a VMware environment, this problem still happens (sometimes, which is even worse than if it happened every single time). Interestingly enough, after applying this PR and using only our XenServer clusters, the problem does not happen anymore (at least we were not able to reproduce it).
@rafaelweingartner it is not easily reproducible. We were seeing it in one of our customer environments and then in one of our test environments (XenServer). Applying the patch fixed it. See if you can get meaningful information from the environment where it is failing.
@abhinandanprateek even with this change the error still happened on XenServer hosts. We made a small change to the "router_proxy.sh" script, and so far we have not seen the error again. We will keep monitoring to check whether the error is gone, and we will create a PR if everything goes well. BTW: to get the template version from a VR running on VMware, is the execution flow the same? I mean, is "router_proxy.sh" used on the ESX host to access the VR?
@rafaelweingartner The router_proxy route is used for Xen and KVM (similar to the way we manually access the VR, by logging into the host). On VMware, by contrast, the VR is reachable from the management server (MS) on its private IP, so the scripts can be run directly from the MS over SSH.
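The two execution paths described in this reply can be sketched as a small routing function. This is an illustrative model only: `build_version_route`, the `Route` type, the `router_proxy.sh <script> <router-ip>` argument order, and the sample addresses and `get_version.sh` script name are hypothetical assumptions, not CloudStack's actual API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Route:
    run_on: str         # address of the machine that executes `command`
    command: List[str]  # command line executed on that machine


def build_version_route(hypervisor: str, host_ip: str, router_ip: str,
                        script: str) -> Route:
    """Pick where a VR script runs, following the description above."""
    hv = hypervisor.lower()
    if hv in ("xenserver", "xen", "kvm"):
        # XenServer/KVM: run on the hypervisor host, which relays the call
        # to the VR through router_proxy.sh (argument order is assumed).
        return Route(run_on=host_ip,
                     command=["router_proxy.sh", script, router_ip])
    if hv == "vmware":
        # VMware: the management server reaches the VR on its private IP
        # and runs the script there directly over SSH.
        return Route(run_on=router_ip, command=[script])
    raise ValueError(f"unsupported hypervisor: {hypervisor}")


if __name__ == "__main__":
    print(build_version_route("KVM", "10.0.0.5", "169.254.3.10", "get_version.sh"))
    print(build_version_route("VMware", "10.0.0.5", "192.168.1.20", "get_version.sh"))
```

In this model, a change to router_proxy.sh can only affect the XenServer/KVM branch, since the VMware branch never passes through it.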