
@abhinandanprateek
Contributor

…on as output

The fix attempts to return the output from a single command instead of appending the output of two separate commands.

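
The failure mode the fix targets can be shown with a simplified Python model. It assumes the version answer is the release string, the delimiter `&`, and the scripts signature. Two commands produce two separate writes, and a reader that returns early can miss the second one. One command produces a single write, so the answer arrives intact. This is not the actual script. The function names and the sample values `4.9.2` and `abc123` are hypothetical.

```python
from typing import List


def two_command_writes(release: str, signature: str) -> List[str]:
    # Hypothetical "before" shape: the release string and the signature come
    # from two separate commands, so they reach the pipe as two writes.
    return [release + "&", signature]


def single_command_write(release: str, signature: str) -> List[str]:
    # Hypothetical "after" shape: the whole answer is assembled first and
    # printed by one command, so it reaches the pipe as a single write.
    return [f"{release}&{signature}"]


def read_first_chunk(chunks: List[str]) -> str:
    # Models a reader that returns as soon as the first write is available,
    # i.e. before a delayed second write has been flushed.
    return chunks[0] if chunks else ""


def read_all(chunks: List[str]) -> str:
    # Models a reader that waits for every write before returning.
    return "".join(chunks)
```

With the two-write shape, `read_first_chunk(two_command_writes("4.9.2", "abc123"))` yields `"4.9.2&"`, so the hash is missing. With the single-write shape, both readers see `"4.9.2&abc123"`.
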
@ustcweizhou
Contributor

@abhinandanprateek I would say the change is good, but it does not really fix the issue you described.
The issue seems to be caused by a missing /var/cache/cloud/cloud-scripts-signature.
The file should be created in cloud-early-config. How could it be missing?

@abhinandanprateek
Contributor Author

@ustcweizhou the file exists and the script works on the VR; however, the complete output is not returned through router_proxy. The signature file is part of the template.

@abhinandanprateek
Contributor Author

abhinandanprateek commented Mar 9, 2017

@ustcweizhou another observation is that with the same systemvm.iso and VR templates, get version works on some hosts but fails consistently in other cases.

@yadvr
Member

yadvr commented Mar 9, 2017

LGTM. I got this tested in a Trillian environment. I'll fire off one more round.
@blueorangutan package

@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@DaanHoogland
Contributor

LGTM

@ustcweizhou
Contributor

@abhinandanprateek can you spend some time investigating what caused the issue?
With this PR the VR will not be stopped, which will help with troubleshooting the issue. Good.
Actually, I have never seen this issue before.

@blueorangutan

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-580

@yadvr
Member

yadvr commented Mar 9, 2017

@ustcweizhou the issue we found is due to a delay in buffered stdout, where the getdomr answer sometimes lacks the 'hash' after the delimiter '&'. This is a very specific case that normally cannot be reproduced. We saw this issue in a production environment, and Abhi recently found it in a Trillian environment.
@blueorangutan test centos7 vmware-55u3
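
The truncated-answer case can be caught by a stricter check on the caller's side. Below is a minimal Python sketch. It assumes the answer has the form `<version>&<hash>`, as described in the comment above. The function name is hypothetical, and CloudStack's own parsing lives in its Java agent code and may differ. Returning `None` lets a caller log or flag an incomplete answer instead of treating a missing hash as a real version mismatch.

```python
from typing import Optional, Tuple

DELIMITER = "&"  # separator between the version and the hash, per the thread


def parse_domr_version(answer: str) -> Optional[Tuple[str, str]]:
    """Split an answer of the assumed form '<version>&<hash>'.

    Returns None if the delimiter is absent or either side is empty, which
    covers the truncated case with nothing after the '&'.
    """
    # partition() splits on the first delimiter only and never raises.
    version, sep, signature = answer.strip().partition(DELIMITER)
    if not sep or not version or not signature:
        return None
    return version, signature
```

For example, `parse_domr_version("4.9.2&abc123")` returns `("4.9.2", "abc123")`, while the truncated `"4.9.2&"` returns `None`. The sample values are hypothetical.
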

@blueorangutan

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + vmware-55u3) has been kicked to run smoke tests

@ustcweizhou
Contributor

@rhtyd thanks for your explanation.
@abhinandanprateek LGTM

@blueorangutan

Trillian test result (tid-941)
Environment: vmware-55u3 (x2), Advanced Networking with Mgmt server 7
Total time taken: 41920 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr1995-t941-vmware-55u3.zip
Intermittent failure detected: /marvin/tests/smoke/test_privategw_acl.py
Intermittent failure detected: /marvin/tests/smoke/test_routers_network_ops.py
Intermittent failure detected: /marvin/tests/smoke/test_snapshots.py
Intermittent failure detected: /marvin/tests/smoke/test_vm_snapshots.py
Test completed. 45 look ok, 3 have error(s)

Test Result Time (s) Test File
test_01_test_vm_volume_snapshot Failure 332.42 test_vm_snapshots.py
test_04_rvpc_privategw_static_routes Failure 908.60 test_privategw_acl.py
test_02_list_snapshots_with_removed_data_store Error 121.21 test_snapshots.py
test_02_list_snapshots_with_removed_data_store Error 126.30 test_snapshots.py
test_01_vpc_site2site_vpn Success 367.29 test_vpc_vpn.py
test_01_vpc_remote_access_vpn Success 171.72 test_vpc_vpn.py
test_01_redundant_vpc_site2site_vpn Success 597.62 test_vpc_vpn.py
test_02_VPC_default_routes Success 404.75 test_vpc_router_nics.py
test_01_VPC_nics_after_destroy Success 711.75 test_vpc_router_nics.py
test_05_rvpc_multi_tiers Success 695.51 test_vpc_redundant.py
test_04_rvpc_network_garbage_collector_nics Success 1543.97 test_vpc_redundant.py
test_03_create_redundant_VPC_1tier_2VMs_2IPs_2PF_ACL_reboot_routers Success 758.71 test_vpc_redundant.py
test_02_redundant_VPC_default_routes Success 763.69 test_vpc_redundant.py
test_01_create_redundant_VPC_2tiers_4VMs_4IPs_4PF_ACL Success 1402.50 test_vpc_redundant.py
test_09_delete_detached_volume Success 26.01 test_volumes.py
test_06_download_detached_volume Success 55.56 test_volumes.py
test_05_detach_volume Success 100.28 test_volumes.py
test_04_delete_attached_volume Success 10.19 test_volumes.py
test_03_download_attached_volume Success 15.28 test_volumes.py
test_02_attach_volume Success 58.91 test_volumes.py
test_01_create_volume Success 516.85 test_volumes.py
test_03_delete_vm_snapshots Success 275.25 test_vm_snapshots.py
test_02_revert_vm_snapshots Success 229.24 test_vm_snapshots.py
test_01_create_vm_snapshots Success 161.80 test_vm_snapshots.py
test_deploy_vm_multiple Success 232.56 test_vm_life_cycle.py
test_deploy_vm Success 0.03 test_vm_life_cycle.py
test_advZoneVirtualRouter Success 0.02 test_vm_life_cycle.py
test_10_attachAndDetach_iso Success 27.13 test_vm_life_cycle.py
test_09_expunge_vm Success 125.24 test_vm_life_cycle.py
test_08_migrate_vm Success 86.38 test_vm_life_cycle.py
test_07_restore_vm Success 0.12 test_vm_life_cycle.py
test_06_destroy_vm Success 10.15 test_vm_life_cycle.py
test_03_reboot_vm Success 5.23 test_vm_life_cycle.py
test_02_start_vm Success 25.23 test_vm_life_cycle.py
test_01_stop_vm Success 10.14 test_vm_life_cycle.py
test_CreateTemplateWithDuplicateName Success 236.56 test_templates.py
test_08_list_system_templates Success 0.03 test_templates.py
test_07_list_public_templates Success 0.04 test_templates.py
test_05_template_permissions Success 0.07 test_templates.py
test_04_extract_template Success 11.75 test_templates.py
test_03_delete_template Success 5.11 test_templates.py
test_02_edit_template Success 90.18 test_templates.py
test_01_create_template Success 115.87 test_templates.py
test_10_destroy_cpvm Success 266.89 test_ssvm.py
test_09_destroy_ssvm Success 268.90 test_ssvm.py
test_08_reboot_cpvm Success 156.68 test_ssvm.py
test_07_reboot_ssvm Success 158.37 test_ssvm.py
test_06_stop_cpvm Success 206.89 test_ssvm.py
test_05_stop_ssvm Success 208.81 test_ssvm.py
test_04_cpvm_internals Success 1.15 test_ssvm.py
test_03_ssvm_internals Success 3.33 test_ssvm.py
test_02_list_cpvm_vm Success 0.11 test_ssvm.py
test_01_list_sec_storage_vm Success 0.13 test_ssvm.py
test_01_snapshot_root_disk Success 66.40 test_snapshots.py
test_04_change_offering_small Success 97.05 test_service_offerings.py
test_03_delete_service_offering Success 0.04 test_service_offerings.py
test_02_edit_service_offering Success 0.09 test_service_offerings.py
test_01_create_service_offering Success 0.11 test_service_offerings.py
test_02_sys_template_ready Success 0.13 test_secondary_storage.py
test_01_sys_vm_start Success 0.17 test_secondary_storage.py
test_09_reboot_router Success 181.51 test_routers.py
test_08_start_router Success 120.78 test_routers.py
test_07_stop_router Success 25.59 test_routers.py
test_06_router_advanced Success 0.06 test_routers.py
test_05_router_basic Success 0.04 test_routers.py
test_04_restart_network_wo_cleanup Success 5.69 test_routers.py
test_03_restart_network_cleanup Success 141.31 test_routers.py
test_02_router_internal_adv Success 0.99 test_routers.py
test_01_router_internal_basic Success 0.57 test_routers.py
test_router_dns_guestipquery Success 76.68 test_router_dns.py
test_router_dns_externalipquery Success 0.11 test_router_dns.py
test_router_dhcphosts Success 159.18 test_router_dhcphosts.py
test_router_dhcp_opts Success 21.65 test_router_dhcphosts.py
test_01_updatevolumedetail Success 0.08 test_resource_detail.py
test_01_reset_vm_on_reboot Success 30.33 test_reset_vm_on_reboot.py
test_createRegion Success 0.04 test_regions.py
test_create_pvlan_network Success 5.25 test_pvlan.py
test_dedicatePublicIpRange Success 0.40 test_public_ip_range.py
test_03_vpc_privategw_restart_vpc_cleanup Success 1054.56 test_privategw_acl.py
test_02_vpc_privategw_static_routes Success 663.29 test_privategw_acl.py
test_01_vpc_privategw_acl Success 192.94 test_privategw_acl.py
test_01_primary_storage_nfs Success 37.64 test_primary_storage.py
test_createPortablePublicIPRange Success 15.22 test_portable_publicip.py
test_createPortablePublicIPAcquire Success 15.51 test_portable_publicip.py
test_isolate_network_password_server Success 94.25 test_password_server.py
test_UpdateStorageOverProvisioningFactor Success 0.14 test_over_provisioning.py
test_oobm_zchange_password Success 30.76 test_outofbandmanagement.py
test_oobm_multiple_mgmt_server_ownership Success 16.55 test_outofbandmanagement.py
test_oobm_issue_power_status Success 10.29 test_outofbandmanagement.py
test_oobm_issue_power_soft Success 15.31 test_outofbandmanagement.py
test_oobm_issue_power_reset Success 15.35 test_outofbandmanagement.py
test_oobm_issue_power_on Success 15.30 test_outofbandmanagement.py
test_oobm_issue_power_off Success 15.30 test_outofbandmanagement.py
test_oobm_issue_power_cycle Success 15.29 test_outofbandmanagement.py
test_oobm_enabledisable_across_clusterzones Success 92.56 test_outofbandmanagement.py
test_oobm_enable_feature_valid Success 5.15 test_outofbandmanagement.py
test_oobm_enable_feature_invalid Success 0.09 test_outofbandmanagement.py
test_oobm_disable_feature_valid Success 5.19 test_outofbandmanagement.py
test_oobm_disable_feature_invalid Success 0.14 test_outofbandmanagement.py
test_oobm_configure_invalid_driver Success 0.07 test_outofbandmanagement.py
test_oobm_configure_default_driver Success 0.07 test_outofbandmanagement.py
test_oobm_background_powerstate_sync Success 23.39 test_outofbandmanagement.py
test_extendPhysicalNetworkVlan Success 15.43 test_non_contigiousvlan.py
test_01_nic Success 475.12 test_nic.py
test_releaseIP Success 283.07 test_network.py
test_reboot_router Success 630.18 test_network.py
test_public_ip_user_account Success 10.31 test_network.py
test_public_ip_admin_account Success 40.25 test_network.py
test_network_rules_acquired_public_ip_3_Load_Balancer_Rule Success 76.88 test_network.py
test_network_rules_acquired_public_ip_2_nat_rule Success 61.69 test_network.py
test_network_rules_acquired_public_ip_1_static_nat_rule Success 125.23 test_network.py
test_delete_account Success 302.82 test_network.py
test_02_port_fwd_on_non_src_nat Success 55.64 test_network.py
test_01_port_fwd_on_src_nat Success 111.77 test_network.py
test_nic_secondaryip_add_remove Success 192.39 test_multipleips_per_nic.py
login_test_saml_user Success 19.11 test_login.py
test_assign_and_removal_lb Success 148.28 test_loadbalance.py
test_02_create_lb_rule_non_nat Success 207.18 test_loadbalance.py
test_01_create_lb_rule_src_nat Success 207.78 test_loadbalance.py
test_03_list_snapshots Success 0.08 test_list_ids_parameter.py
test_02_list_templates Success 0.04 test_list_ids_parameter.py
test_01_list_volumes Success 0.03 test_list_ids_parameter.py
test_07_list_default_iso Success 0.06 test_iso.py
test_05_iso_permissions Success 0.06 test_iso.py
test_04_extract_Iso Success 5.13 test_iso.py
test_03_delete_iso Success 95.18 test_iso.py
test_02_edit_iso Success 0.05 test_iso.py
test_01_create_iso Success 20.97 test_iso.py
test_04_rvpc_internallb_haproxy_stats_on_all_interfaces Success 539.79 test_internal_lb.py
test_03_vpc_internallb_haproxy_stats_on_all_interfaces Success 404.72 test_internal_lb.py
test_02_internallb_roundrobin_1RVPC_3VM_HTTP_port80 Success 1047.85 test_internal_lb.py
test_01_internallb_roundrobin_1VPC_3VM_HTTP_port80 Success 819.75 test_internal_lb.py
test_dedicateGuestVlanRange Success 10.30 test_guest_vlan_range.py
test_UpdateConfigParamWithScope Success 0.13 test_global_settings.py
test_rolepermission_lifecycle_update Success 6.17 test_dynamicroles.py
test_rolepermission_lifecycle_list Success 5.99 test_dynamicroles.py
test_rolepermission_lifecycle_delete Success 5.97 test_dynamicroles.py
test_rolepermission_lifecycle_create Success 5.88 test_dynamicroles.py
test_rolepermission_lifecycle_concurrent_updates Success 5.97 test_dynamicroles.py
test_role_lifecycle_update_role_inuse Success 5.89 test_dynamicroles.py
test_role_lifecycle_update Success 10.97 test_dynamicroles.py
test_role_lifecycle_list Success 5.95 test_dynamicroles.py
test_role_lifecycle_delete Success 10.90 test_dynamicroles.py
test_role_lifecycle_create Success 5.89 test_dynamicroles.py
test_role_inuse_deletion Success 5.86 test_dynamicroles.py
test_role_account_acls_multiple_mgmt_servers Success 7.96 test_dynamicroles.py
test_role_account_acls Success 8.23 test_dynamicroles.py
test_default_role_deletion Success 5.95 test_dynamicroles.py
test_04_create_fat_type_disk_offering Success 0.06 test_disk_offerings.py
test_03_delete_disk_offering Success 0.04 test_disk_offerings.py
test_02_edit_disk_offering Success 0.05 test_disk_offerings.py
test_02_create_sparse_type_disk_offering Success 0.06 test_disk_offerings.py
test_01_create_disk_offering Success 0.07 test_disk_offerings.py
test_deployvm_userdispersing Success 55.77 test_deploy_vms_with_varied_deploymentplanners.py
test_deployvm_userconcentrated Success 96.18 test_deploy_vms_with_varied_deploymentplanners.py
test_deployvm_firstfit Success 181.44 test_deploy_vms_with_varied_deploymentplanners.py
test_deployvm_userdata_post Success 55.98 test_deploy_vm_with_userdata.py
test_deployvm_userdata Success 146.38 test_deploy_vm_with_userdata.py
test_02_deploy_vm_root_resize Success 5.83 test_deploy_vm_root_resize.py
test_01_deploy_vm_root_resize Success 6.00 test_deploy_vm_root_resize.py
test_00_deploy_vm_root_resize Success 5.97 test_deploy_vm_root_resize.py
test_deploy_vm_from_iso Success 207.43 test_deploy_vm_iso.py
test_DeployVmAntiAffinityGroup Success 176.74 test_affinity_groups.py
test_08_resize_volume Skipped 5.11 test_volumes.py
test_07_resize_fail Skipped 15.30 test_volumes.py
test_06_copy_template Skipped 0.00 test_templates.py
test_static_role_account_acls Skipped 0.02 test_staticroles.py
test_01_scale_vm Skipped 66.31 test_scale_vm.py
test_01_primary_storage_iscsi Skipped 0.03 test_primary_storage.py
test_06_copy_iso Skipped 0.00 test_iso.py
test_deploy_vgpu_enabled_vm Skipped 0.00 test_deploy_vgpu_enabled_vm.py

@yadvr
Member

yadvr commented May 15, 2017

LGTM

@jayapalu
Contributor

jayapalu commented May 15, 2017

I have seen this issue many times in my setup, but the problem is that I am not able to reproduce it. When I run the script manually in the VR, everything works fine.
The same logic has worked for years and is now failing. This issue is a bit strange to me.

@yadvr
Member

yadvr commented May 15, 2017

@jayapalu it's a timing/buffer issue, and we were able to reproduce it in a nested environment (Trillian) a few times.

@karuturi karuturi added this to the 4.10.0.0 milestone May 17, 2017
@karuturi karuturi merged commit 35b7fa3 into apache:4.9 May 17, 2017
@rafaelweingartner
Member

@DaanHoogland @rhtyd @abhinandanprateek, even with this PR the problem keeps happening :(
How did you manage to reproduce this error?

In a VMware environment, this problem still happens (only sometimes, which is even worse than if it happened every single time). Interestingly enough, after applying this PR and using only our XenServer clusters, the problem does not happen anymore (at least we have not been able to reproduce it).

@abhinandanprateek abhinandanprateek changed the title CLOUDSTACK-9828: GetDomRVersionCommand fails to get the correct versi… CLOUDSTACK-9828: GetDomRVersionCmd fails to get the correct versi… May 22, 2017
@abhinandanprateek
Contributor Author

@rafaelweingartner it is not easily reproducible. We saw it in one of our customer environments and then in one of our test environments (XenServer). Applying the patch fixed it. See if you can get meaningful information from the environment where it is failing.

@rafaelweingartner
Member

@abhinandanprateek even with this change, the error still happened on XenServer hosts.
I noticed CPU usage in Dom0 close to 90% when the error happened, though I do not know whether this can trigger the problem.

We made a small change to the "router_proxy.sh" script, and so far we have not seen the error again. We will keep monitoring to check whether the error disappears, and we will create a PR if everything goes well.

BTW: to get the template version from a VR running on VMware, do we have the same execution flow? I mean, is "router_proxy.sh" used on the ESX host to access the VR?

@abhinandanprateek
Contributor Author

abhinandanprateek commented May 24, 2017

@rafaelweingartner The router_proxy path is used for XenServer and KVM (similar to the way we manually access the VR, by logging into the host). A VMware VR, on the other hand, is reachable from the MS through its private IP, so the scripts can be run directly from the MS over SSH.
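
The two execution paths described in the comment above can be sketched as a small Python model. It is an illustration under stated assumptions, not CloudStack's code. On XenServer and KVM the call is relayed by router_proxy.sh on the hypervisor host. On VMware the management server runs the script over SSH against the VR's private IP. The proxy path, the in-VR script directory, the SSH port, the key path and the script name used in the example are all assumptions.

```python
from typing import List, Tuple

# Assumptions for illustration only; verify against your own deployment.
ROUTER_PROXY = "/opt/cloud/bin/router_proxy.sh"  # assumed proxy path on the host
VR_SCRIPT_DIR = "/opt/cloud/bin"                 # assumed script directory in the VR
VR_SSH_PORT = "3922"                             # assumed SSH port of the VR
SSH_KEY = "/path/to/systemvm/private/key"        # hypothetical key path on the MS


def build_vr_command(hypervisor: str, router_ip: str, script: str) -> Tuple[str, List[str]]:
    """Return (where the command runs, argv) for running a script in a VR."""
    hv = hypervisor.strip().lower()
    if hv in ("xenserver", "kvm"):
        # Runs on the hypervisor host, which relays the call into the VR.
        return "host", [ROUTER_PROXY, script, router_ip]
    if hv == "vmware":
        # The management server reaches the VR directly on its private IP.
        return "management-server", [
            "ssh", "-p", VR_SSH_PORT, "-i", SSH_KEY,
            f"root@{router_ip}", f"{VR_SCRIPT_DIR}/{script}",
        ]
    raise ValueError(f"unsupported hypervisor: {hypervisor}")
```

For example, `build_vr_command("VMware", "10.1.1.5", "version_script.sh")` returns an `ssh` invocation that runs on the management server. The same call with `"XenServer"` returns a router_proxy.sh invocation that runs on the host. The script name and IPs are placeholders.
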


9 participants