@anshul1886

While starting the router, pass the user from the calling context (CallContext) instead of defaulting to the System user.

https://issues.apache.org/jira/browse/CLOUDSTACK-9198

To test:

Verify that the virtual router is not deployed in a disabled Pod.

Member

@anshul1886 could you please create a method for these lines? As it is duplicated code, a method would keep the code clean and help future work (especially if well documented).

Author

@GabrielBrascher Extracting those lines into a method would strip them of their contextual information and make them harder to understand in the future, so I have kept it this way.

@GabrielBrascher
Member

I don't understand how using a non-system user prevents the VR from being deployed in a disabled Pod.

It seems that neither the user nor the account is used in the VR start method (line 266). Please correct me if I am wrong.

@anshul1886
Author

@GabrielBrascher The System user can start a VR in a disabled Pod.

@GabrielBrascher
Member

@anshul1886 I am sorry if I wasn't clear. You are using the com.cloud.network.router.NetworkHelperImpl.startVirtualRouter(DomainRouterVO, User, Account, Map<Param, Object>) method (implemented at line 347), which executes the com.cloud.network.router.NetworkHelperImpl.start(DomainRouterVO, User, Account, Map<Param, Object>, DeploymentPlan) method (line 266) with the user and account modified by your code. My point is that the start method does not use the user and account parameters. My suggestion is to remove those unused parameters from the start and startVirtualRouter methods.

Are you sure that the change you introduce fixed your problem?

@anshul1886
Author

@GabrielBrascher Refer to commits 7928963 and 11e1e58 for information about the calling context (CallContext).

@rafaelweingartner
Member

@anshul1886 I did not understand what you meant by pointing to those commits; would you care to explain? I noticed that the commits you pointed out are pretty old and do not correspond to the code we have in the master branch today.

The point that @GabrielBrascher raised makes sense to me; have you looked at the piece of code he pointed to?

I looked at the code: the variables you changed are not used. If they should be used, that has to be properly coded; if not, they should be removed.

In detail:
You changed the method com.cloud.network.router.NetworkHelperImpl.startRouters(RouterDeploymentDefinition) to use, as the calling user, the one returned by “_accountMgr.getActiveUser(CallContext.current().getCallingUserId())”, and, as the Account, the one returned by “CallContext.current().getCallingAccount()”.

Then, at line 336, the method “com.cloud.network.router.NetworkHelperImpl.startVirtualRouter(DomainRouterVO, User, Account, Map<Param, Object>)” is called with the aforementioned variables. That method does not use either of those variables itself; it only passes them on to “com.cloud.network.router.NetworkHelperImpl.start(DomainRouterVO, User, Account, Map<Param, Object>, DeploymentPlan)”. Those calls occur at lines 349, 387 or 412, depending on some logic.

After that, the method “com.cloud.network.router.NetworkHelperImpl.start(DomainRouterVO, User, Account, Map<Param, Object>, DeploymentPlan)”, receives its parameters and performs its job. However, it does not use the “User” and “Account” parameters in any of its operations.

In the end, the variables were not used before your change and they are not used now. Therefore, I do not see how this simple change can solve the problem.

Do you intend to alter the code of “com.cloud.network.router.NetworkHelperImpl.start(DomainRouterVO, User, Account, Map<Param, Object>, DeploymentPlan)” too?

@yadvr
Member

yadvr commented May 2, 2016

@anshul1886 please rebase against master, thanks.
An integration test would be great.

@anshul1886
Author

@rhtyd There are no conflicts here, so I don't understand what a rebase would achieve.

@bvbharatk
Contributor

ACS CI BVT Run

Summary:
Build Number 68
Hypervisor xenserver
NetworkType Advanced
Passed=69
Failed=4
Skipped=3

Link to logs Folder (search by build_no): https://www.dropbox.com/sh/yj3wnzbceo9uef2/AAB6u-Iap-xztdm6jHX9SjPja?dl=0

Failed tests:

  • test_vpc_vpn.py
    • ContextSuite context=TestRVPCSite2SiteVpn>:setup Failing since 2 runs
    • ContextSuite context=TestVpcRemoteAccessVpn>:setup Failing since 3 runs
    • ContextSuite context=TestVpcSite2SiteVpn>:setup Failing since 3 runs
  • test_vm_life_cycle.py
    • test_10_attachAndDetach_iso Failing since 2 runs

Skipped tests:
test_vm_nic_adapter_vmxnet3
test_static_role_account_acls
test_deploy_vgpu_enabled_vm

Passed test suites:
test_deploy_vm_with_userdata.py
test_affinity_groups_projects.py
test_portable_publicip.py
test_over_provisioning.py
test_global_settings.py
test_scale_vm.py
test_service_offerings.py
test_routers_iptables_default_policy.py
test_routers.py
test_reset_vm_on_reboot.py
test_snapshots.py
test_deploy_vms_with_varied_deploymentplanners.py
test_login.py
test_list_ids_parameter.py
test_public_ip_range.py
test_multipleips_per_nic.py
test_regions.py
test_affinity_groups.py
test_network_acl.py
test_pvlan.py
test_volumes.py
test_nic.py
test_deploy_vm_root_resize.py
test_resource_detail.py
test_secondary_storage.py
test_disk_offerings.py

…starting the router, send the user from the callingContext instead of defaulting to System user
@anshul1886
Author

@rhtyd, rebased against the latest master.

@cloudmonger

ACS CI BVT Run

Sumarry:
Build Number 405
Hypervisor xenserver
NetworkType Advanced
Passed=104
Failed=1
Skipped=7

Link to logs Folder (search by build_no): https://www.dropbox.com/sh/yj3wnzbceo9uef2/AAB6u-Iap-xztdm6jHX9SjPja?dl=0

Failed tests:

  • test_non_contigiousvlan.py
    • test_extendPhysicalNetworkVlan Failed

Skipped tests:
test_01_test_vm_volume_snapshot
test_vm_nic_adapter_vmxnet3
test_static_role_account_acls
test_11_ss_nfs_version_on_ssvm
test_nested_virtualization_vmware
test_3d_gpu_support
test_deploy_vgpu_enabled_vm

Passed test suites:
test_deploy_vm_with_userdata.py
test_affinity_groups_projects.py
test_portable_publicip.py
test_over_provisioning.py
test_global_settings.py
test_scale_vm.py
test_service_offerings.py
test_routers_iptables_default_policy.py
test_loadbalance.py
test_routers.py
test_reset_vm_on_reboot.py
test_deploy_vms_with_varied_deploymentplanners.py
test_network.py
test_router_dns.py
test_login.py
test_deploy_vm_iso.py
test_list_ids_parameter.py
test_public_ip_range.py
test_multipleips_per_nic.py
test_regions.py
test_affinity_groups.py
test_network_acl.py
test_pvlan.py
test_volumes.py
test_nic.py
test_deploy_vm_root_resize.py
test_resource_detail.py
test_secondary_storage.py
test_vm_life_cycle.py
test_routers_network_ops.py
test_disk_offerings.py

@yadvr
Member

yadvr commented Mar 1, 2017

@blueorangutan package

@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-545

@abhinandanprateek
Contributor

@anshul1886 Shouldn't we disallow deploying any VM in a disabled Pod? Why just the VR?
cc @rhtyd @karuturi

@anshul1886
Author

@abhinandanprateek, that is the existing behaviour: we disallow deploying VMs in a disabled Pod. But the virtual router was still being deployed there because it was started through the system user, for which we allow deployment.

@abhinandanprateek
Contributor

@anshul1886 I am not sure; I thought the deployment planner for a VR and for a regular VM is the same. If a regular VM is not allowed in a cluster under a disabled Pod, then the VR should not be allowed either. Is there an override somewhere?

@anshul1886
Author

anshul1886 commented Mar 1, 2017

@abhinandanprateek, yes, they use the same deployment planner. But the VR was being created in a disabled Pod, while a regular VM was not, because the system user context was set for VR deployment, and the planner skips these checks for the system user. So the fix here is to use the user from the calling context instead of the system user. Hope this clears your doubt.

One such example:

if (!isRootAdmin(vmProfile)) {
    List<Long> disabledPods = listDisabledPods(plan.getDataCenterId());
    if (!disabledPods.isEmpty()) {
        if (s_logger.isDebugEnabled()) {
            s_logger.debug("Removing from the podId list these pods that are disabled: " + disabledPods);
        }
        podsWithCapacity.removeAll(disabledPods);
    }
}
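To make the mechanism described in this comment concrete, here is a minimal, self-contained Java sketch of the same filtering rule. It is not CloudStack code: the `Caller` enum, `isRootAdmin`, `filterPods` and the pod IDs are hypothetical, and it assumes, as the comment states, that the system user is treated like a root admin and therefore skips the disabled-Pod check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of the disabled-Pod filter quoted above.
 * Caller, isRootAdmin and filterPods are illustrative, not CloudStack APIs.
 */
public class PodFilterSketch {

    /** Identity the planner checks (assumed model of the behaviour described). */
    public enum Caller { SYSTEM, ROOT_ADMIN, REGULAR_USER }

    /** Assumption: the system user is treated like a root admin by the planner. */
    static boolean isRootAdmin(Caller caller) {
        return caller == Caller.SYSTEM || caller == Caller.ROOT_ADMIN;
    }

    /** Returns candidate pods, dropping disabled ones unless the caller is privileged. */
    public static List<Long> filterPods(List<Long> podsWithCapacity, Set<Long> disabledPods, Caller caller) {
        List<Long> result = new ArrayList<>(podsWithCapacity);
        if (!isRootAdmin(caller) && !disabledPods.isEmpty()) {
            result.removeAll(disabledPods);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Long> pods = List.of(1L, 2L, 3L);
        Set<Long> disabled = Set.of(2L);
        // Started as the system user: disabled pod 2 remains a candidate.
        System.out.println(filterPods(pods, disabled, Caller.SYSTEM));       // [1, 2, 3]
        // Started on behalf of a regular user: pod 2 is filtered out.
        System.out.println(filterPods(pods, disabled, Caller.REGULAR_USER)); // [1, 3]
    }
}
```

In this model, the only thing that decides whether a disabled Pod survives is which identity the planner sees, which is why the thread focuses on whether the VR start path presents the system user or the calling user.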

@abhinandanprateek
Contributor

@anshul1886 If you trace the method planDeployment in DeploymentPlanningManagerImpl, you will see that orderCluster is invoked at line 500. By that point, I think, the deployment plan for a VM whose host ID or last host ID is populated has already been returned (lines 472 and 369). This allows a VM to be started in a disabled Pod, and it also allows any VM created for a host in a disabled Pod, since the host ID is preset for such VMs. There should be additional checks before those returns to disallow a disabled Pod.
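A hypothetical Java sketch of the ordering problem described above: when a preferred host (host ID or last host ID) is already known, the plan returns before any disabled-Pod filtering runs. The `Vm` class, the `podOf` host-to-Pod mapping and the `guardPreferredHost` flag are illustrative assumptions standing in for the suggested "additional checks"; this is not the actual `planDeployment` logic.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

/**
 * Hypothetical sketch of the early-return concern: a preset host ID or last host ID
 * short-circuits planning before the disabled-Pod filter runs. Not CloudStack code.
 */
public class EarlyReturnSketch {

    /** Minimal VM model: either ID may be null (assumption). */
    public static final class Vm {
        final Long hostId;
        final Long lastHostId;

        public Vm(Long hostId, Long lastHostId) {
            this.hostId = hostId;
            this.lastHostId = lastHostId;
        }
    }

    /** Illustrative host-to-Pod mapping: host N lives in Pod N / 10. */
    static long podOf(long hostId) {
        return hostId / 10;
    }

    /**
     * Picks a Pod for the VM. With guardPreferredHost=false the preferred host wins even
     * if its Pod is disabled; with true, a disabled Pod falls through to normal filtering.
     */
    public static Optional<Long> planDeployment(Vm vm, List<Long> candidatePods,
                                                Set<Long> disabledPods, boolean guardPreferredHost) {
        Long preferredHost = vm.hostId != null ? vm.hostId : vm.lastHostId;
        if (preferredHost != null) {
            long pod = podOf(preferredHost);
            if (!guardPreferredHost || !disabledPods.contains(pod)) {
                return Optional.of(pod); // early return: the filter below never runs
            }
        }
        // Normal path: first candidate Pod that is not disabled.
        return candidatePods.stream().filter(p -> !disabledPods.contains(p)).findFirst();
    }

    public static void main(String[] args) {
        Vm restarted = new Vm(null, 21L); // last ran on host 21, i.e. Pod 2
        List<Long> pods = List.of(1L, 2L, 3L);
        Set<Long> disabled = Set.of(2L);
        System.out.println(planDeployment(restarted, pods, disabled, false)); // Optional[2]
        System.out.println(planDeployment(restarted, pods, disabled, true));  // Optional[1]
    }
}
```

The guard flag only illustrates where such a check would have to sit (before the early return); whether admins should bypass it is the policy question discussed later in the thread.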

@koushik-das
Contributor

I think the disabled-resource checks apply only to regular users; root/system users can deploy VMs in disabled resources as well.

@anshul1886
Author

anshul1886 commented Mar 1, 2017

@abhinandanprateek #1860 should take care of the last host ID scenario. As for the other case, where the user specifies a host: should we disallow deploying a VM in a disabled Pod? That's a separate issue, by the way.

@abhinandanprateek
Contributor

@koushik-das @anshul1886 If we allow admins to deploy on disabled resources, then this fix is fine. Even with the fix, a non-admin user will be able to restart a VM on a disabled resource, so the fix is limited by that.
I made a fix in the past that prevents deployment on disabled Pods even for admins, and I have now created a PR for it; check this out too: https://github.com/apache/cloudstack/pull/1979/files. I think it just boils down to the definition of disabled.

@anshul1886
Author

@abhinandanprateek @koushik-das That behaviour is intentional: admin users are allowed to deploy VMs on disabled resources. This ticket shows that it is intentional: https://issues.apache.org/jira/browse/CLOUDSTACK-7047.

@abhinandanprateek
Contributor

@anshul1886 @koushik-das With the above info, I think the fix looks good.

LGTM.

@anshul1886 anshul1886 closed this Mar 2, 2017
@anshul1886 anshul1886 reopened this Mar 2, 2017
@GabrielBrascher
Member

@anshul1886 I would like to raise again the point previously discussed by me and @rafaelweingartner. I think we should check whether changing the user and caller will really do the job. So far, I do not see how this PR changes the behavior.

Basically, this code changes two arguments passed to startVirtualRouter [callerUser and caller, in the call startVirtualRouter(router, callerUser, caller, routerDeploymentDefinition.getParams())]. However, inside startVirtualRouter those parameters are only used when calling start(router, user, caller, params, null).

if (router.getRole() != Role.VIRTUAL_ROUTER || !router.getIsRedundantRouter()) {
    return start(router, user, caller, params, null);
}

The problem is that the start method uses neither the user nor the caller parameter in the overridden implementation (the one that you are using).

protected DomainRouterVO start(DomainRouterVO router, final User user, final Account caller, final Map<Param, Object> params, final DeploymentPlan planToDeploy)
        throws StorageUnavailableException, InsufficientCapacityException, ConcurrentOperationException, ResourceUnavailableException {
    s_logger.debug("Starting router " + router);
    try {
        _itMgr.advanceStart(router.getUuid(), params, planToDeploy, null);
    } catch (final OperationTimedoutException e) {
        throw new ResourceUnavailableException("Starting router " + router + " failed! " + e.toString(), DataCenter.class, router.getDataCenterId());
    }
    if (router.isStopPending()) {
        s_logger.info("Clear the stop pending flag of router " + router.getHostName() + " after start router successfully!");
        router.setStopPending(false);
        router = _routerDao.persist(router);
    }
    // We don't want the failure of VPN Connection affect the status of
    // router, so we try to make connection
    // only after router start successfully
    final Long vpcId = router.getVpcId();
    if (vpcId != null) {
        _s2sVpnMgr.reconnectDisconnectedVpnByVpc(vpcId);
    }
    return _routerDao.findById(router.getId());
}

Sorry, but I can't see how your code alters the behavior as intended. Can you please show how changing the user and caller parameters changes the behavior?

Thanks in advance.

@anshul1886
Author

@GabrielBrascher @rafaelweingartner, please go through the deployment planner code. The deployment planner decides VM placement, and the virtual router also goes through it. Deployment planners use allocators and planners, and in those you will find the admin-account checks, as pointed out in one of my previous comments. My intent in pointing out those old commits that introduced the calling context was that, since then, the calling context is passed around implicitly. The code changes here make sure that the context is changed to the proper user.

Hope this clears your doubt.
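The disagreement in this thread hinges on explicit parameters versus an implicit, thread-local call context. The Java sketch below models that distinction under simplifying assumptions: `CallContextSketch`, `Caller`, `register`, `unregister` and `mayUseDisabledPod` are hypothetical and much simpler than CloudStack's `CallContext`. In this model, the explicit user/caller arguments to `start` have no effect; only the identity registered on the current thread decides whether a disabled Pod may be used.

```java
/**
 * Hypothetical sketch of an implicit, thread-local call context. Much simpler than
 * CloudStack's CallContext; all names here are illustrative assumptions.
 */
public class CallContextSketch {

    /** Identity that initiated the current operation. */
    public static final class Caller {
        final String name;
        final boolean rootAdmin;

        public Caller(String name, boolean rootAdmin) {
            this.name = name;
            this.rootAdmin = rootAdmin;
        }
    }

    private static final ThreadLocal<Caller> CURRENT = new ThreadLocal<>();

    public static void register(Caller caller) { CURRENT.set(caller); }

    public static void unregister() { CURRENT.remove(); }

    public static Caller current() { return CURRENT.get(); }

    /** Stand-in for a planner check: reads the caller from the context, not from arguments. */
    static boolean mayUseDisabledPod() {
        Caller c = current();
        return c != null && c.rootAdmin;
    }

    /** Stand-in for start(router, user, caller, ...): user and caller are never read. */
    public static boolean start(String router, Caller user, Caller caller) {
        return mayUseDisabledPod();
    }

    public static void main(String[] args) {
        Caller system = new Caller("system", true);
        Caller alice = new Caller("alice", false);

        register(alice);
        try {
            // Passing the system user explicitly changes nothing: the context says "alice".
            System.out.println(start("r-1", system, system)); // false
        } finally {
            unregister();
        }

        register(system);
        try {
            // Conversely, passing "alice" explicitly does not restrict a system context.
            System.out.println(start("r-1", alice, alice)); // true
        } finally {
            unregister();
        }
    }
}
```

Under this model, a change only affects placement if the context that is current when the planner runs identifies the non-system caller; changing the arguments passed to `start` alone would not.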

@nvazquez
Contributor

Hi @rafaelweingartner @anshul1886 @GabrielBrascher,
I've read this PR's comments several times and I think I understand @anshul1886's point. Please correct me if I'm wrong. The execution of getCallingAccount() sets the context with the proper account, and I think that's fine, as subsequent methods will use it (e.g. orchestrateStart in VirtualMachineManagerImpl, lines 829-831).
I also agree with @rafaelweingartner and @GabrielBrascher that, even though the context is being set, the user and caller variables in the start method (defined at line 266) are not used. @anshul1886, if no validations are required and the context is already set, don't you think those unused parameters could be removed?

@anshul1886
Author

@nvazquez @rafaelweingartner @GabrielBrascher, that method is called from multiple places, and there are many places where we could make these kinds of changes. I would prefer to make such changes in a dedicated PR, so that they are easy to track and test for that specific purpose.

@rafaelweingartner
Member

@anshul1886 which method called from multiple places are you talking about?

@anshul1886
Author

anshul1886 commented Mar 15, 2017

@rafaelweingartner The stop, start, destroy, etc. methods in the virtual router code all have these parameters. The question there is why they were not removed in the first place, when CallContext was introduced and they were removed in some places. That's why I would prefer a separate PR specifically for this: if something breaks, it can be tackled easily without reverting a bug fix that might then get lost.

@rafaelweingartner
Member

@anshul1886, this finger-pointing is not good.

I do not know why people did not do the work as it should have been done before; I was probably not around when that was done. I only asked you to remove those variables because you were touching the code in which they are found. It is not only you: every time I review code and there is room for improvement, I suggest it. I also measure my suggestions; I will never ask for something huge. Normally I ask for small, concise improvements, such as removing unused variables or blocks of code.

I was probably present in most of the PRs created by @nvazquez; you can see how this type of discussion greatly improved all of the code he has worked on.

If you do not want to remove something that is not being used, that is fine. However, I would like a clarification. If the variables you are changing are not used (as you finally admitted), then how can changing them solve the problem you reported in CLOUDSTACK-9198?

@anshul1886
Author

anshul1886 commented Apr 14, 2017

@rafaelweingartner Sorry for the late reply, but it seems you have misunderstood me here. I am not pointing fingers. I was just trying to point out that developers may have had a reason not to remove them. Some changes look quite simple but can cause regressions, which is why I have done it this way. I prefer to make that type of code improvement in a separate PR: if anything goes wrong, it can be reverted easily without causing a regression. In other words, a fix for some bug doesn't get accidentally removed. Regarding the fix, as I have already pointed out, the CallContext call is doing the work.

@yadvr yadvr added this to the 4.11 milestone Dec 10, 2017
@yadvr
Member

yadvr commented Dec 17, 2017

@blueorangutan package

@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-1401

@yadvr
Member

yadvr commented Dec 18, 2017

@blueorangutan test

@blueorangutan

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-1806)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 29581 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr1278-t1806-kvm-centos7.zip
Test completed. Failed tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| test_01_vpc_privategw_acl | Failure | 66.83 | test_privategw_acl.py |
| test_02_vpc_privategw_static_routes | Failure | 219.24 | test_privategw_acl.py |
| test_03_vpc_privategw_restart_vpc_cleanup | Failure | 127.88 | test_privategw_acl.py |
| test_04_rvpc_privategw_static_routes | Failure | 303.99 | test_privategw_acl.py |
| test_02_create_template_with_checksum_sha1 | Error | 5.27 | test_templates.py |
| test_03_create_template_with_checksum_sha256 | Error | 5.22 | test_templates.py |
| test_04_create_template_with_checksum_md5 | Error | 5.22 | test_templates.py |
| test_01_vpc_remote_access_vpn | Failure | 66.01 | test_vpc_vpn.py |
| test_hostha_kvm_host_fencing | Error | 20.75 | test_hostha_kvm.py |

@yadvr
Member

yadvr commented Dec 18, 2017

Tests lgtm, ignoring known failures. @rafaelweingartner @GabrielBrascher @nvazquez are we lgtm on this?

@rafaelweingartner
Member

@rhtyd I have a lot of doubts about this PR.
It says

Virtual router gets deployed in disabled Pod

However, I still do not see how the changes it introduces make sure that a VR does not get deployed in a disabled Pod. I do not see any check of whether a Pod is disabled when VRs are deployed.

Do you see this in the code?

@yadvr
Member

yadvr commented Dec 18, 2017

@rafaelweingartner One thing I do see in the changes is that they use the calling user/account instead of the system user to perform operations. I have not checked the code with respect to this PR, but I've seen code where the system user is allowed to do such things.

@yadvr
Member

yadvr commented Dec 19, 2017

Additional discussion and review is requested.

@yadvr yadvr removed this from the 4.11 milestone Dec 19, 2017
@rafaelweingartner
Member

@rhtyd Sorry, yesterday I was not able to return to this issue.

That is what I thought first. However, there is something I do not understand here. This is what Gabriel, Nicolas and I tried to explain here.

Take a look at lines 336 and 516, where the variables callerUser and caller are used as parameters. If the decision whether to place a VR in a disabled Pod is based on those variables, I would expect them to be used in the startVirtualRouter method, right?

So, if you look at the startVirtualRouter method, you will see that those variables are not used there. They are simply passed as parameters to another method, called start. The issue is the following: the variables callerUser and caller are not used in the start method either, and they are not passed to any other method.

@yadvr
Member

yadvr commented Dec 28, 2017

@anshul-gangwar please address outstanding issues, and questions.

@DaanHoogland
Contributor

@GabrielBrascher @rhtyd @nvazquez @weizhouapache @wido Is this still relevant?

@GabrielBrascher
Member

@DaanHoogland I think that this one can be closed.

@DaanHoogland
Contributor

@anshul1886 please rebase and re-open if this is still relevant
