-
Notifications
You must be signed in to change notification settings - Fork 1.3k
CLOUDSTACK-9198: Virtual router gets deployed in disabled Pod #1278
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anshul1886 could you please create a method for these lines? As it is a duplicate code, a method keeps the code clean and helps in future works (especially if well documented).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GabrielBrascher Extracting those lines in a method will remove the contextual information from them and will impact the understanding in future so it is done this way.
|
I don't understand how the non-system user prevents the VR's deployment on a disabled Pod. It seems that neither user or account will be used in the VR start method (line 266). Please correct me if I am wrong. |
|
@GabrielBrascher System User can start VR on disabled Pod. |
|
@anshul1886 I am sorry if I wasn't clear. You are using the com.cloud.network.router.NetworkHelperImpl.startVirtualRouter(DomainRouterVO, User, Account, Map<Param, Object>) method (implemented at line 347) which executes the com.cloud.network.router.NetworkHelperImpl.start(DomainRouterVO, User, Account, Map<Param, Object>, DeploymentPlan) method (line 266) with the user and account (modified by your code). My point is that the user and account parameters are not used by the start method. My suggestion is to remove those unused parameters from start and startVirtualRouter methods. Are you sure that the change you introduce fixed your problem? |
|
@GabrielBrascher Refer commits 7928963 and 11e1e58 to get info about Calling context. |
|
@anshul1886 I did not understand what you wanted to express presenting those commits, would you care to explain? I noticed the commits you pointed out, are pretty old and do not correspond to the code we have in master branch today. The Point that @GabrielBrascher touched makes sense to me, have you looked at the piece of code he pointed at? I looked at the code, the variables you changed are not used. If they should be used, that has to be properly coded, if not, they should be removed. In details for you: Then, at line 336, it is called the method “com.cloud.network.router.NetworkHelperImpl.startVirtualRouter(DomainRouterVO, User, Account, Map<Param, Object>)” with the aforementioned variables you changed. That method does not use either one use of those variables you changed, it only uses them to execute the method “com.cloud.network.router.NetworkHelperImpl.start(DomainRouterVO, User, Account, Map<Param, Object>, DeploymentPlan)”. Those executions occur at lines 349, 387 or 412, depending on some logic. After that, the method “com.cloud.network.router.NetworkHelperImpl.start(DomainRouterVO, User, Account, Map<Param, Object>, DeploymentPlan)”, receives its parameters and performs its job. However, it does not use the “User” and “Account” parameters in any of its operations. At the end, the variables were not being used before your change and they are not being used now. Therefore, I do not see how that simple change can solve a problem. Do you intend to alter the code of “com.cloud.network.router.NetworkHelperImpl.start(DomainRouterVO, User, Account, Map<Param, Object>, DeploymentPlan)” too ? |
|
@anshul1886 please rebase against master, thanks |
|
@rhtyd There are no conflicts here so I don't understand what will be achieve by rebase? |
ACS CI BVT RunSumarry: Link to logs Folder (search by build_no): https://www.dropbox.com/sh/yj3wnzbceo9uef2/AAB6u-Iap-xztdm6jHX9SjPja?dl=0 Failed tests:
Skipped tests: Passed test suits: |
…starting the router, send the user from the callingContext instead of defaulting to System user
228c67d to
d359452
Compare
|
@rhtyd , Rebased against laster master. |
ACS CI BVT RunSumarry: Link to logs Folder (search by build_no): https://www.dropbox.com/sh/yj3wnzbceo9uef2/AAB6u-Iap-xztdm6jHX9SjPja?dl=0 Failed tests:
Skipped tests: Passed test suits: |
|
@blueorangutan package |
|
@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
|
Packaging result: ✔centos6 ✔centos7 ✔debian. JID-545 |
|
@anshul1886 Should not we disallow any VM to be deployed in a disabled Pod, why just the VR ? |
|
@abhinandanprateek, That's the behaviour we disallow VM to be deployed in disabled Pod. But Virtual router was getting deployed as it was getting called through system user for which we allow deployment. |
|
@anshul1886 I am not sure I thought deployment planner for VR and a regular VM is same. If a regular VM is not allowed in a cluster that is under disabled pod then the VR should also be not allowed. Is there some override somewhere ? |
|
@abhinandanprateek, Yes they are using same deployment planner. But VR was getting created in disabled Pod but not regular VM because for VR deployment system user context was getting set. In planner checks are not done/skipped for system user. So if you observe here fix is to use user from calling context instead of system user. Hope this clears your doubt. One such example: if (!isRootAdmin(vmProfile)) {
List<Long> disabledPods = listDisabledPods(plan.getDataCenterId());
if (!disabledPods.isEmpty()) {
if (s_logger.isDebugEnabled()) {
s_logger.debug("Removing from the podId list these pods that are disabled: " + disabledPods);
}
podsWithCapacity.removeAll(disabledPods);
}
} |
|
@anshul1886 If you trace the method planDeployment in DeploymentPlanningManagerImpl then you will see that orderCluster is getting invoked at line 500. By that time I think the deployment plan for a VM that has hostid or lasthost id populated is already returned (line 472 and 369). This will allow vm to be started on a disabled pod as well as any VM that is created for a host in a disabled pod to be allowed as host id is pre-set for such VMs. There should be additional checks above to disallow a disabled pod. |
|
I think disabled resource related checks are only applicable for regular users, root/system users can go ahead and deploy VMs in disabled resources as well. |
|
@abhinandanprateek #1860 should take care of last host id scenario. Other case of user specifying host, should we not allow deploying VM in disabled Pod? That's a separate issue by the way. |
|
@koushik-das @anshul1886 If we allow admin to deploy resources on disabled resources then this fix is fine. Even with the fix a non admin user will be able to restart a VM on a disabled resource, so limited by that. Boils down to definition of disabled. |
|
@abhinandanprateek @koushik-das That change is intentional to allow admin users to deploy VM on disabled. Ticket which shows that it is intentional https://issues.apache.org/jira/browse/CLOUDSTACK-7047. |
|
@anshul1886 @koushik-das I think with above info the fix looks good. LGTM. |
|
@anshul1886 I would like to raise the point previously discussed by me and @rafaelweingartner. I think that we should pay attention if the change of user and caller will really do the job. So far I do not see how this PR changes the behavior. Basically this code changes two parameters in startVirtualRouter [callerUser and caller when calling startVirtualRouter(router, callerUser, caller, routerDeploymentDefinition.getParams())]. However, those parameters are only used inside startVirtualRouter when calling the method start(router, user, caller, params, null). The problem is that the method start does not use either the user and the caller parameters in the overridden implementation (the one that you are using). Sorry, but I can't see how your code alters the behavior as intended. Can you please show that by changing the parameters user and caller you are changing the behavior? Thanks in advance. |
|
@GabrielBrascher @rafaelweingartner, Please go through deployment planner code. Deployment planner decides the VM placement and virtual router also goes through that. Deployment planners uses allocators and planners. In those planners and allocators you will find the admin account related checks as pointed out in one of my previous comment. My intent to point out those old commits of Calling context introduction was that with that introduction there is calling context is passed around implicit. Here the code changes are making sure that context gets changed to proper user. Hope this clears your doubt. |
|
Hi @rafaelweingartner @anshul1886 @GabrielBrascher, |
|
@nvazquez @rafaelweingartner @GabrielBrascher, That method is called from multiple places. There are many places where we can do these kind of changes. I would prefer to have those kind of changes in PR specific to that so that they are easy to track and test for that specific purpose. |
|
@anshul1886 what method that is called in multiple places are you talking about? |
|
@rafaelweingartner stop start destroy etc all methods related to virtual router code have these parameters. Questions there is why are they not removed in the first place when they CallContext got introduced and removed at some places. That's why I would prefer to have different PR specifically for this. If something breaks then that can be easily tackled without the need of reverting some bug fix which may get lost later. |
|
@anshul1886, this pointing finger thing is not good. I do not know why people did not do the work as it should have been done before. I was probably not around when that was done. I only asked you to remove those variables because you were touching the code in which they are found. It is not only with you, every time I review a code and there is room for improvements, I always suggest it. I also measure my suggestions, I will never ask something huge; normally I ask/suggest for small and concise improvements such as the removal of unused variables/blocks of codes. I was probably present in most of the PRs created by @nvazquez, you can see how this type of discussion improved greatly all of the code he had already worked on. If you do not want to remove something that is not being used is fine. However, I would like a clarification. If the variables you are changing are not used (as you finally admitted), then how can changing them solve the problem you reported on CLOUDSTACK-9198? |
|
@rafaelweingartner Sorry for the late reply. But it seems you have misunderstood me here. I am not pointing finger here. I was just trying to point out that there may be developer concern there to not remove it. Some changes looks quite simple but have potential to cause regressions. That's why I have done this way. I prefer to have code improvements of that type of changes in a separate PR. If anything goes wrong then they can be easily reverted without causing any regression. In other words fix for some bug doesn't get accidentally removed. Regarding fix as I have already pointed out that CallContext call is doing the work. |
|
@blueorangutan package |
|
@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
|
Packaging result: ✔centos6 ✔centos7 ✔debian. JID-1401 |
|
@blueorangutan test |
|
@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests |
|
Trillian test result (tid-1806)
|
|
Tests lgtm, ignoring known failures. @rafaelweingartner @GabrielBrascher @nvazquez are we lgtm on this? |
|
@rhtyd I have a lot of doubts about this PR.
However, I still do not see how the changes it is introducing will make sure that a VR does not get deployed in a disabled Pod. I do not see any check if a Pod is disabled or not when VRs are deployed. Do you see this in the code? |
|
@rafaelweingartner One thing I do see in the changes is that it uses the calling user/account instead of system user to perform operations. And |
|
Additional discussion and review is requested. |
|
@rhtyd sorry. Yesterday I was not able to return to this issue. That is what I thought first. However, there is something I do not understand here. This is what Gabriel, Nicolas and I tried to explain here. Take a look at line 336 and 516 where the variables So, if you take a look at |
|
@anshul-gangwar please address outstanding issues, and questions. |
|
@GabrielBrascher @rhtyd @nvazquez @weizhouapache @wido Is this still relevant? |
|
@DaanHoogland I think that this one can be closed. |
|
@anshul1886 please rebase and re-open if this is still relevant |
While starting the router, send the user from the callingContext instead of defaulting to System user.
https://issues.apache.org/jira/browse/CLOUDSTACK-9198
To test:
Verify that Virtual Router is not getting deployed in disabled Pod.