Service discovery hardening#1796
Conversation
SetMatrix is a simple matrix of sets. Added tests This data structure will be used in following commit to handle transient states where the same key can momentarely be associated to more than a value Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
49f2a16 to
bda77e3
Compare
| return err | ||
| } | ||
|
|
||
| if err = ep.addServiceInfoToCluster(); err != nil { |
There was a problem hiding this comment.
move it back to EnableService (introduce back the sb lock)
| func (sb *sandbox) EnableService() error { | ||
| for _, ep := range sb.getConnectedEndpoints() { | ||
| if ep.enableService(true) { | ||
| if err := ep.addServiceInfoToCluster(); err != nil { |
There was a problem hiding this comment.
Restore Enable service logic (introducing again the sb lock)
0a8bf0c to
94372a6
Compare
mavenugo
left a comment
There was a problem hiding this comment.
Just pushing the initial set of review comments. I have a few more areas to cover.
| // but in any case the deleteServiceInfoToCluster will follow doing the cleanup if needed. | ||
| // In case the deleteServiceInfoToCluster arrives first, this one is happening after the endpoint is | ||
| // removed from the list, in this situation the delete will bail out not finding any data to cleanup | ||
| // and the add will bail out not finding the endpoint on the sandbox. |
There was a problem hiding this comment.
Thanks for the excellent summary of the problem.
|
|
||
| // Service aliases | ||
| aliases []string | ||
| serviceAliases []string |
There was a problem hiding this comment.
its okay to be called aliases since it belongs to the service object.
| ipInfo.extResolver = true | ||
| if ok, _ := sr.ipMap.Contains(ipStr, ipInfo{name: name, extResolver: true}); !ok { | ||
| sr.ipMap.Remove(ipStr, ipInfo{name: name}) | ||
| sr.ipMap.Insert(ipStr, ipInfo{name: name, extResolver: true}) |
There was a problem hiding this comment.
@fcrisciani I dont quite understand this logic. Can you pls explain ?
@sanimej could you PTAL this change ?
There was a problem hiding this comment.
If you look at the original logic what the code does is to fetch the element from the map and set the extResolver to true.
The new logic simply checks if there is already an element with extResolver to true, if there is not then removes the entry that has extResolver to false and inserts one to true.
Looking better the logic is slightly different from the original, in the sense that I insert all the time an entry if it is not there, instead the original logic if the entry is not there does not change it. I will add the check to see if the entry is there with extResolver to false
| ipInfo, ok := elemSet[0].(ipInfo) | ||
| if !ok { | ||
| setStr, b := sr.ipMap.String(ip) | ||
| logrus.Warnf("Something is wrong with the ipInfo set for key %s set:%t %s", ip, b, setStr) |
There was a problem hiding this comment.
a better warning message maybe ?
There was a problem hiding this comment.
suggestion? It will be !ok if the element inserted is not of type ipInfo that is strange and major bug
| // network db notifications) | ||
| // In such cases the resolution will be based on the first element of the set, and can vary | ||
| // during the system stabilitation | ||
| ipInfo, ok := elemSet[0].(ipInfo) |
There was a problem hiding this comment.
Is the check for extResolver intentionally ignored here ?
| nDB.indexes[byNetwork].Insert(fmt.Sprintf("/%s/%s/%s", nid, tname, key), entry) | ||
| nDB.Unlock() | ||
|
|
||
| nDB.broadcaster.Write(makeEvent(opCreate, tname, nid, key, value)) |
There was a problem hiding this comment.
Can you pls explain why it is okay to remove the broadcast event ?
There was a problem hiding this comment.
This is triggering a notification of the handleEpTableEvent for local endpoints. This is not necessary considering that local endpoints got already configured by logic before. Also that can be risky because now the timing with remote events can arrive out of sync due to the fact that remote events will take more time than local dispatched ones
There was a problem hiding this comment.
am not against this change. But I like to understand this a bit more.
broadcaster.Write ultimately calls appropriate Watch routines such as handleEpTableEvent or handleNodeTableEvent or other tables managed by other components (such as overlay driver).
CreateEntry in networkdb is a generic API to create an event of any type. But changing this generic function for a specific case of Endpoint management seems risky without considering how it impacts other events.
What if there is another table (not endpoint), with an event created by one component and consumed not just by remote peers, but also by another local component which networkdb as the mechanism for communication ?
Also, if we remove this call, can you point me to another code path for which handleEpTableEvent is called for local endpoints ?
There was a problem hiding this comment.
Based on the review on all the current usage of CreateEntry and Watch, there is no such case that will break and hence am fine with not having to broadcast this event from networkDB.
94372a6 to
41278c3
Compare
|
Don't see this message on github, so will reply by email.
The reasons are several:
1) Events are happening in parallel, imagine this scenario with 2 workers.
Worker A is allocating IPa, now that container exit so IPa is released.
Before that the notification from network db arrives to Worker B, a new
container on Worker B start using IPa. When the notification from WorkerA
arrives to Worker B without the SetMatrix the entry in the ipMap will be
removed. This will break the resolution for the container that is still
running on Worker B.
2) Events from network DB sometime are replayed for endpoints that got
deleted in the past. I saw events delayed up to 6 min. In this case if the
IP is reused by any other worker, the notification arriving will wipe out
the entry from the ipMap.
For all the previous reason, the SetMatrix will guarantee that a specific
IP is going to be removed when no other endpoint is associated with it.
…On Sat, Jun 10, 2017 at 1:14 PM, Madhu Venugopal ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In network.go
<#1796 (comment)>:
> @@ -97,7 +98,7 @@ type ipInfo struct {
type svcInfo struct {
svcMap map[string][]net.IP
svcIPv6Map map[string][]net.IP
- ipMap map[string]*ipInfo
+ ipMap common.SetMatrix
I dont quite understand the need to have this ipMap structure to container
more than 1 ipInfo ?
Hence I dont see a reason to change it to use SetMatrix.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1796 (review)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABvTieULfqSMaiyLTPRzIJX7RirJOZrYks5sCvkxgaJpZM4N0i7C>
.
|
| // In such cases the resolution will be based on the first element of the set, and can vary | ||
| // during the system stabilitation | ||
| elem, ok := elemSet[0].(ipInfo) | ||
| if !ok { |
There was a problem hiding this comment.
What is the purpose of adding this defensive check ?
ipMap is completely controlled by libnetwork core and what is the purpose of checking for ipInfo type ?
There was a problem hiding this comment.
the SetMatrix is a generic data structure that store potentially any kind of type. This, as far as I know, is the only way to cast back to the original type.
There was a problem hiding this comment.
ok. This is a defensive check and am okay to keep it in and am hoping to never see the Error message ever :)
| elem, ok := elemSet[0].(ipInfo) | ||
| if !ok { | ||
| setStr, b := sr.ipMap.String(ip) | ||
| logrus.Warnf("BUG expected set of ipInfo type for key %s set:%t %s", ip, b, setStr) |
There was a problem hiding this comment.
Warning with a BUG seems odd. It certainly will raise an alarm with the users and as mentioned above, when do we expect such a Bug and if this is a concern, shouldnt this be resolved rather than logging it as a BUG ?
There was a problem hiding this comment.
it has to raise the alarm, It will mean that in that map got inserted a wrong type.
| nDB.indexes[byNetwork].Insert(fmt.Sprintf("/%s/%s/%s", nid, tname, key), entry) | ||
| nDB.Unlock() | ||
|
|
||
| nDB.broadcaster.Write(makeEvent(opCreate, tname, nid, key, value)) |
There was a problem hiding this comment.
am not against this change. But I like to understand this a bit more.
broadcaster.Write ultimately calls appropriate Watch routines such as handleEpTableEvent or handleNodeTableEvent or other tables managed by other components (such as overlay driver).
CreateEntry in networkdb is a generic API to create an event of any type. But changing this generic function for a specific case of Endpoint management seems risky without considering how it impacts other events.
What if there is another table (not endpoint), with an event created by one component and consumed not just by remote peers, but also by another local component which networkdb as the mechanism for communication ?
Also, if we remove this call, can you point me to another code path for which handleEpTableEvent is called for local endpoints ?
| logrus.Debugf("DisableService %s START", sb.containerID) | ||
| for _, ep := range sb.getConnectedEndpoints() { | ||
| if ep.enableService(false) { | ||
| if err := ep.deleteServiceInfoFromCluster(); err != nil { |
There was a problem hiding this comment.
Thinking more about this, Isnt it better to have deleteServiceInfoFromCluster called from DisableService, but remove it from sbLeave in order to be consistent with the way EnableService and DisableService are handled. Also it brings in consistency between Join and Leave. WDYT ?
There was a problem hiding this comment.
That would be great as a future addition, but we have to investigate some corner cases. For example I validated that if you do the docker kill <container> no DisableService is called. This of course will leave stale entries behind.
| type loadBalancerBackend struct { | ||
| ip net.IP | ||
| containerName string | ||
| taskAliases []string |
There was a problem hiding this comment.
It is weird to see taskAliases and containerName carried in loadBalancerBackend, where this structure is used so far for the VIP based IPVS programming. Are we changing this for other purposes now (such as DNS-RR as well) ?
There was a problem hiding this comment.
I can see why we have to carry these additional details. This is required especially so that cleanupServiceBindings can call the lower level SD functions that expects to see containerName and taskAliases. But still, this seems out of place.
There was a problem hiding this comment.
@fcrisciani Please consider picking up this change mavenugo@92820b9
This is a minor rework of your changes that enables us to avoid the need to change this structure and also helps scope this PR for fixing the race issues and minimize the code-reorg for a later activity if there is a need.
bbd63a7 to
f2da09f
Compare
| n.addSvcRecords(name, ip, nil, true) | ||
| for _, alias := range taskaliases { | ||
| n.addSvcRecords(alias, ip, nil, true) | ||
| } |
There was a problem hiding this comment.
I can see how you have removed this here and consolidated all the addSvcRecords under addEndpointNameResolution.
But removing this will result in unmanaged containers in attachable networks losing SD.
Hence, we have to call addEndpointNameResolution from here and addEndpointNameResolution must be changed to also handle unmanaged container scenario.
| n.deleteSvcRecords(name, ip, nil, true) | ||
| for _, alias := range taskaliases { | ||
| n.deleteSvcRecords(alias, ip, nil, true) | ||
| } |
| if err := c.addServiceBinding(ep.svcName, ep.svcID, n.ID(), ep.ID(), ep.Name(), ep.virtualIP, ingressPorts, ep.svcAliases, ep.myAliases, ep.Iface().Address().IP, "addServiceInfoToCluster"); err != nil { | ||
| return err | ||
| } | ||
| } |
There was a problem hiding this comment.
I think you should also call addEndpointNameResolution from here for the case of unmanaged container and its name resolution.
| if err := c.rmServiceBinding(ep.svcName, ep.svcID, n.ID(), ep.ID(), ep.Name(), ep.virtualIP, ingressPorts, ep.svcAliases, ep.myAliases, ep.Iface().Address().IP, "deleteServiceInfoFromCluster"); err != nil { | ||
| return err | ||
| } | ||
| } |
There was a problem hiding this comment.
I think you should also call deleteEndpointNameResolution from here for the case of unmanaged container and its name resolution.
d6a81a6 to
47842a4
Compare
| } | ||
| } | ||
|
|
||
| if addService && len(vip) != 0 { |
There was a problem hiding this comment.
Its hard for me to judge if addService variable is introduced here just as an optimization to avoid calling addSvcRecords multiple times for the same svcName <-> vip combination ?
or Is there a real problem if we call addSvcRecords multiple times for the same svcName <-> vip combination ?
There was a problem hiding this comment.
no real problem is only an optimization and also more symmetric towards the delSvcRecords that has the rmService (the case of the removal the logic is mandatory of course)
47842a4 to
1a67846
Compare
| c := n.getController() | ||
| agent := c.getAgent() | ||
|
|
||
| name := ep.Name() |
There was a problem hiding this comment.
@mavenugo I guess to handle properly the case of anonymous container this has to be brought up
| if n.ingress { | ||
| ingressPorts = ep.ingressPorts | ||
| } | ||
| name := ep.Name() |
There was a problem hiding this comment.
@mavenugo same here for the delete this name is needed
1a67846 to
f6e2ee8
Compare
changed the ipMap to SetMatrix to allow transient states Compacted the addSvc and deleteSvc into a one single method Updated the datastructure for backends to allow storing all the information needed to cleanup properly during the cleanupServiceBindings Removed the enable/disable Service logic that was racing with sbLeave/sbJoin logic Add some debug logs to track further race conditions Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
f6e2ee8 to
d64e71e
Compare
mavenugo
left a comment
There was a problem hiding this comment.
@fcrisciani Thanks for addressing all the comments and fixing the hard to reproduce race test-cases.
LGTM
Contains Service Discovery hardening fixes via moby/libnetwork#1796 Fixes multiple issues such as moby#32830 Signed-off-by: Madhu Venugopal <madhu@docker.com>
This is a cherry-pick of moby/moby#33634 that brings in moby/libnetwork#1796. Signed-off-by: Madhu Venugopal <madhu@docker.com>
Contains Service Discovery hardening fixes via moby/libnetwork#1796 Fixes multiple issues such as #32830 Signed-off-by: Madhu Venugopal <madhu@docker.com> Upstream-commit: 6868b8e Component: engine
This is a cherry-pick of moby/moby#33634 that brings in moby/libnetwork#1796. Signed-off-by: Madhu Venugopal <madhu@docker.com>
This is a cherry-pick of moby/moby#33634 that brings in moby/libnetwork#1796. Signed-off-by: Madhu Venugopal <madhu@docker.com>
This is a cherry-pick of moby/moby#33634 that brings in moby/libnetwork#1796. Signed-off-by: Madhu Venugopal <madhu@docker.com>
Contains Service Discovery hardening fixes via moby/libnetwork#1796 Fixes multiple issues such as #32830 Signed-off-by: Madhu Venugopal <madhu@docker.com> Upstream-commit: 6868b8e Component: engine
This is a cherry-pick of moby/moby#33634 that brings in moby/libnetwork#1796. Signed-off-by: Madhu Venugopal <madhu@docker.com>
This patch addresses several issues found on the Service Discovery feature.