Allow replacing ingress network #2028
Conversation
This will now need a rebase, but I wanted the review process to start.
api/objects.proto
Outdated
// this information, so that on reboot or leader re-election, new leader
// will not automatically create the routing mesh if it cannot find one
// in store.
bool routingMeshOff = 10;
A few comments here:
- Protobuf field names should use underscores, for example routing_mesh_off. This gets converted to CamelCase by the protobuf compiler for Go code.
- Maybe routing_mesh_disabled is better?
- It's too bad we don't have a message inside Cluster to store networking options, to organize this and stuff like network_bootstrap_keys and encryption_key_lamport_clock. Maybe it's not too late to start one, though.
I did not opt for enabled/disabled because this variable indicates a state, which could be just temporary until the user creates a custom ingress network. I thought on/off was more appropriate for a state, while enabled/disabled is more for a config (which we do not have).
But, not being a native speaker, routing_mesh_disabled works just as fine for me. Will change to that.
I think it may be possible to avoid keeping this state. Right now we create the default cluster object here:
What if we created the ingress network at the same time, but only if the CreateCluster call succeeded? This would ensure that a swarm cluster always gets an ingress network by default, but if it is deleted later on, it would not be recreated automatically.
I'm not sure if this would be sort of hacky. It seems like it would simplify things a bit, but I don't have a strong opinion.
I am happy to avoid the extra state, but the other options I had before were also sort of hacky, like adding a mock ingress network object to the store to denote that the ingress network need not be created.
IIUC, the allocator would attempt to create the default cluster object; the failure would indicate this is a leader re-election, or a cluster start after a shutdown. In that case I can do:
if ErrNoIngress && cannot create cluster object {
    routingMeshDisabled = true
}
But what does it mean that the allocator succeeds in creating the default cluster object?
Once the allocator gets a chance to run, isn't the default cluster object already present in the store, as created by the manager? Or is the allocator cluster object a mock one, not seen by anybody else?
I rejected adding the mock ingress network, because if the user downgrades docker, the previous version would not handle the mock network properly.
I was suggesting moving ingress network creation out of the allocator, to where the initial cluster and node objects are created (see link). It kind of makes sense to do that, since there would be a single place where initial resources are created.
Every time the manager enters the leader state, it tries to create a default cluster object, because this might be a fresh swarm that has never had a leader before. We could also create an ingress network at this time, if creating the default cluster object succeeded (which means this is indeed a fresh swarm).
Interesting.
Ideally would prefer allocating the networking stuff all in one place, in allocator/network.go.
But what you are suggesting would mean to just push to store the default ingress network object, not yet allocating it.
Let me give it a try.
// can happen.
bool attachable = 6;

// Ingress indicates this network will provide the routing-mesh.
Let's mention in the comment that legacy ingress networks won't have this flag set, and instead have the com.docker.swarm.internal label, and the name ingress.
manager/allocator/network.go
Outdated
clusters, err = store.FindClusters(readTx, store.ByName(store.DefaultClusterName))
}); err != nil {
return errors.Wrapf(err,
"failed to retrieve cluster object to check routing mesh state during init: %v", err)
Remove the %v along with the err argument; otherwise err is included twice in the string.
manager/allocator/network.go
Outdated
// If ingress network is not found, create one right away
// using the predefined template.
if len(networks) == 0 {
nc.clusterID = clusters[0].ID
You should check that clusters has a nonzero length. FindClusters will not return an error if it doesn't find any matching clusters.
There should always be a default cluster, so it would be a bug if one didn't exist. But I think it's better to return an error than crash in that case.
Good to know, I was in fact not sure about this.
I will follow the length-check logic I saw in other parts of the code.
t.Status.Timestamp = ptypes.MustTimestampProto(time.Now())
}

func GetIngressNetwork(s *store.MemoryStore) (*api.Network, error) {
You'll need a comment to pass lint.
manager/controlapi/network.go
Outdated
// if you change this function, you have to change createInternalNetwork in
// the tests to match it (except the part where we check the label).
if err := validateNetworkSpec(request.Spec, s.pg); err != nil {
if err := s.validateNetworkRequest(request.Spec, s.pg); err != nil {
This should be called inside the s.store.Update call below, to make sure multiple CreateNetwork invocations can't race to create multiple ingress networks.
You might consider changing GetIngressNetwork to take a store.ReadTx instead of a store.
Ah, good suggestion. I was under the impression we only run a best-effort check, and a concurrent ingress network creation would anyway fail to be allocated by the allocator.
If we can protect against the race, even better.
manager/allocator/network.go
Outdated
nc.clusterID = clusters[0].ID

// Check if we have the ingress network. If not found create
// it before reading all network objects for allocation.
All of the code to check for the ingress network and create it if necessary should be inside a store.Update transaction. Otherwise, something else could create an ingress network through the API at the same time.
Thanks for the advice. I will see if I can refactor it.
Actually, this code is only executed on the elected leader once the allocator is started (run()).
The allocator is the only component which will check for and create the ingress network if needed.
Also, I do not think the system can receive a user request to create an ingress network at this point.
Hopefully we do not need to worry about concurrent creations here.
manager/allocator/network.go
Outdated
ingressNetwork, err = GetIngressNetwork(a.store)
if err != nil {
return errors.Wrap(err, "failed to find ingress network after creating it")
return errors.Wrapf(err, "failed to find ingress network after creating it: %v", err)
%v", err is redundant
manager/controlapi/service.go
Outdated
if doesServiceNeedIngress(service) {
if _, err := allocator.GetIngressNetwork(s.store); err != nil {
if grpc.Code(err) == codes.NotFound {
return grpc.Errorf(codes.PermissionDenied, "service needs ingress network, but ingress network is not present")
I think this should use codes.FailedPrecondition.
manager/controlapi/service.go
Outdated
if doesServiceNeedIngress(service) {
if _, err := allocator.GetIngressNetwork(s.store); err != nil {
if grpc.Code(err) == codes.NotFound {
return nil, grpc.Errorf(codes.PermissionDenied, "service needs ingress network, but ingress network is not present")
I think this should use codes.FailedPrecondition.
manager/controlapi/service.go
Outdated
}

func doesServiceNeedIngress(srv *api.Service) bool {
if srv.Spec.Endpoint.Mode != api.ResolutionModeVirtualIP {
Endpoint will be nil if unspecified in the API request, so be sure to check before dereferencing it.
I have not hit this problem during testing. Probably the cli client fills in a default endpoint object, as it defaults to VIP mode. Here the right thing to do is to check for nil. Thanks.
manager/allocator/network.go
Outdated
// network only if it is not already done.
if isIngressNetworkNeeded(s) {
if nc.ingressNetwork == nil {
return fmt.Errorf("Ingress network is missing")
manager/allocator/network.go
Outdated
return n, nil
}
}
return nil, grpc.Errorf(codes.NotFound, "no ingress network found")
I think it's better not to return a gRPC error here, because the allocator doesn't use gRPC at all. Maybe define an exported error and return that:
var ErrNoIngressNetwork = errors.New("no ingress network found")
Then callers can easily check if this error was returned, and translate to a gRPC error if appropriate.
It's because the controlapi functions which validate the service and network specs call this method.
Got it, I will export an error for that.
Codecov Report
@@            Coverage Diff             @@
##           master    #2028      +/-   ##
==========================================
- Coverage   53.93%   53.82%   -0.12%
==========================================
  Files         109      109
  Lines       19100    19187      +87
==========================================
+ Hits        10302    10327      +25
- Misses       7561     7608      +47
- Partials     1237     1252      +15

Continue to review full report at Codecov.
@aaronlehmann Updated. Tested reelection with the new logic you suggested, it works fine.
manager/controlapi/network.go
Outdated
err := s.store.Update(func(tx store.Tx) error {
if request.Spec.Ingress {
if n, err := allocator.GetIngressNetwork(s.store); err == nil {
return grpc.Errorf(codes.PermissionDenied, "ingress network (%s) is already present", n.ID)
manager/controlapi/network.go
Outdated
}
for _, srv := range services {
if doesServiceNeedIngress(srv) {
return grpc.Errorf(codes.PermissionDenied, "ingress network cannot be removed because service %s depends on it", srv.ID)
codes.FailedPrecondition
Design LGTM
manager/allocator/network.go
Outdated
}
}

skipIngressNetworkAllocation:
I think this label is confusing, because the flow from line 89 (nc.ingressNetwork = ingressNetwork) could also reach here.
Yes, code from line 89 is expected to reach here.
Let me see if modifying the if blocks into a switch makes it clearer, so that I can avoid using the goto.
Signed-off-by: Alessandro Boch <aboch@docker.com>
Thanks @dongluochen. I reworked the code to avoid the goto, via a switch, and added some comments.
}

err := s.store.Update(func(tx store.Tx) error {
if request.Spec.Ingress {
nit: I think it is better to move this check into store.CreateNetwork(tx, n), where it already checks for duplicate names.
Not sure if it is appropriate to percolate the ingress notion down into store.CreateNetwork().
It is true it already does a check for duplicated network names, but that is a more generic, top-level resource-name check.
If you guys think it makes sense, I can move it there now.
I think either is fine.
LGTM
}
}
return false
}
@aboch: Sorry for the post-merge comment. I was just looking at this again, and I noticed that this function is similar (but not the same as) isIngressNetworkNeeded in the allocator. Are the differences correct?
Would it be possible to combine these functions as an exported function in the allocator? I think it's better for controlapi not to implement this logic directly.
Would it be possible to combine these functions as an exported function in the allocator
Agree.
Regarding the differences, I want to double-check. Probably some are redundant given the spec validation that happens first.
Was this intended to prevent service connections to any "internal" network at all, or merely the ingress network? From the documentation, I assumed internal was a generic network type that application developers could use to place sensitive services in.
@archisgore I am not sure I follow; I am adding some more background so that we are on the same page for further clarifications. This change allows the user to remove and/or replace the swarm ingress network. It gives interested users control over the ingress network configuration. The ingress network is the infrastructure network which provides the routing mesh for traffic ingressing the host. In the swarmkit code this network was previously marked with the com.docker.swarm.internal label. In the docker networking model, there is also an internal option for networks, which denotes a network with no external connectivity; that generic notion is separate from this label and is not affected by this change.
Oh okay, I see. I'm seeing some strange behavior with the latest builds, where creating a network, followed by a RemoteAPI call to start a service connected to that network, returns an error; but done another way, the same call works. I searched for that error string and found this commit.
Thank you @archisgore. What you found is a bug. Will take care of it.
Please see moby/moby/pull/31714 for details.
Signed-off-by: Alessandro Boch aboch@docker.com