Diagnostic client #2032
ddebroy left a comment
Looks good. Had a couple of minor comments. One q: can the remediation be triggered automatically once in a while as part of a "garbage collection sweep" or is that too dangerous?
	if err != nil {
		logrus.WithError(err).Fatalf("The connection failed")
	}
	httpIsOk(resp.Body)
You can check resp.StatusCode directly for 200/2xx codes, which indicate HTTP OK, rather than parsing resp.Body for "OK": https://golangcode.com/get-the-http-response-status-code/
I think this can be a good enhancement. Right now the server side does not set any specific error code, but it would be good to implement a more proper API with error codes.
	}
	httpIsOk(resp.Body)

	clusterPeers := fetchNodePeers(*ipPtr, *portPtr, "")
	@echo "🐳 $@"
	go build -o "bin/dnet-$$GOOS-$$GOARCH" ./cmd/dnet
	go build -o "bin/docker-proxy-$$GOOS-$$GOARCH" ./cmd/proxy
	go build -o "bin/diagnosticClient-$$GOOS-$$GOARCH" ./diagnose/client
nit: why put it in /diagnose/ and not in /cmd/, like the others?
	for _, k := range orphanKeys {
		resp, err := http.Get(fmt.Sprintf(deleteEntry, ip, port, network, tableName, k))
		if err != nil {
			logrus.WithError(err).Fatalf("Failed deleting entry k:%s", k)
Do we really want to fatal here? Since it stops the program, it won't delete the remaining entries.
Force-pushed from 84493e3 to 989af94.
Codecov Report
@@            Coverage Diff            @@
##             master    #2032   +/-   ##
=========================================
  Coverage          ?   40.46%
=========================================
  Files             ?      138
  Lines             ?    22172
  Branches          ?        0
=========================================
  Hits              ?     8971
  Misses            ?    11884
  Partials          ?     1317
Continue to review full report at Codecov.
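The report's numbers are internally consistent: hits, misses and partials add up to the 22172 tracked lines. Assuming Codecov's usual definition of the coverage ratio (hits divided by hits + misses + partials), a quick check in Go reproduces the 40.46% figure; `coverage` is just an illustrative helper.

```go
package main

import "fmt"

// coverage returns the percentage of tracked lines that are fully covered,
// computed as hits / (hits + misses + partials).
func coverage(hits, misses, partials int) float64 {
	total := hits + misses + partials
	if total == 0 {
		return 0
	}
	return 100 * float64(hits) / float64(total)
}

func main() {
	// Values taken from the report above.
	fmt.Printf("lines: %d, coverage: %.2f%%\n", 8971+11884+1317, coverage(8971, 11884, 1317))
}
```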
@fcrisciani can you include a README / Markdown file in this PR? Misty started working on one in docker/docs#5558, which can be used as a starting point.
Force-pushed from 3ba06ad to ef512ba.
@thaJeztah @ddebroy @vieux can you guys give another pass?
ddebroy left a comment
LGTM. Just a couple of minor string nits
A message like the following will appear in the Docker host logs:

```none
Starting the diagnose server listening on <port> for commands
```
Should the string "diagnose" be "diagnostics"?
A message like the following will appear in the Docker host logs:

```none
Disabling the diagnose server
```
Should the string "diagnose" be "diagnostics"?
Force-pushed from ef512ba to ed0721b.
@thaJeztah can you also give the final blessing?
Force-pushed from 766b46b to 5b9614f.
ddebroy left a comment
Looks good with a couple of minor comments.
-// IsDebugEnable returns true when the debug is enabled
-func (s *Server) IsDebugEnable() bool {
+// IsDiagnosticEnable returns true when the debug is enabled
+func (s *Server) IsDiagnosticEnable() bool {
Minor nit: how about changing the name to IsDiagnosticEnable**d**, as in the controller:
func (c *controller) IsDiagnosticEnabled() bool {
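A minimal sketch of how the renamed accessor could sit next to enable/disable methods, assuming a simple atomic flag. This `Server` is a hypothetical, trimmed-down type (no HTTP listener), the port value is arbitrary, and the log lines use "diagnostic" as suggested in the earlier string nits rather than the README's current wording.

```go
package main

import (
	"fmt"
	"log"
	"sync/atomic"
)

// Server is a hypothetical, trimmed-down diagnostic server that only tracks
// whether diagnostics are enabled; a real one would also serve HTTP.
type Server struct {
	enabled int32
	port    int
}

// EnableDiagnostic flips the flag on (once) and logs the port.
func (s *Server) EnableDiagnostic(port int) {
	if !atomic.CompareAndSwapInt32(&s.enabled, 0, 1) {
		return // already enabled
	}
	s.port = port
	log.Printf("Starting the diagnostic server listening on %d for commands", port)
}

// DisableDiagnostic flips the flag off if it was on.
func (s *Server) DisableDiagnostic() {
	if atomic.CompareAndSwapInt32(&s.enabled, 1, 0) {
		log.Print("Disabling the diagnostic server")
	}
}

// IsDiagnosticEnabled returns true when the diagnostic server is enabled.
func (s *Server) IsDiagnosticEnabled() bool {
	return atomic.LoadInt32(&s.enabled) == 1
}

func main() {
	var s Server
	s.EnableDiagnostic(2000)
	fmt.Println("enabled:", s.IsDiagnosticEnabled())
	s.DisableDiagnostic()
	fmt.Println("enabled:", s.IsDiagnosticEnabled())
}
```

The past-participle form (`IsDiagnosticEnabled`) reads as a predicate and matches the controller's existing method.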
		return nil, err
	}
	if entry != nil && entry.deleting {
		return nil, types.NotFoundErrorf("entry not found in table %s with network id %s and key %s", tname, nid, key)
Masking the entry's existence behind a "not found" message may be confusing if this message makes it into the logs but the actual deletion does not happen for a while. How about making the message something along the lines of "entry found but in deleting state, returning not found", to keep things clear for support engineers?
Changing it to: return nil, types.NotFoundErrorf("entry in table %s network id %s and key %s deleted and pending garbage collection", tname, nid, key)
@@ -0,0 +1,8 @@
FROM docker:17.12-rc-dind
The non-RC tag 17.12-dind is now available (https://hub.docker.com/_/docker/).
I was also wondering why we need dind for this tool, but it's so that we can use it with older daemons, which do not have this functionality built in, correct?
Should we have two versions of the image: a "minimal" one (just the binary) and a dind one?
RUN apk add --no-cache curl

WORKDIR /tool
Instead of installing in /tool, you could just install in /usr/local/bin
**Standalone network:**

```bash
$ debugClient -c sd -v -net n8a8ie6tb3wr2e260vxj8ncy4
```
s/debugClient/diagnosticClient/
**Overlay network:**

```bash
$ debugClient -port 2001 -c overlay -v -net n8a8ie6tb3wr2e260vxj8ncy4
```
s/debugClient/diagnosticClient/
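The two invocations share a small flag surface (`-c`, `-v`, `-net`, `-port`). A sketch of how it could be parsed with Go's standard `flag` package; the `-ip` flag is an assumption suggested by `*ipPtr` in the client code, and the default values are illustrative, not the client's actual defaults.

```go
package main

import (
	"flag"
	"fmt"
)

// clientOptions mirrors the flags used in the README examples, plus -ip.
type clientOptions struct {
	ip      string
	port    int
	command string
	verbose bool
	network string
}

// parseArgs parses command-line arguments into clientOptions.
// Defaults below are assumptions for illustration only.
func parseArgs(args []string) (clientOptions, error) {
	var o clientOptions
	fs := flag.NewFlagSet("diagnosticClient", flag.ContinueOnError)
	fs.StringVar(&o.ip, "ip", "127.0.0.1", "IP address of the diagnostic server")
	fs.IntVar(&o.port, "port", 2000, "port of the diagnostic server")
	fs.StringVar(&o.command, "c", "", "command to run, e.g. sd or overlay")
	fs.BoolVar(&o.verbose, "v", false, "verbose output")
	fs.StringVar(&o.network, "net", "", "network ID to inspect")
	err := fs.Parse(args)
	return o, err
}

func main() {
	// Same arguments as the overlay example above.
	o, err := parseArgs([]string{"-port", "2001", "-c", "overlay", "-v", "-net", "n8a8ie6tb3wr2e260vxj8ncy4"})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%+v\n", o)
}
```

Using a `FlagSet` with `ContinueOnError` keeps parsing testable without touching `os.Args`.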
WORKDIR /tool

COPY daemon.json /etc/docker/daemon.json
COPY diagnosticClient /tool/diagnosticClient
For a follow-up: we should make this Dockerfile a multi-stage build and actually build the client in it (I had a branch with that, but it looks like I didn't push it, and it's not on my laptop 😊).
Remember that table operations have ownership, so any `create entry` will persist as long as
the diagnostic container is part of the swarm.

1. Make sure that the node where the diagnostic client will run is not part of the swarm; if it is, run `docker swarm leave -f`.
We'll need different steps for 17.12+ daemons (they don't have to leave the swarm; the diagnostic client just needs to connect to them).
This is the container version, so you run the dind image and you need to leave the swarm.
Understood; I'm wondering if we should also have instructions (using a containerized version too) for 17.12 daemons, e.g. just bind-mounting /var/run/docker.sock and connecting to the running daemon.
Added the containerized version as dockereng/network-diagnostic:onlyclient, which can be used as is with --net host; no need for docker.sock.
2. To run the container, use a command like the following:

```bash
$ docker container run --name net-diagnostic -d --privileged --network host fcrisciani/network-diagnostic
```
Can we put the image under an official organization (e.g. dockereng/ or dockercore/)?
Force-pushed from 198046b to 2ccffde.
Note to self: after this gets merged, the moby vendoring will require code changes due to the renamed methods.
Align it to the moby/moby external API

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
- the client allows talking to the diagnostic server and decoding the internal values of the overlay and service discovery
- the tool also allows remediating orphan entries
- added README

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Force-pushed from 2ccffde to be91c3e.
thaJeztah left a comment
LGTM, but left two suggestions 👍
`HUP` signal to the PID you found in the previous step.

```bash
kill -HUP <pid-of-dockerd>
```
nit: this could be `killall -HUP dockerd`.