progressui: adds a json output that shows raw events for the solver status#4113
Conversation
5e96a82 to
a105c98
Compare
| // Skip empty messages for the json printer. | ||
| return nil |
There was a problem hiding this comment.
I'm not sure if this is the correct decision since it might be making a choice for the user that we don't want to make.
|
There don't appear to be current tests for the raw output. I can try to add some tests for that if it's desired or if there's a location where these tests might be I'm happy to add new ones for the JSON format. |
a105c98 to
73c0472
Compare
tonistiigi
left a comment
There was a problem hiding this comment.
Enable this from the buildctl side as well. For testing I think it also makes sense to test it via buildctl test as it is mostly client-side behavior. Take a look at buildctl_test.go for some examples of such integration tests.
| } | ||
|
|
||
| // WithJSONPrinter prints each line of output as a JSON stream instead of as raw text. | ||
| func WithJSONPrinter() DisplaySolveStatusOpt { |
There was a problem hiding this comment.
This should just be a mode parameter in DisplaySolveStatus so the output doesn't only depend on passing the console.
There was a problem hiding this comment.
I think we might want to avoid the mode parameter. I had only seen mentions of mode in buildx and any mentions of auto, tty, or plain are gone by the time we reach this code. From what I could find, both the plain text and tty outputs both use DisplaySolveStatus and which gets used depends on whether console.Console is nil or not.
There was a problem hiding this comment.
But it is not very nice library if there are 3 distinct modes then to configure 2 of them you would use the console object and for the third there is a special method. And we will probably add even more modes in the future.
Buildctl also supports these progress modes so it is not buildx-only concept. We can just combine all the usage so it is defined in one place (basically push it in a little bit lower level).
| // jsonPrinter prints the text in JSON format. | ||
| type jsonPrinter struct{} | ||
|
|
||
| func (*jsonPrinter) Print(w io.Writer, index int, msg string) error { |
There was a problem hiding this comment.
JSON printing should keep the original formatting. Instead of converting to index-msg pairs and printing these as JSON it should print the original Vertex/VertexLog etc objects as soon as they appear.
Another logic would be to try to avoid partial events and only print vertex once it has completed. This would probably be easier to analyze from caller side.
Eg.
{
"digest": "",
"name": "",
"intervals": [
{start:, complete:, cached:, error:}
],
"logs": [
{timestamp:, msg}
]
}
There was a problem hiding this comment.
Looking at my previous example, it is hard to know when last interval has completed and that would basically mean that all steps would need to be completed before logs can appear that is not ideal in some cases.
Instead we could do something similar but only once per interval. So every time there is a completed event for vertex a new record is written out (with all the logs for that vertex). If step has multiple run intervals, there can be multiple objects with same digest.
We could also just do the raw event objects instead but maybe in that case we should call this rawjson.
There was a problem hiding this comment.
I mocked together an output format that I think will work well. As an example:
{"digest":"sha256:4fdbd27f357e8ca142a8dfde9064febf498ca55d6ac098fe79444f605d732794","name":"[xx 1/1] FROM docker.io/tonistiigi/xx:1.2.1@sha256:8879a398dedf0aadaacfbd332b29ff2f84bc39ae6d4e9c0a1109db27ac5ba012","statuses":[{"ID":"resolve docker.io/tonistiigi/xx:1.2.1@sha256:8879a398dedf0aadaacfbd332b29ff2f84bc39ae6d4e9c0a1109db27ac5ba012","Vertex":"sha256:4fdbd27f357e8ca142a8dfde9064febf498ca55d6ac098fe79444f605d732794","Name":"","Total":0,"Current":0,"Timestamp":"2023-08-10T18:00:22.336226751Z","Started":"2023-08-10T18:00:22.277476168Z","Completed":"2023-08-10T18:00:22.336226335Z"}],"done":true}
{"digest":"sha256:aac9b053bd01526a34fcda3b4c29b0e7563c8644ef47ecd67b8e908e08dec1f0","name":"[golatest 1/1] FROM docker.io/library/golang:1.20-alpine3.18@sha256:03278bc16e1a5b4fb6cdd3462108c060aa1e9c2353ce4d15d744b3c40168677d","statuses":[{"ID":"resolve docker.io/library/golang:1.20-alpine3.18@sha256:03278bc16e1a5b4fb6cdd3462108c060aa1e9c2353ce4d15d744b3c40168677d","Vertex":"sha256:aac9b053bd01526a34fcda3b4c29b0e7563c8644ef47ecd67b8e908e08dec1f0","Name":"","Total":0,"Current":0,"Timestamp":"2023-08-10T18:00:22.340789626Z","Started":"2023-08-10T18:00:22.291115668Z","Completed":"2023-08-10T18:00:22.340789168Z"}],"done":true}
{"digest":"sha256:8ad565762d44cf1653c371469b6e35b1fc41a74da2485b9e7c09673f7d5941b3","statuses":[{"ID":"transferring context:","Vertex":"sha256:8ad565762d44cf1653c371469b6e35b1fc41a74da2485b9e7c09673f7d5941b3","Name":"transferring","Total":0,"Current":755261,"Timestamp":"2023-08-10T18:00:22.705782626Z","Started":"2023-08-10T18:00:22.275040793Z","Completed":"2023-08-10T18:00:22.705782376Z"}],"done":true}
{"digest":"v4n5mbcrhrefgnjp2xpzelxcw","name":"[binaries-linux 4/4] COPY --link --from=buildkitd /usr/bin/buildkitd /","cached":true}
The basic gist of it is that each JSON blob has a digest attached to it and then the rest are only sections which correspond to updates on the vertex. The intervals are presently omitted but they can likely be attached to the done attribute or included as their own attribute if needed. It should be feasibly possible to use the JSON event stream to update a third-party UI with the information provided.
The name is included but only the first time that a vertex is printed. One of cached, done, or error will be printed when the vertex finishes. Statuses and logs are included as part of the blob.
I think this output will probably be best. The biggest concern for JSON outputs are repeated information (such as printing the digest over and over again) and redundant information (such as the full vertex struct). I also chose to keep partial log lines in as I think it's always possible for a third-party to choose to buffer partial log lines, but it's not possible to retrieve partial lines if the formatter isn't even giving them to you. Since the primary purpose of this output is to enable third-party progress bars, I thought it would be more simple to keep it and it can always be removed later without changing the external API.
This is only a mockup so I don't have code that I feel comfortable pushing for review yet.
|
so excited for this change, i've wanted something like this for years!!! |
|
Converting this to a draft. I'm going to push an update to the code which contains the interfaces and structure that I think match with the review comments, but I still need to reorganize the code. I wanted to get something pushed before EOD though to get feedback. |
73c0472 to
00a458d
Compare
| func DisplaySolveStatus(ctx context.Context, c console.Console, w io.Writer, ch chan *client.SolveStatus, opts ...DisplaySolveStatusOpt) ([]client.VertexWarning, error) { | ||
| var d SolveStatusDisplay | ||
| if c != nil { | ||
| d = NewConsoleDisplay(c) | ||
| } else { | ||
| d = newTextDisplay(w) | ||
| } | ||
| return UpdateDisplay(ctx, d, ch, opts...) | ||
| } |
There was a problem hiding this comment.
We can modify this if we want, but I think it will probably be better to keep this method around with its current signature until the places that use this have been updated. We can add a deprecation notice to it to ease that transition if it helps.
The new method is intended to be using NewDisplay to create the display and then UpdateDisplay to feed the display with the status updates.
| logsOffset int | ||
| logsBuffer *ring.Ring // stores last logs to print them on error | ||
| prev *client.Vertex | ||
| events []string |
There was a problem hiding this comment.
This appeared to be dead code. While this variable was used, it never appeared to be set except in this package where it was cleared after it was read from.
| Canceled bool `json:"canceled,omitempty"` | ||
| Error string `json:"error,omitempty"` | ||
| Cached bool `json:"cached,omitempty"` | ||
| Done bool `json:"done,omitempty"` |
There was a problem hiding this comment.
Instead of Done this should have the Start/Completed time.
There was a problem hiding this comment.
Modified this to just copy the Completed attribute from the client.SolveStatus instead of using Done.
| printVtx(p *textMux, t *trace, v *vertex) | ||
| } | ||
|
|
||
| type vertexUpdate struct { |
There was a problem hiding this comment.
As these are the types returned to user that supposedly someone will parse make these types public (can also put definition in another package for easier import).
I don't think this is vertexUpdate. It is more like a invocation.
There was a problem hiding this comment.
I've adapted the various VertexX types into their own structs that match this package's API a bit more cleanly. They're mostly based on the ones in the client package, but with some small differences and the correct JSON annotations.
Even when they're alike, it's probably better to keep them separate just so changing one of these types doesn't inadvertently affect the output of the other.
I think the progressui package shouldn't be a problem for most people to import, but we can move the type with a type alias if it ends up being a problem in the future or consumers can just copy the file with the JSON definitions pretty easily into their own projects.
35d9a5a to
ce9cb90
Compare
| func DisplaySolveStatus(ctx context.Context, c console.Console, w io.Writer, ch chan *client.SolveStatus, opts ...DisplaySolveStatusOpt) ([]client.VertexWarning, error) { | ||
| modeConsole := c != nil | ||
| // SolveStatusDisplay is a display that can show the updates to the solver status. | ||
| type SolveStatusDisplay interface { |
There was a problem hiding this comment.
Let's avoid public interfaces with private methods. I think all-public is fine here unless we completely want to hide it.
There was a problem hiding this comment.
The reason why I decided to make this have private methods is because the methods themselves aren't really useful or implementable for someone outside of this package. If we want to expose this to allow a custom SolveStatusDisplay, I think we need to do a larger refactor to expose more of this package. I'm a bit uncomfortable with doing that without a clear use case.
I think, if there's a compelling need to expose this more, we can defer what this interface should look like until then. Doing it this way allows us to implement the feature without making a promise to anyone using the library.
There was a problem hiding this comment.
I'm ok with NewDisplay just returning a struct and all this backend stuff being completely private.
| // | ||
| // For more control over the display method, it is suggested to use NewDisplay and UpdateDisplay instead | ||
| // of this function. | ||
| func DisplaySolveStatus(ctx context.Context, c console.Console, w io.Writer, ch chan *client.SolveStatus, opts ...DisplaySolveStatusOpt) ([]client.VertexWarning, error) { |
There was a problem hiding this comment.
I think this is unused now. If correct then we can remove it and use d = NewDisplay(...); d.Show(ch). (I think it might make sense to make d and the backend interface for display different objects then).
There was a problem hiding this comment.
I think we can mark this function as deprecated. I don't think we should remove it in the same commit as adding a new method. It would still be used by other repositories that aren't connected to this one. Those should be given an opportunity to be able to migrate to the new API.
There was a problem hiding this comment.
We don't have backward compatibility guarantees for Go vendors. This will just add maintenance cost(and stops us from making changes if we need to). Importers will still need to update the code anyway. Marking function deprecated usually means nobody will remember to remove it.
| // Information will only be included if it has changed since the last time | ||
| // that the information was sent or if it is the first time this vertex has | ||
| // been output. | ||
| type Vertex struct { |
There was a problem hiding this comment.
I believe everything but SolveStatus is already identical to the defined types. In that case, combine the types or use type aliases.
edit: I see the Vertex digest.Digest is different that could be added with a wrapper, embedding the shared object. For the VertexLog, the already defined type with Stream and without Partial is better.
There was a problem hiding this comment.
I think it's probably better to just keep them separate so we aren't tying the two APIs together. Yes, they'll share a lot of common attributes, but they're still different APIs. There is a bit of a difference here as you noted. This API kind of guarantees that the update struct will only contain data from one vertex rather than potentially having multiple different vertex updates in the same batch.
There was a problem hiding this comment.
Otoh I don't want these to start to diverge. I think that is a bigger risk. We can move types to separate packages if the imports would become too big. This can be done later but mark it as follow-up after this PR merges then.
There was a problem hiding this comment.
Do we want to just expose the raw solve status struct as-is? That might be better.
There was a problem hiding this comment.
You mean the raw event structs as they appear from protobuf? In that case, it should be jsonraw. It's not as useful for users because the only way to understand it would be to convert it into structured form first (something that can't be done with jq for example).
| ) | ||
|
|
||
| // SolveStatus holds the latest status update for a vertex by its digest. | ||
| type SolveStatus struct { |
There was a problem hiding this comment.
Maybe JSONProgress is clearer name.
There was a problem hiding this comment.
Maybe SolveStatusUpdate? This struct doesn't really communicate the JSON progress. It's just a struct that happens to have the metadata annotations to serialize to JSON with a certain format.
I do think it's unclear this is an update to the solve status and doesn't communicate the entirety of the process.
There was a problem hiding this comment.
Maybe VertexProgress as this object is scoped to a single vertex?
There was a problem hiding this comment.
I like this name. I'll make that change.
|
|
||
| type textMux struct { | ||
| w io.Writer | ||
| printer vertexPrinter |
There was a problem hiding this comment.
Ideally, the refactoring of current progress objects and new interfaces would be in a separate commit from the new JSON display. But if you are unsure yet if this is the final version of the refactors we can delay that split.
|
|
||
| func newTextMux(w io.Writer, printer vertexPrinter) *textMux { | ||
| if printer == nil { | ||
| printer = (*textPrinter)(nil) |
There was a problem hiding this comment.
I can imagine this construct being confusing for the readers. As this is all private, let's avoid it and ensure the caller always does the right thing.
There was a problem hiding this comment.
Yea sure. I'll change this.
|
|
||
| type jsonPrinter struct { | ||
| seen map[digest.Digest]struct{} | ||
| buf SolveStatus |
There was a problem hiding this comment.
I don't quite get the need for this buf property. If it just for passing data when calling methods from printVtx then it should be passed as function parameter.
There was a problem hiding this comment.
This is mostly to prevent large numbers of allocations for the slices in the SolveStatus struct. Since these slices are short-lived (used only for the marshaling), we don't need to be careful about the lifetime of the contents of the slice. So the printer keeps a slice around and truncates it to a size of zero but keeps its capacity the same to avoid extra allocations while printing.
It's just most simple to keep it here since this is the only section of code that uses this struct.
There was a problem hiding this comment.
Ah, ok. I'm not sure if this is premature optimization or not, but add a small comment explaining that this is for slice memory reuse.
Maybe it still makes sense to pass buf as a parameter to the other methods(turn them into helper functions).
| if js.isEmpty() { | ||
| return | ||
| } | ||
| out, _ := json.Marshal(&js.buf) |
There was a problem hiding this comment.
json.NewEncoder(w); enc.SetIndent("", " ");
There was a problem hiding this comment.
Can you expand on this? My intention was for the stream to come out unformatted and tools like jq or something else could perform the formatting if desired. Adding an intent will increase the amount of memory that this progress method utilizes.
I also may be wrong about this, but I believe there isn't a difference between json.NewEncoder(w).Encode(v) and json.Marshal(v) except the former has an extra memory allocation. The internal implementation of the encoder uses the same internal methods as marshal.
Looking at the source, there is a difference between the two when indenting is used, but MarshalIndent is likely more performant unless you store the json.Encoder struct.
See the source code in the standard library here: https://cs.opensource.google/go/go/+/refs/tags/go1.21.0:src/encoding/json/encode.go;l=179
In this one, it preallocates byte space for the indentation using a pretty simple metric. In contrast, here's the encoder variation: https://cs.opensource.google/go/go/+/refs/tags/go1.21.0:src/encoding/json/stream.go;l=224
This reuses a buffer that's kept in the encoder. But, this will only have allocated space if there was a previous encoding that happened. Since this struct is created, used, and immediately discarded, I believe it just increases memory usage.
There was a problem hiding this comment.
json.NewEncoder is optional. We already have a writer, so I think it is clearer to use the writer-based API rather than allocate output into a byte slice and write/format it into writer then. I don't think there is a memory allocation difference(as afaic atm Encode does not do partial writes for error handling), or at least that is not meaningful. The indent is because JSON format is also human readable, not only machine-readable, and we shouldn't make it harder for users to understand it. I don't think the marshaling speed is relevant here.
| } | ||
|
|
||
| func newJSONDisplay(w io.Writer) SolveStatusDisplay { | ||
| return newTextDisplay(w, &jsonPrinter{}) |
There was a problem hiding this comment.
I need to check this more but it looks to me that this newTextDisplay means that newTextMux is used for JSON as well. TextMux is the thing that tries print multiple parallel steps together in one text stream by making one of them active and then with control variables like antiFlicker/maxDelay etc controls how often the active item switches and when partial data is printed to avoid too much buffering. For JSON, I don't think this is ideal and we should print new JSON progress objects once the vertex goes to Complete state.
ce9cb90 to
d2f95b0
Compare
|
Went through and redid a bunch of this. The output is now named This version just marshals it directly. I refactored the interface a bit and hid it within a struct. I think this version should be more clean than the original. |
| Statuses []*VertexStatus | ||
| Logs []*VertexLog | ||
| Warnings []*VertexWarning | ||
| Vertexes []*Vertex `json:"vertexes,omitempty"` |
There was a problem hiding this comment.
Not sure if I should change this at the same time now that it will be publicly exposed, but the proper plural of vertex is vertices. That would involve a change to the field name though.
There was a problem hiding this comment.
There was a problem hiding this comment.
Ah, I didn't realize this was a regional thing.
tonistiigi
left a comment
There was a problem hiding this comment.
Mostly just naming comments
| cli.StringFlag{ | ||
| Name: "progress", | ||
| Usage: "Set type of progress (auto, plain, tty). Use plain to show container output", | ||
| Usage: "Set type of progress (auto, plain, tty, raw:json). Use plain to show container output", |
There was a problem hiding this comment.
Let's avoid : as it implies a microformat(eg. tty:json). If you don't like rawjson other possibilities would be something from "events".
There was a problem hiding this comment.
My intention is that raw would be the format and json is the microformat for raw. Maybe it would be better to just label it raw and then add raw:json in the future if we ever decided to do other formats for the raw type.
I feel like events probably won't be descriptive enough.
| if disp.phase == "" { | ||
| disp.phase = "Building" | ||
| } | ||
| type SolveStatusDisplay struct { |
There was a problem hiding this comment.
I think this can just be Display
|
|
||
| // NewPlainDisplay creates a new SolveStatusDisplay that outputs the status | ||
| // in a human-readable plain-text format. | ||
| func NewPlainDisplay(w io.Writer, opts ...DisplaySolveStatusOpt) SolveStatusDisplay { |
There was a problem hiding this comment.
leave these private until someone has a use-case. NewDisplay() should be enough.
There was a problem hiding this comment.
I'll remove it. We had a section of code that manually used the plain display and it created this awkward code:
d, _ = progressui.NewDisplay(os.Stderr, progressui.PlainMode)
That's fine it just felt weird discarding the error. But, the error is needed for NewDisplay because tty mode can fail and the progress mode can be invalid.
| // refresh updates the display with the latest state. | ||
| // This method only does something with displays that | ||
| // have buffered output. | ||
| refresh() |
There was a problem hiding this comment.
My intention behind this was to avoid coupling the intention with the implementation. I figured that the API for this was a bit similar to a double buffering API so update is used to update the internal state and then refresh is used to signal the display should be updated.
I can change it to flush but I felt the name refresh fit better for a display interface.
| type solveStatusDisplay interface { | ||
| // init initializes the display and opens any resources | ||
| // that are required. | ||
| init(displayTimeout time.Duration) |
There was a problem hiding this comment.
I think setting this to *rate.Limiter makes the intent clearer
|
|
||
| func (d *rawJSONDisplay) done() { | ||
| // Flush if the underling writer supports that method. | ||
| if flusher, ok := d.w.(interface { |
There was a problem hiding this comment.
I'm not sure if this should be handled here. If someone passes a buffered writer it is their responsibility to flush it after the display has completed.
There was a problem hiding this comment.
I'll remove this. My intention with this was to avoid any needed flushing for things like Stdout and Stderr and just to provide some convenience and parity to how the TTY output functions, but I don't think this is really necessary.
| } | ||
|
|
||
| func (d SolveStatusDisplay) Warnings() []client.VertexWarning { | ||
| return nil |
There was a problem hiding this comment.
Yea this is a leftover. I was originally going to consolidate the warnings into the struct, but it didn't work and I went back to returning them from UpdateFrom. This just didn't get removed.
tonistiigi
left a comment
There was a problem hiding this comment.
LGTM, squash the commits and make sure DCO is passing.
1365c4b to
53beced
Compare
|
Squashed and ready to merge. |
53beced to
06deef8
Compare
06deef8 to
799a8c6
Compare
…tatus This adds an additional display output for the progress indicator to support a json output. It refators the progressui package a bit to add a new method that takes in a `SolveStatusDisplay`. This `SolveStatusDisplay` can be created by the user using `NewDisplay` with the various modes as input parameters. The json output will print the raw events as JSON blobs. It will not throttle the messages or limit the display. It is meant as a pure raw marshaling of the underlying event stream. Signed-off-by: Jonathan A. Sternberg <jonathan.sternberg@docker.com>
799a8c6 to
3713178
Compare
|
@jsternberg sorry to comment on an old PR but would it be possible to update this to export both |
|
@ddl-retornam With |
@tonistiigi NewDisplay with the mode accepts |
|
Ah, yes, that is not correct. It should just accept |
@tonistiigi from a cursory glance it might be best to have it accept both |
|
I think modifying the interface to take in an |
|
@jsternberg There is already a runtime variable type detection for tty, even if we pass |
DisplaySolveStatus has been removed by moby/buildkit#4113.
DisplaySolveStatus has been removed by moby/buildkit#4113.
DisplaySolveStatus has been removed by moby/buildkit#4113.
* refactor: don't use WithFailFast It is no-op and has been removed from Buildkit. moby/buildkit#4484 * chore: upgrade moby/buildkit DisplaySolveStatus has been removed by moby/buildkit#4113.
This adds an additional display output for the progress indicator to
support a json output. It refactors the progressui package a bit to add
a new method that takes in a
SolveStatusDisplay. ThisSolveStatusDisplaycan be created by the user usingNewDisplaywiththe various modes as input parameters.
The json output will print the status updates and log messages for the
vertexes that have had updates. Each JSON blob has the same format and
it matches closely with the raw underlying status updates. The display
will still attempt to regulate the number of messages that come through
and will batch various events together if events are sent too quickly
according to the default parameters for rate limiting set in the
progressui package.
At the moment, there is no public API for creating your own
SolveStatusDisplayand each of the display implementations borrowheavily from each other. In the future, it is possible this API could be
expanded. If you need to create a custom progress bar that requires more
than what this package offers, I'd probably recommend just copying the
package and acting on the
SolveStatuschannel directly.Related to docker/buildx#178.