compute: tokenize error logging#18772
Conversation
702204a to
eccdfa0
Compare
55f4749 to
30b98ee
Compare
vmarcos
left a comment
There was a problem hiding this comment.
The first four commits look good to me; I only have a minor remark about some warning messages that appear to have been shortened in some places.
I am uncertain about the last commit. The previous code organization was already in place prior to my start in Materialize and I do not think I have a clear understanding of why these functions were not part of Context to start with or otherwise why they were not collected under some other abstraction. Perhaps we can ask for someone with historical background to manifest before moving on with this change? Also, I did not give enough thinking to the question of whether there may also be a better code organization that we could think of rather than having all of the functions collected under Context.
| continue; | ||
| } | ||
| let message = "Non-positive multiplicity in DistinctBy Retractions"; | ||
| warn!(?row, ?count, debug_name, "[customer-data] {message}"); |
There was a problem hiding this comment.
One downside to this approach is that we lose structured field logging. How ugly would it be to supply some macros that take an ErrorLogger?
There was a problem hiding this comment.
I've not been able to come up with anything ergonomic so far. We could probably introduce something like:
log_dataflow_error!(
self.error_logger(),
"Non-positive multiplicity in DistinctBy Retractions",
?row, ?count, debug_name,
"[customer-data] Non-positive multiplicity in DistinctBy Retractions",
);Doesn't seem too bad, actually. Especially because we could make it possible to reference the message in the details format string, and make the debug_name be implicitly added to all logs.
log_dataflow_error!(
self.error_logger(),
"Non-positive multiplicity in DistinctBy Retractions",
?row, ?count,
"[customer-data] {message}",
);I'm just not sure whether my macro foo is up to this task :)
In any case, I'd rather make this the scope of another PR, since I foresee some bikeshedding.
There was a problem hiding this comment.
Nice! I'm also happy to do this if you don't want to deal with it.
| self.error_logger(), | ||
| ) | ||
| .leave_region() | ||
| self.render_reduce_plan(reduce_plan, ok, err, key_arity) |
There was a problem hiding this comment.
I'm a fan of this change (throughout).
tokenrove
left a comment
There was a problem hiding this comment.
👍 modulo some of the details we discussed, and as mentioned, we can do a followup to this for the most bikesheddable bits.
4158a99 to
4911650
Compare
This commit introduces a new `ErrorLogger` type that will be used for logging errors inside dataflows. This type wraps a dataflow shutdown token and encapsules both shutdown-checked logging and our `warn!`/`error!` approach to logging dataflow errors.
This commit extends the rendering `Context` by a `shutdown_token` field that can be used by rendered operators to check whether the dataflow is in the progess of shutting down. The shutdown token is replaced for each entry in `objects_to_build`, to support finer-grained dataflow shutdown in a future where dataflows can have more than one export. This commit also adjusts variable naming to make it clear that the various `Context::export_*` methods receive a list of all dependency IDs, which include `objects_to_build` IDs, rather than just IDs of dataflow imports. As a consequence, there is no need to adjust the export methods to ensure that they also pick up the new object tokens.
This commit modifies the error logging behavior of compute operators to only log messages when the dataflow is not in the process of shutting down. This avoids producing noise caused by dataflow operators emitting partial update sets when they are cancelled. The shutdown checking is performed through a shutdown token and wrapped inside the new `ErrorLogger` type that gets passed around to the various operator renderers.
This commit converts all freestanding reduce and top-k rendering functions that want to do error logging into methods of the `Context` type. The benefit is that we can rid of two parameters, `debug_name` and `error_logger`, in exchange for a new `&self` parameter, and changed indentation. Note that the top-k functions where previously definied inside the scope of `Context::render_topk` (for no apparent reason). Making them methods of `Context` actually reduces indentation.
4911650 to
bcc7f8d
Compare
Everywhere we log something in compute rendering, we include the debug name of the dataflow in the logged details. It makes sense to build the dataflow name logging directly into the `ErrorLogger`, so we don't have to repeat ourselves at the callsites.
I asked in Slack and there doesn't seem to be a specific reason. This should be ready for another look! I've addressed the comments and added another commit that pulls the |
vmarcos
left a comment
There was a problem hiding this comment.
Thanks for asking in Slack; the changes LGTM! I think it is quite tricky to test this at the moment, but maybe we can think about an approach when we allow persist_source to stop emitting early as well.
|
TFTRs despite the huge diff! |
This PR modifies the error logging behavior of compute operators to only log messages to Sentry when the dataflow is not in the process of shutting down. This avoids producing noise caused by dataflow operators emitting partial update sets when they are cancelled. It is a precondition for enabling the faster dataflow shutdown methods described in the "Faster Dataflow Shutdown" design.
The shutdown checking is performed through a shutdown token that is wrapped inside a new
ErrorLoggertype that gets passed around to the various operator renderers. AnErrorLoggerencapsulates both shutdown checking and ourwarn!/error!logging format.Motivation
Enables #18760.
Tips for reviewer
Look at the commits and their descriptions separately!
MaybeValidatingRowtrait into a newmz_compute::render::errorsmodule, which is also where the newErrorLoggertype lives.ErrorLoggertype.Contextand updates the top-level rendering code to make sure the token gets added to the exports depending on it.ErrorLogger, rather than using thewarn!/error!macros directly.Contexttype. Rather than requiringdebug_nameanderror_loggerarguments, they can now get these fields directly from the context, reducing the number of function arguments that need to be passed.Unfortunately, the diff for the last commit is quite hairy, since it changes not only the indentation level, but also causes rustfmt to re-wrap many of the higher-indented lines. Hiding whitespace changes helps a bit, but still produces a diff much larger than it should be. Since the last commit is a pure code organization change and doesn't affect functionality, I would be fine with dropping it if you'd rather not review it.
One thought about theThis is now implemented in the sixth commit!ErrorLoggertype: It would be quite easy to also wrap thedebug_name, in addition to the token, inside it. Then we wouldn't have to pass that around in rendering anymore. The drawback is that we'd lose our current flexibility in formatting the "details" part. We'd probably need to put the dataflow name as a fixed prefix of the message, so constructs like "found invalid accumulations in dataflow {debug_name}" would no longer be possible.If anyone has an idea how to actually test this change, I'd be happy to hear it! I couldn't come up with a way to provoke invalid retractions during dataflow shutdown (they should be impossible today), much less doing so in a reliable way.
Checklist
$T ⇔ Proto$Tmapping (possibly in a backwards-incompatible way) and therefore is tagged with aT-protolabel.