[statsd] Fix buffer operation thread-safety #642
datadog/dogstatsd/base.py
Outdated
if not hasattr(self, 'buffer'):
    raise BufferError('Cannot close buffer that was never opened')
That might be correct though it looks like a different change.
The edge case was found during the test writing. It's a separate commit but it can be split out into a separate PR if you want.
datadog/dogstatsd/base.py
Outdated
finally:
    # Release all locks that might have been held by our thread.
    while self._buffer_lock_depth > 0:
        self._buffer_lock_depth -= 1
        self._buffer_lock.release()
Does this mean you can have multiple open_buffer from the same thread and only one close_buffer?
That sounds like trying to circumvent programming errors.
> Does this mean you can have multiple open_buffer from the same thread and only one close_buffer?
> That sounds like trying to circumvent programming errors.
Hey @jd, we never gave our users real guidance that each open_buffer should be matched by a close_buffer. Even if we had, this is not technically a programming error. The following code gets into the tricky territory of "what is the expected behavior?", and it is ambiguous what it should do or allow:
from datadog import statsd

statsd.open_buffer()
# do something to the buffer
statsd.open_buffer()
# do something else
statsd.close_buffer()
In the strictest terms (which I think is what you were pointing out), the second open_buffer should just throw an exception since it is out of order. But given the former behavior and its user-friendliness, it could also be interpreted as "just give me a new buffer to write to". We discussed this a bit on Slack before the PR was implemented and chose the latter as the more appropriate behavior.
As a sidenote, both open_buffer and close_buffer are getting deprecated soon since we want buffering by default so they will effectively be noops in the future.
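To make the behavior under discussion concrete, here is a minimal sketch of "several open_buffer calls, one close_buffer that releases every lock level". `ToyBufferedClient`, `write`, and `flushed` are hypothetical names for illustration and are not the datadog API. The sketch also assumes that a second open simply starts a fresh list, which may not match what the real client does.

```python
import threading


class ToyBufferedClient:
    """Hypothetical stand-in for the client under review; not the datadog API."""

    def __init__(self):
        self._buffer_lock = threading.RLock()
        self._buffer_lock_depth = 0
        self.buffer = []
        self.flushed = []

    def open_buffer(self):
        # Each call re-acquires the re-entrant lock and bumps the depth counter.
        self._buffer_lock.acquire()
        self._buffer_lock_depth += 1
        # Assumption: a repeated open just hands out a fresh buffer.
        self.buffer = []

    def write(self, metric):
        self.buffer.append(metric)

    def close_buffer(self):
        try:
            # Stand-in for flushing: move buffered metrics to `flushed`.
            self.flushed.extend(self.buffer)
            self.buffer = []
        finally:
            # Release every level this thread acquired, however many opens happened.
            while self._buffer_lock_depth > 0:
                self._buffer_lock_depth -= 1
                self._buffer_lock.release()


client = ToyBufferedClient()
client.open_buffer()
client.write("first")
client.open_buffer()   # second open without a close in between
client.write("second")
client.close_buffer()  # a single close releases both lock levels
```

After the single `close_buffer()` call the depth counter is back at zero, so another thread can take the lock. Under the stricter interpretation, the second `open_buffer()` would raise instead.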
"""
Flush the buffer and switch back to single metric packets.
"""
if not hasattr(self, 'buffer'):
Did you consider setting self.buffer to None in the constructor as an alternative solution?
Great point, and I did consider that as an alternative. That option has the unintended effect of ignoring blatantly erroneous behavior: the user never calls open_buffer even once before trying to close the buffer. From a logic point of view we might not care whether the buffer was created beforehand, but it made more sense here to notify the user that their code is likely very wrong.
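A rough sketch of the two options being compared, using hypothetical toy classes (`HasattrClient` and `NoneInitClient` are not part of the library). Only the `hasattr` check and the error message are taken from the diff above; everything else is an assumption for illustration.

```python
class HasattrClient:
    """Hypothetical: the buffer attribute only exists after open_buffer()."""

    def open_buffer(self):
        self.buffer = []

    def close_buffer(self):
        # Same check as in the diff: closing before any open is reported loudly.
        if not hasattr(self, "buffer"):
            raise BufferError("Cannot close buffer that was never opened")
        self.buffer = []


class NoneInitClient:
    """Hypothetical alternative: the constructor sets buffer to None."""

    def __init__(self):
        self.buffer = None

    def open_buffer(self):
        self.buffer = []

    def close_buffer(self):
        # Without an explicit `is None` check, a close-before-open passes silently.
        self.buffer = None


try:
    HasattrClient().close_buffer()
    hasattr_variant_raised = False
except BufferError:
    hasattr_variant_raised = True

NoneInitClient().close_buffer()  # no error, the mistake goes unnoticed
```

An explicit `if self.buffer is None: raise ...` in the constructor-based variant would surface the same mistake. The sketch only shows the silent case the reply is worried about.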
datadog/dogstatsd/base.py
Outdated
    self._flush_buffer()
finally:
    # Release all locks that might have been held by our thread.
    while self._buffer_lock_depth > 0:
Consider the following code:
from datadog.dogstatsd.base import DogStatsd

d = DogStatsd()
# ...
with d as batch:
    with d as other_batch:
        other_batch.gauge("users.online", 123)
    # The lock was released when the inner block exited, so another thread can call
    # `d.open_buffer()` and update the buffer at the same time as the `batch.gauge(...)` call below.
    batch.gauge("users.online", 123)
> As a sidenote, both open_buffer and close_buffer are getting deprecated soon since we want buffering by default so they will effectively be noops in the future
Note: I am not sure it is worth handling the case in the code above, as these functions are getting deprecated.
That's an interesting edge case for sure! Let me think about this some more when I get time and I'll get back to you on it.
After some thinking about this particular example and other edge use cases, I think it might just be better to make the lock not explicitly re-entrant from the same thread, which should address both your and @jd's concerns.
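For reference, the difference between a re-entrant and a plain lock can be seen with the standard library alone. This is only a sketch of the general mechanism. It does not show how the final patch behaves, and whether a nested open should block, raise, or be ignored is not settled by the comment above.

```python
import threading

reentrant = threading.RLock()
plain = threading.Lock()

# Same thread, nested acquire on an RLock: it succeeds and the lock tracks depth.
reentrant.acquire()
nested_rlock_ok = reentrant.acquire(blocking=False)
reentrant.release()
reentrant.release()

# Same thread, nested acquire on a plain Lock: it is refused.
# (A blocking acquire here would deadlock the thread instead.)
plain.acquire()
nested_lock_ok = plain.acquire(blocking=False)
plain.release()

print(nested_rlock_ok, nested_lock_ok)  # prints: True False
```

With a plain lock, the inner `with d as other_batch:` from the example above can no longer quietly release a lock that the outer block still relies on.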
When `close_buffer` was called before any `open_buffer`, we would raise an unexpected exception. This change ensures that a consistent error is shown to the user when that happens.
Old code did not have locking around its `open_buffer()` and `close_buffer()` operations which made those two methods not thread-safe. This change fixes this and aligns this module with the documentation.
f439b1c to 4b2aa1d
What does this PR do?
Old code did not have locking around its `open_buffer()` and `close_buffer()` operations which made those two methods not thread-safe. This change fixes this and aligns this module with the documentation.
Fixes #439
Fixes #601
Description of the Change
We now have an `RLock` around buffer operations that prevents threads from changing the buffer owned by another thread.
Overall performance decrease when sending 2+ metrics is within the +/- 5% margin of error.
Alternate Designs
Possible Drawbacks
Verification Process
Unit tests cover the full scope of changes
Additional Notes
Release Notes
Review checklist (to be filled by reviewers)
`changelog/` label attached. If applicable it should have the `backward-incompatible` label attached.
`do-not-merge/` label attached.
`kind/` and `severity/` labels attached at least.