[statsd] Fix buffer operation thread-safety by sgnn7 · Pull Request #642 · DataDog/datadogpy

sgnn7 · 2021-03-05T23:41:07Z

What does this PR do?

Old code did not have locking around its open_buffer() and
close_buffer() operations which made those two methods not
thread-safe. This change fixes this and aligns this module with the
documentation.

Fixes #439
Fixes #601

Description of the Change

We now have an RLock around buffer operations that prevents threads
from changing the buffer owned by another thread.

Overall performance decrease when sending 2+ metrics is within the +/- 5% margin of error:

$ /time_buffer_ops.py 
Using Python3
baseline: 6.19679 (baseline)
mod     : 6.02907 (97.29%, 2.78% perf increase)
baseline: 6.20096 (baseline)
mod     : 6.04097 (97.42%, 2.65% perf increase)
baseline: 6.05407 (baseline)
mod     : 6.12549 (101.18%, -1.17% perf increase)
baseline: 6.16327 (baseline)
mod     : 6.16413 (100.01%, -0.01% perf increase)
baseline: 5.94379 (baseline)
mod     : 6.18436 (104.05%, -3.89% perf increase)

$ ./time_buffer_ops.py 
Using Python2
baseline: 5.24699 (baseline)
mod     : 5.25159 (100.09%, -0.09% perf increase)
baseline: 5.17050 (baseline)
mod     : 5.27538 (102.03%, -1.99% perf increase)
baseline: 5.46608 (baseline)
mod     : 5.56813 (101.87%, -1.83% perf increase)
baseline: 5.10800 (baseline)
mod     : 5.41473 (106.01%, -5.66% perf increase)
baseline: 5.06513 (baseline)
mod     : 5.27768 (104.20%, -4.03% perf increase)

Alternate Designs

Possible Drawbacks

Verification Process

Unit tests cover the full scope of changes

Additional Notes

Release Notes

Review checklist (to be filled by reviewers)

Feature or bug fix MUST have appropriate tests (unit, integration, etc...)
PR title must be written as a CHANGELOG entry (see why)
Files changes must correspond to the primary purpose of the PR as described in the title (small unrelated changes should have their own PR)
PR must have one changelog/ label attached. If applicable it should have the backward-incompatible label attached.
PR should not have do-not-merge/ label attached.
If Applicable, issue must have kind/ and severity/ labels attached at least.

jd · 2021-03-09T14:34:11Z

datadog/dogstatsd/base.py

+            if not hasattr(self, 'buffer'):
+                raise BufferError('Cannot close buffer that was never opened')


That might be correct though it looks like a different change.

The edge case was found during the test writing. It's a separate commit but it can be split out into a separate PR if you want.

jd · 2021-03-09T14:35:44Z

datadog/dogstatsd/base.py

+        finally:
+            # Release all locks that might have been held by our thread.
+            while self._buffer_lock_depth > 0:
+                self._buffer_lock_depth -= 1
+                self._buffer_lock.release()


Does this mean you can have multiple open_buffer from the same thread and only one close_buffer?
That sounds like trying to circumvent programming errors.

Does this mean you can have multiple open_buffer from the same thread and only one close_buffer?
That sounds like trying to circumvent programming errors.

Hey @jd, there was no real guidance to our users that an open_buffer should match a close_buffer. Even if we did have that guidance, it's not technically a programming error - the following code gets into the tricky territory of "what's the correct behavior expected" that ends up being ambiguous as to what it should do/allow:

statds.open_buffer() # do something to the buffer statsd.open_buffer() # do something else statsd.close_buffer()

The second open_buffer in the strictest of terms (which is what you were pointing out I think) seems like it should just throw an exception as it's out-of-order but given the former behavior and its user friendliness it could be interpreted as a "just give me a new buffer to write". We've discussed this a bit on slack before the PR was implemented and chose the latter as a more-appropriate behavior.

As a sidenote, both open_buffer and close_buffer are getting deprecated soon since we want buffering by default so they will effectively be noops in the future.

ogaca-dd · 2021-04-02T08:38:43Z

datadog/dogstatsd/base.py

        """
        Flush the buffer and switch back to single metric packets.
        """
+        if not hasattr(self, 'buffer'):


Did you consider setting self.buffer at None in the constructor as an alternative solution?

Great point and I did consider that as an alternative but that option has the unintended effect of ignoring blatantly erroneous behavior when the user does not call open_buffer even a single time before trying to close the buffer. While we might not care from the logic point of view if the buffer was created or not beforehand, it made more sense here to notify the user that their code is likely very wrong.

ogaca-dd · 2021-04-02T09:42:07Z

datadog/dogstatsd/base.py

+                self._flush_buffer()
+        finally:
+            # Release all locks that might have been held by our thread.
+            while self._buffer_lock_depth > 0:


Consider the following code:

d = DogStatsd() # ... with d as batch: with d as other_batch: other_batch.gauge("users.online", 123) # As the lock was released at the previous line, another thread can call `d.open_buffer()` and update buffer at the same time as `batch.gauge("users.online", 123)` batch.gauge("users.online", 123)

As a sidenote, both open_buffer and close_buffer are getting deprecated soon since we want buffering by default so they will effectively be noops in the future

Note: I am not sure it worth handling the case in the previous code as these functions are getting deprecated.

That's an interesting edge case for sure! Let me think about this some more when I get time and I'll get back to you on it.

After some thinking about this particular example and other edge use cases, I think it might be just better to make the lock not explicitly re-entrant from the same thread which should fix your and @jd's concerns.

When a `close_buffer` was called before any `open_buffer`, we would raise an unexpected exception. This change ensures that we have a consistent error shown to the user when that happens.

Old code did not have locking around its `open_buffer()` and `close_buffer()` operations which made those two methods not thread-safe. This change fixes this and aligns this module with the documentation.

ogaca-dd

LGTM

sgnn7 added the changelog/Fixed Fixed features results into a bug fix version bump label Mar 5, 2021

sgnn7 requested review from a team as code owners March 5, 2021 23:41

jd reviewed Mar 9, 2021

View reviewed changes

ogaca-dd reviewed Apr 2, 2021

View reviewed changes

ogaca-dd previously approved these changes Apr 2, 2021

View reviewed changes

sgnn7 added 2 commits April 8, 2021 14:18

[statsd] Improve exception when calling buffer ops out-of-order

0e45cc0

When a `close_buffer` was called before any `open_buffer`, we would raise an unexpected exception. This change ensures that we have a consistent error shown to the user when that happens.

[statsd] Fix buffer operation thread-safety

4b2aa1d

Old code did not have locking around its `open_buffer()` and `close_buffer()` operations which made those two methods not thread-safe. This change fixes this and aligns this module with the documentation.

sgnn7 dismissed ogaca-dd’s stale review via 4b2aa1d April 8, 2021 20:23

sgnn7 force-pushed the sgnn7/ac34-statsd-batch-thread-safety branch from f439b1c to 4b2aa1d Compare April 8, 2021 20:23

ogaca-dd approved these changes Apr 9, 2021

View reviewed changes

sgnn7 merged commit 33f727c into master Apr 9, 2021

sgnn7 deleted the sgnn7/ac34-statsd-batch-thread-safety branch April 9, 2021 16:20

sgnn7 mentioned this pull request Apr 9, 2021

[dogstatsd] Fix race condition in _flush_buffer #602

Closed

6 tasks

raphaelauv mentioned this pull request May 28, 2021

Missing : sync flush to send metric without sleep #654

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[statsd] Fix buffer operation thread-safety#642

[statsd] Fix buffer operation thread-safety#642
sgnn7 merged 2 commits intomasterfrom
sgnn7/ac34-statsd-batch-thread-safety

sgnn7 commented Mar 5, 2021

Uh oh!

jd Mar 9, 2021

Uh oh!

sgnn7 Mar 9, 2021

Uh oh!

jd Mar 9, 2021

Uh oh!

sgnn7 Mar 9, 2021 •

edited

Loading

Uh oh!

ogaca-dd Apr 2, 2021

Uh oh!

sgnn7 Apr 2, 2021

Uh oh!

ogaca-dd Apr 2, 2021

Uh oh!

sgnn7 Apr 2, 2021

Uh oh!

sgnn7 Apr 8, 2021 •

edited

Loading

Uh oh!

ogaca-dd left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		if not hasattr(self, 'buffer'):
		raise BufferError('Cannot close buffer that was never opened')

Conversation

sgnn7 commented Mar 5, 2021

What does this PR do?

Description of the Change

Alternate Designs

Possible Drawbacks

Verification Process

Additional Notes

Release Notes

Review checklist (to be filled by reviewers)

Uh oh!

jd Mar 9, 2021

Choose a reason for hiding this comment

Uh oh!

sgnn7 Mar 9, 2021

Choose a reason for hiding this comment

Uh oh!

jd Mar 9, 2021

Choose a reason for hiding this comment

Uh oh!

sgnn7 Mar 9, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ogaca-dd Apr 2, 2021

Choose a reason for hiding this comment

Uh oh!

sgnn7 Apr 2, 2021

Choose a reason for hiding this comment

Uh oh!

ogaca-dd Apr 2, 2021

Choose a reason for hiding this comment

Uh oh!

sgnn7 Apr 2, 2021

Choose a reason for hiding this comment

Uh oh!

sgnn7 Apr 8, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ogaca-dd left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

sgnn7 Mar 9, 2021 •

edited

Loading

sgnn7 Apr 8, 2021 •

edited

Loading