Skip to content

[statsd] Improve performance of telemetry serialization#641

Merged
sgnn7 merged 1 commit intomasterfrom
sgnn7/ac34-improve-statsd-telemetry-performance
Mar 4, 2021
Merged

[statsd] Improve performance of telemetry serialization#641
sgnn7 merged 1 commit intomasterfrom
sgnn7/ac34-improve-statsd-telemetry-performance

Conversation

@sgnn7
Copy link
Copy Markdown
Contributor

@sgnn7 sgnn7 commented Mar 4, 2021

What does this PR do?

By precomputing the telemetry formatting string, we are able to gain
2-15% increase in throughput over the old code in both Python2 and
Python3.

Sample timing data:

 $ ./time_flush.py
Using Python3
_flush_telemetry_orig: 6.24254 (baseline)
_flush_telemetry_new : 5.56866 (89.21%, 12.10% increase)
_flush_telemetry_orig: 5.78443 (baseline)
_flush_telemetry_new : 5.57556 (96.39%, 3.75% increase)

$ ./time_flush.py
Using Python2
_flush_telemetry_orig: 3.78670 (baseline)
_flush_telemetry_new : 3.47990 (91.90%, 8.82% increase)
_flush_telemetry_orig: 3.84288 (baseline)
_flush_telemetry_new : 3.51469 (91.46%, 9.34% increase)

Benchmark runner:

#!/usr/bin/env python3

import sys
import timeit

print("Using Python" + str(sys.version_info[0]))

x = timeit.Timer("""
from datadog import DogStatsd
DogStatsd()._flush_telemetry_orig()
""")
base_timing = x.timeit(100000)
base_timing += x.timeit(100000)
base_timing += x.timeit(100000)
print("_flush_telemetry_orig: %.5f (baseline)" % base_timing)

x = timeit.Timer("""
from datadog import DogStatsd
DogStatsd()._flush_telemetry()
""")
timing = x.timeit(100000)
timing += x.timeit(100000)
timing += x.timeit(100000)
print("_flush_telemetry_new : %.5f (%.2f%%, %.2f%% increase)" % (timing, timing / base_timing * 100, (base_timing / timing * 100) - 100))

x = timeit.Timer("""
from datadog import DogStatsd
DogStatsd()._flush_telemetry_orig()
""")
base_timing = x.timeit(100000)
base_timing += x.timeit(100000)
base_timing += x.timeit(100000)
print("_flush_telemetry_orig: %.5f (baseline)" % base_timing)

x = timeit.Timer("""
from datadog import DogStatsd
DogStatsd()._flush_telemetry()
""")
timing = x.timeit(100000)
timing += x.timeit(100000)
timing += x.timeit(100000)
print("_flush_telemetry_new : %.5f (%.2f%%, %.2f%% increase)" % (timing, timing / base_timing * 100, (base_timing / timing * 100) - 100))

Description of the Change

Small optimization in string composition in a hot path that uses less CPU cycles

Alternate Designs

Experimented with various other ways to compose a string (.format, .format with keyed data, StringIO, etc) - this one was the most performant.

Possible Drawbacks

N/A

Verification Process

Tests already cover this functionality via unit tests

Additional Notes

Release Notes

Review checklist (to be filled by reviewers)

  • Feature or bug fix MUST have appropriate tests (unit, integration, etc...)
  • PR title must be written as a CHANGELOG entry (see why)
  • Files changes must correspond to the primary purpose of the PR as described in the title (small unrelated changes should have their own PR)
  • PR must have one changelog/ label attached. If applicable it should have the backward-incompatible label attached.
  • PR should not have do-not-merge/ label attached.
  • If Applicable, issue must have kind/ and severity/ labels attached at least.

@sgnn7 sgnn7 added the changelog/Changed Changed features results into a major version bump label Mar 4, 2021
@sgnn7 sgnn7 requested review from a team as code owners March 4, 2021 20:58
By precomputing the telemetry formatting string, we are able to gain
2-15% increase in throughput over the old code in both Python2 and
Python3.

Sample timing data:
```
 $ ./time_flush.py
Using Python3
_flush_telemetry_orig: 6.24254 (baseline)
_flush_telemetry_new : 5.56866 (89.21%, 12.10% increase)
_flush_telemetry_orig: 5.78443 (baseline)
_flush_telemetry_new : 5.57556 (96.39%, 3.75% increase)

$ ./time_flush.py
Using Python2
_flush_telemetry_orig: 3.78670 (baseline)
_flush_telemetry_new : 3.47990 (91.90%, 8.82% increase)
_flush_telemetry_orig: 3.84288 (baseline)
_flush_telemetry_new : 3.51469 (91.46%, 9.34% increase)
```
@sgnn7 sgnn7 force-pushed the sgnn7/ac34-improve-statsd-telemetry-performance branch from 714ca87 to 55a8bd9 Compare March 4, 2021 21:01
Copy link
Copy Markdown

@ogaca-dd ogaca-dd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sgnn7 sgnn7 merged commit 7d0d4a2 into master Mar 4, 2021
@sgnn7 sgnn7 deleted the sgnn7/ac34-improve-statsd-telemetry-performance branch March 4, 2021 21:14
@@ -559,16 +572,15 @@ def _reset_telemetry(self):

def _flush_telemetry(self):
telemetry_tags = ",".join(self._add_constant_tags(self._client_tags))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we could potentially cache that as well, and then cache the resulting string ? So we end up only passing the numbers?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems very reasonable and from a cursory look, I think we can precompute the stringified tags just like you said. Overall though it has been a bit quite a hit-or-miss for what seems like should improve the performance vs what actually gives better benchmarking results so I'll need to instrument it. For now I'm making tiny changes here-and-there in obvious places and as I get a bit more comfortable I'll see about these tags!

@zippolyte zippolyte added changelog/Fixed Fixed features results into a bug fix version bump and removed changelog/Changed Changed features results into a major version bump labels Apr 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog/Fixed Fixed features results into a bug fix version bump

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants