Be able to correlate timeouts in reverse-proxy layer in front of Synapse (tag specific request headers) #13786
Conversation
```python
# Tag any headers that we need to extract from the request. This
# is useful to specify any headers that a reverse-proxy in front
# of Synapse may be sending to correlate and match up something
# in that layer to a Synapse trace. ex. when Cloudflare times
# out it gives a `cf-ray` header which we can also tag here to
# find the trace.
for header_name in self.hs.config.tracing.request_headers_to_tag:
    headers = request.requestHeaders.getRawHeaders(header_name)
    if headers is not None:
        parent_span.set_tag(
            SynapseTags.REQUEST_HEADER_PREFIX + header_name,
            str(headers[0]),
        )
```
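To make the behavior concrete, here is a self-contained sketch of what the loop above does, using a stand-in span object. The `REQUEST_HEADER_PREFIX` value, the configured header list, and the ray ID are assumptions for illustration, not taken from the actual Synapse source:

```python
from twisted.web.http_headers import Headers

class FakeSpan:
    """Stand-in for an opentracing span; the real opentracing.Span.set_tag
    has the same (key, value) signature."""
    def __init__(self):
        self.tags = {}

    def set_tag(self, key, value):
        self.tags[key] = value

# Assumed values for illustration only.
REQUEST_HEADER_PREFIX = "request_header."
request_headers_to_tag = ["cf-ray"]

request_headers = Headers({b"CF-Ray": [b"230b030023ae2822-SJC"]})  # made-up ray ID
parent_span = FakeSpan()

for header_name in request_headers_to_tag:
    headers = request_headers.getRawHeaders(header_name)
    if headers is not None:
        parent_span.set_tag(REQUEST_HEADER_PREFIX + header_name, str(headers[0]))

print(parent_span.tags)  # {'request_header.cf-ray': '230b030023ae2822-SJC'}
```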
I've tested that this works locally, but I'm not sure if the `cf-ray` header that we want to use on matrix.org is actually a thing.
Based on the following documentation, and the fact that we already see it in our haproxy logs, I assume this will be viable:

> Add the `CF-Ray` header to your origin web server logs to match requests proxied to Cloudflare to requests in your server logs.
```python
if headers is not None:
    parent_span.set_tag(
        SynapseTags.REQUEST_HEADER_PREFIX + header_name,
        str(headers[0]),
```
Why the call to `str`? If `header_name` is a string, then `headers[0]` will be a string.
See:
```python
headers = request.requestHeaders.getRawHeaders(header_name)
if headers is not None:
    parent_span.set_tag(
        SynapseTags.REQUEST_HEADER_PREFIX + header_name,
```
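For reference, Twisted's `Headers.getRawHeaders` mirrors the type of the name you pass in, which supports the point above: a `str` name yields `str` values, a `bytes` name yields `bytes` values. A quick sketch (the ray ID is made up):

```python
from twisted.web.http_headers import Headers

headers = Headers({b"CF-Ray": [b"230b030023ae2822-SJC"]})  # made-up ray ID

# The value type mirrors the name type; missing headers return None.
assert headers.getRawHeaders("cf-ray") == ["230b030023ae2822-SJC"]
assert headers.getRawHeaders(b"cf-ray") == [b"230b030023ae2822-SJC"]
assert headers.getRawHeaders("x-missing") is None
```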
I'm not really familiar with opentracing tags. Is `request_header.cf-ray` an OK thing to do, or do we need to normalize it to something like `request_header.cf_ray` or whatever?
The tag can be whatever. And since it's the name of a header, I think it's probably easiest to follow with the actual header name. Similar to what we do with function arguments, but those happen to conform to Python's snake casing already.
These are the OpenTelemetry docs, but they lay out some naming conventions: https://opentelemetry.io/docs/reference/specification/common/attribute-naming/
> * `request_headers_to_tag`: A list of headers to extract from the request and
>   add to the top-level servlet tracing span as tags. Useful when you're using
>   a reverse proxy service like Cloudflare to protect your Synapse instance, in
>   order to correlate and match up requests that timed out at the Cloudflare
>   layer to the Synapse traces.
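For context, here is a hedged sketch of how the documented option might be set in `homeserver.yaml`. Only the option name `opentracing.request_headers_to_tag` is confirmed by this PR; the surrounding `opentracing` layout is an assumption, and the snippet parses the fragment with PyYAML so it is self-contained:

```python
import yaml

# Hypothetical homeserver.yaml fragment; only `request_headers_to_tag`
# under `opentracing` is confirmed by this PR.
config_text = """
opentracing:
  enabled: true
  request_headers_to_tag:
    - cf-ray
"""

config = yaml.safe_load(config_text)
print(config["opentracing"]["request_headers_to_tag"])  # ['cf-ray']
```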
Not to throw a wrench in these plans, but if other reverse proxies do something similar, I wonder if it would make more sense to have a header which could be the source for `SynapseRequest.request_id` (instead of generating a new one); see lines 174 to 175 in 9772e36.
Would that be clearer instead of having two separate IDs?
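A minimal sketch of that alternative, with hypothetical names (`compute_request_id` is not Synapse's actual implementation): derive the request ID from a reverse-proxy header when present, otherwise fall back to an incrementing counter:

```python
import itertools

_request_seq = itertools.count()  # stand-in for the existing incrementing integer

def compute_request_id(request, id_header: str = "cf-ray") -> str:
    """Use the reverse-proxy's correlation header as the request ID if
    present; otherwise fall back to a sequential ID as before."""
    headers = request.requestHeaders.getRawHeaders(id_header)
    if headers is not None:
        return headers[0]
    return str(next(_request_seq))
```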
👍 Seems reasonable to me and a little cleaner. I'll create a new PR with this solution ⏩
I don't know if this would make it harder to track things in the logs or anything, but seems like it would be a lot cleaner.
Tried this out in #13801
Losing the simplicity of the incrementing integer to reference in logs as you scroll kinda sucks though. I'm leaning back to this option I think 🤔
> Losing the simplicity of the incrementing integer to reference in logs as you scroll kinda sucks though. I'm leaning back to this option I think 🤔
Is that really used though? I usually just copy and paste for grep. Once the number gets beyond a few digits it is useless anyway IMO.
I'm not the one having to log dive and debug big Synapse instances, so I don't know. In tests, we will still have the same sequential request IDs since this config won't be defined. We're not changing any default either, so it's basically opt-in.
If anyone has a strong desire for the old way, we can always revert as well. If you're happy with #13801, we can move forward with that ⏩
Sounds like we should ask folks real quick then!
Thanks for asking in #synapse-dev, https://matrix.to/#/!vcyiEtMVHIhWXcJAfl:sw1v.org/$kMyUMCOU4fZpH6AOUPOIqGyZADn38ebt2kn2wo9cAXU?via=matrix.org&via=element.io&via=beeper.com ❤️
Discussion being tracked in #13801 (comment)
Closing as it seems we're settling on the alternative #13801
Add `opentracing.request_headers_to_tag` config to specify which request headers to extract and tag traces with, in order to correlate timeouts in reverse-proxy layers in front of Synapse with traces.

Fix #13685
Alternative solution: #13801