Ensure that pending to-device events are sent over federation at startup #16925
When picking destination servers that we need to wake up for a retry, we need to be mindful of destinations that we have *never* successfully sent to. This can manifest either as a null `last_successful_stream_ordering`, or even no row in `destinations` at all. Hence, we need to left-join on `destinations` rather than inner-joining, and we need to treat a null `last_successful_stream_ordering` the same as 0.
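The difference between the two joins can be sketched with an in-memory SQLite database. The schema below is a cut-down assumption for illustration (only the columns named above plus a hypothetical `destination_rooms` table holding the latest stream ordering per destination); it is not the query from this PR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified, assumed schema; the real tables have more columns.
    CREATE TABLE destinations (
        destination TEXT PRIMARY KEY,
        last_successful_stream_ordering INTEGER
    );
    CREATE TABLE destination_rooms (
        destination TEXT, room_id TEXT, stream_ordering INTEGER
    );
    INSERT INTO destinations VALUES
        ('caught-up.example', 10),
        ('behind.example', 2),
        ('null-ordering.example', NULL);
    INSERT INTO destination_rooms VALUES
        ('caught-up.example', '!r:x', 10),
        ('behind.example', '!r:x', 7),
        ('null-ordering.example', '!r:x', 5),
        ('no-row.example', '!r:x', 3);
    """
)

# Inner join: misses destinations with a NULL ordering or no row at all.
inner_sql = """
    SELECT DISTINCT dr.destination
    FROM destination_rooms AS dr
    JOIN destinations AS d ON d.destination = dr.destination
    WHERE dr.stream_ordering > d.last_successful_stream_ordering
    ORDER BY dr.destination
"""

# Left join plus COALESCE: a missing row or NULL ordering counts as 0.
left_sql = """
    SELECT DISTINCT dr.destination
    FROM destination_rooms AS dr
    LEFT JOIN destinations AS d ON d.destination = dr.destination
    WHERE dr.stream_ordering > COALESCE(d.last_successful_stream_ordering, 0)
    ORDER BY dr.destination
"""

inner_result = [row[0] for row in conn.execute(inner_sql)]
left_result = [row[0] for row in conn.execute(left_sql)]
print(inner_result)
print(left_result)
```

In this toy data set the inner join only finds the destination that has succeeded before and fallen behind, while the left join also picks up the two destinations that have never had a successful send.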
When considering which destinations need waking up for a retry, also look for those that have outstanding to-device messages.
We don't really want people to have to wait 60 seconds for their to-device messages.
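One way to picture the extended selection is a union of two sets: destinations that are behind on room events, and destinations with anything queued in a to-device outbox. The sketch below uses SQLite and a hypothetical `pending_to_device` table; the table name, its columns, and the query shape are assumptions for illustration and may not match what the PR actually does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified, assumed schema.
    CREATE TABLE destinations (
        destination TEXT PRIMARY KEY,
        last_successful_stream_ordering INTEGER
    );
    CREATE TABLE destination_rooms (
        destination TEXT, room_id TEXT, stream_ordering INTEGER
    );
    -- Hypothetical outbox of to-device messages awaiting delivery.
    CREATE TABLE pending_to_device (destination TEXT, message_json TEXT);

    INSERT INTO destinations VALUES
        ('caught-up.example', 10),
        ('behind.example', 1);
    INSERT INTO destination_rooms VALUES
        ('caught-up.example', '!r:x', 10),
        ('behind.example', '!r:x', 4);
    INSERT INTO pending_to_device VALUES
        ('caught-up.example', '{}'),
        ('todevice-only.example', '{}');
    """
)

# UNION removes duplicates, so each destination is woken at most once.
wake_sql = """
    SELECT destination
    FROM destination_rooms
    LEFT JOIN destinations USING (destination)
    WHERE stream_ordering > COALESCE(last_successful_stream_ordering, 0)
    UNION
    SELECT destination FROM pending_to_device
    ORDER BY destination
"""

to_wake = [row[0] for row in conn.execute(wake_sql)]
print(to_wake)
```

Here `caught-up.example` is up to date on room events but is still selected because it has a queued to-device message.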
    run_as_background_process(
        "wake_destinations_needing_catchup",
        self._wake_destinations_needing_catchup,
    )
I think we should add a `now` param to `looping_call` (as the underlying twisted thing supports it), that way if the initial run takes a long time a second run won't get started by the looping call.
Hum. Turns out this screws up the type checker.
In order to add the new parameter without breaking existing code, it needs to be a named parameter (because excess positional parameters are passed to the wrapped function). However, it turns out that `ParamSpec` is incompatible with additional keyword args. https://peps.python.org/pep-0612/#id2 actually mentions this: "Note that this also why we have to reject signatures of the form `(*args: P.args, s: str, **kwargs: P.kwargs)`."
So, three options:
1. Add a positional parameter to `looping_call`, and update all existing calls to `looping_call` (55 of them, according to my IDE).
2. Add a new method `looping_call2` (or something) which takes a positional `now` parameter. Use it here, and deprecate `looping_call`.
3. Add a new method `looping_call_now`, which is exactly the same as `looping_call` except for the obvious. (The implementation of this will probably involve a private equivalent to `looping_call2`, shared between `looping_call_now` and `looping_call`.)
My instinct is number 3, but happy with whatever you think.
Well, I took an executive decision.
The tests are now failing due to db connection pool shenanigans. #17017 should fix it.
# Synapse 1.104.0 (2024-04-02)

### Bugfixes

- Fix regression when using OIDC provider. Introduced in v1.104.0rc1. ([\#17031](element-hq/synapse#17031))

# Synapse 1.104.0rc1 (2024-03-26)

### Features

- Add an OIDC config to specify extra parameters for the authorization grant URL. It can be useful to pass an ACR value for example. ([\#16971](element-hq/synapse#16971))
- Add support for OIDC provider returning JWT. ([\#16972](element-hq/synapse#16972), [\#17031](element-hq/synapse#17031))

### Bugfixes

- Fix a bug which meant that, under certain circumstances, we might never retry sending events or to-device messages over federation after a failure. ([\#16925](element-hq/synapse#16925))
- Fix various long-standing bugs which could cause incorrect state to be returned from `/sync` in certain situations. ([\#16949](element-hq/synapse#16949))
- Fix case in which `m.fully_read` marker would not get updated. Contributed by @SpiritCroc. ([\#16990](element-hq/synapse#16990))
- Fix bug which did not retract a user's pending knocks at rooms when their account was deactivated. Contributed by @hanadi92. ([\#17010](element-hq/synapse#17010))

### Updates to the Docker image

- Updated `start.py` to generate config using the correct user ID when running as root (fixes [\#16824](element-hq/synapse#16824), [\#15202](element-hq/synapse#15202)). ([\#16978](element-hq/synapse#16978))

### Improved Documentation

- Add a query to force a refresh of a remote user's device list to the "Useful SQL for Admins" documentation page. ([\#16892](element-hq/synapse#16892))
- Minor grammatical corrections to the upgrade documentation. ([\#16965](element-hq/synapse#16965))
- Fix the sort order for the documentation version picker, so that newer releases appear above older ones. ([\#16966](element-hq/synapse#16966))
- Remove recommendation for a specific poetry version from contributing guide. ([\#17002](element-hq/synapse#17002))

### Internal Changes

- Improve lock performance when a lot of locks are all waiting for a single lock to be released. ([\#16840](element-hq/synapse#16840))
- Update power level default for public rooms. ([\#16907](element-hq/synapse#16907))
- Improve event validation. ([\#16908](element-hq/synapse#16908))
- Multi-worker-docker-container: disable log buffering. ([\#16919](element-hq/synapse#16919))
- Refactor state delta calculation in `/sync` handler. ([\#16929](element-hq/synapse#16929))
- Clarify docs for some room state functions. ([\#16950](element-hq/synapse#16950))
- Specify IP subnets in canonical form. ([\#16953](element-hq/synapse#16953))
- As done for SAML mapping provider, let's pass the module API to the OIDC one so the mapper can do more logic in its code. ([\#16974](element-hq/synapse#16974))
- Allow containers building on top of Synapse's Complement container to use the included PostgreSQL cluster. ([\#16985](element-hq/synapse#16985))
- Raise poetry-core version cap to 1.9.0. ([\#16986](element-hq/synapse#16986))
- Patch the db conn pool sooner in tests. ([\#17017](element-hq/synapse#17017))

### Updates to locked dependencies

* Bump anyhow from 1.0.80 to 1.0.81. ([\#17009](element-hq/synapse#17009))
* Bump black from 23.10.1 to 24.2.0. ([\#16936](element-hq/synapse#16936))
* Bump cryptography from 41.0.7 to 42.0.5. ([\#16958](element-hq/synapse#16958))
* Bump dawidd6/action-download-artifact from 3.1.1 to 3.1.2. ([\#16960](element-hq/synapse#16960))
* Bump dawidd6/action-download-artifact from 3.1.2 to 3.1.4. ([\#17008](element-hq/synapse#17008))
* Bump jinja2 from 3.1.2 to 3.1.3. ([\#17005](element-hq/synapse#17005))
* Bump log from 0.4.20 to 0.4.21. ([\#16977](element-hq/synapse#16977))
* Bump mypy from 1.5.1 to 1.8.0. ([\#16901](element-hq/synapse#16901))
* Bump netaddr from 0.9.0 to 1.2.1. ([\#17006](element-hq/synapse#17006))
* Bump pydantic from 2.6.0 to 2.6.4. ([\#17004](element-hq/synapse#17004))
* Bump pyo3 from 0.20.2 to 0.20.3. ([\#16962](element-hq/synapse#16962))
* Bump ruff from 0.1.14 to 0.3.2. ([\#16994](element-hq/synapse#16994))
* Bump serde from 1.0.196 to 1.0.197. ([\#16963](element-hq/synapse#16963))
* Bump serde_json from 1.0.113 to 1.0.114. ([\#16961](element-hq/synapse#16961))
* Bump types-jsonschema from 4.21.0.20240118 to 4.21.0.20240311. ([\#17007](element-hq/synapse#17007))
* Bump types-psycopg2 from 2.9.21.16 to 2.9.21.20240311. ([\#16995](element-hq/synapse#16995))
* Bump types-pyopenssl from 23.3.0.0 to 24.0.0.20240311. ([\#17003](element-hq/synapse#17003))

# Synapse 1.103.0 (2024-03-19)

No significant changes since 1.103.0rc1.

# Synapse 1.103.0rc1 (2024-03-12)

### Features

- Add a new [List Accounts v3](https://element-hq.github.io/synapse/v1.103/admin_api/user_admin_api.html#list-accounts-v3) Admin API with improved deactivated user filtering capabilities. ([\#16874](element-hq/synapse#16874))
- Include `Retry-After` header by default per [MSC4041](matrix-org/matrix-spec-proposals#4041). Contributed by @clokep. ([\#16947](element-hq/synapse#16947))

### Bugfixes

- Fix joining remote rooms when a module uses the `on_new_event` callback. This callback may now pass partial state events instead of the full state for remote rooms. Introduced in v1.76.0. ([\#16973](element-hq/synapse#16973))
- Fix performance issue when joining very large rooms that can cause the server to lock up. Introduced in v1.100.0. Contributed by @ggogel. ([\#16968](element-hq/synapse#16968))

### Improved Documentation

- Add HAProxy example for single port operation to reverse proxy documentation. Contributed by Georg Pfuetzenreuter (@tacerus). ([\#16768](element-hq/synapse#16768))
- Improve the documentation around running Complement tests with new configuration parameters. ([\#16946](element-hq/synapse#16946))
- Add docs on upgrading from a very old version. ([\#16951](element-hq/synapse#16951))

### Updates to locked dependencies

* Bump JasonEtco/create-an-issue from 2.9.1 to 2.9.2. ([\#16934](element-hq/synapse#16934))
* Bump anyhow from 1.0.79 to 1.0.80. ([\#16935](element-hq/synapse#16935))
* Bump dawidd6/action-download-artifact from 3.0.0 to 3.1.1. ([\#16933](element-hq/synapse#16933))
* Bump furo from 2023.9.10 to 2024.1.29. ([\#16939](element-hq/synapse#16939))
* Bump pyopenssl from 23.3.0 to 24.0.0. ([\#16937](element-hq/synapse#16937))
* Bump types-netaddr from 0.10.0.20240106 to 1.2.0.20240219. ([\#16938](element-hq/synapse#16938))
Fixes #16680, as well as a related bug, where servers which we had never successfully sent an event to would not be retried.
In order to fix the case of pending to-device messages, we hook into the existing `wake_destinations_needing_catchup` process, by extending it to look for destinations that have pending to-device messages. The federation transmission loop then attempts to send the pending to-device messages as normal.

Suggest review commit-by-commit.