MSC3885: Sliding Sync Extension: To-Device messages #3885
| The alternative is to include to-device events like normal events in a different section, without | ||
| making use of dedicated `since` and `next_batch` tokens, instead relying on the `pos` value. This | ||
| would revert to-device events to be implicitly acknowledged, which has caused numerous [issues](https://github.com/vector-im/element-ios/issues/3817) in | ||
| the past. |
I'm not sure I understand how this setup differs in any meaningful way with regard to the linked issue, not that the issue defines the problem completely. As a matter of fact, the following definition sounds exactly the same as for sync v2:
"The `events` are treated as "acknowledged" when the server receives a new request with the `since` value set to the previous response's `next_batch` value. When this occurs, acknowledged events are permanently deleted from the server, and MUST NOT be returned to the client should another request with an older `since` value be sent."
We're still going to have two processes that will try to fetch the to-device events. One of them might advance further than the other, resulting in the events being acknowledged and thus removed from the server.
Suppose we have a sequence of tokens Tₙ, where n indicates the order in our sync token sequence. Once we use the token Tₙ₊₁ for a sync request, the to-device events that were part of the previous Tₙ sync response will be deleted from the server.
This means that if process A uses token Tₙ₊₁ before process B manages to use token Tₙ, then process B will miss all events that were part of the sync response for which token Tₙ was used by process A. This leads to a discrepancy between the state of the cryptographic objects held by process A and those held by process B. In the linked issue, process B overwrites the newer state inserted by process A with an older and incomplete version of it.
Doesn't this scenario happen with this MSC as well?
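To make the race concrete, here is a minimal sketch of the acknowledgement rule quoted above. It is a toy model, not Synapse's or any other server's implementation: the `ToDeviceServer` class, its integer tokens, and the event names are all assumptions made for illustration.

```python
class ToDeviceServer:
    """Toy model of the quoted acknowledgement rule (hypothetical, for illustration)."""

    def __init__(self):
        self.queue = []    # (stream_id, event) pairs, oldest first
        self.next_id = 1   # integer tokens are an assumption; real tokens are opaque

    def send(self, event):
        self.queue.append((self.next_id, event))
        self.next_id += 1

    def sync(self, since=0, limit=100):
        # A request carrying `since` acknowledges everything up to that position:
        # those events are deleted and are never returned again.
        self.queue = [(i, e) for i, e in self.queue if i > since]
        batch = self.queue[:limit]
        next_batch = batch[-1][0] if batch else since
        return {"events": [e for _, e in batch], "next_batch": next_batch}


server = ToDeviceServer()
server.send("key_1")

# Process A syncs with the old token and receives key_1.
a_first = server.sync(since=0)

server.send("key_2")

# Process A syncs again with the newer token, which acknowledges (deletes) key_1.
a_second = server.sync(since=a_first["next_batch"])

# Process B is still on the old token and never sees key_1.
b_late = server.sync(since=0)
```

In this model `b_late["events"]` contains only `key_2`, which is the situation described in the comment above: whichever process lags behind loses the events the other one has already acknowledged.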
Yes of course, but that has never been the point it has been trying to solve.
This MSC solves the problem by allowing a process to choose to opt in to to-device messages, and explicitly decide when to acknowledge said messages whilst still getting other unrelated events. Processes will need to agree which one will be in charge of processing them. You cannot safely have multiple sync streams at all in Sliding Sync.
I consulted the iOS folks when designing this MSC and this was all they needed from their end when I asked.
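As a rough sketch of what "agreeing which process is in charge" could look like on the client, the helper below builds the `extensions` part of a request body. The function name and the `owns_to_device` flag are hypothetical; only the `enabled`, `limit` and `since` fields come from the MSC text quoted in this thread, and sending `"enabled": false` from the non-owning process is an assumption rather than something the MSC mandates.

```python
def build_extensions(owns_to_device, since=None, limit=100):
    """Return the `extensions` object for a sliding sync request (illustrative only)."""
    if not owns_to_device:
        # This process never asks for to-device events, so it never acknowledges them.
        return {"to_device": {"enabled": False}}
    ext = {"enabled": True, "limit": limit}
    if since is not None:
        # Omitted on the first request after the extension is enabled.
        ext["since"] = since
    return {"to_device": ext}


main_app = build_extensions(owns_to_device=True, since="some_token")
notification_service = build_extensions(owns_to_device=False)
```

Only the process that sends a `since` value can cause the server to delete events, so the other process can keep syncing room data without affecting the to-device queue.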
| The client MUST persist the `next_batch` value to persistent storage between requests in case the client is | ||
| killed by the OS. |
Why MUST? In the worst case the server will just send the last 100 or so to_device messages again, wouldn't it?
Server-side, yes this is the worst case.
Client-side it's more pernicious: the client may have processed some but not all of the to-device events before being killed, so after a restart it processes some events twice, and the effects of that depend on the kind of event.
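One way a client might handle this, sketched under assumptions: process the batch first, then persist `next_batch` with an atomic file replace. The file layout and the function names (`save_token`, `load_token`, `handle_extension_response`) are hypothetical and not taken from any SDK. If the process is killed mid-batch, the old token is still on disk and the whole batch is delivered again, so `process_event` needs to tolerate duplicates.

```python
import json
import os
import tempfile


def save_token(path, token):
    """Persist the token atomically so a crash never leaves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"next_batch": token}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)  # atomic rename over the old file


def load_token(path):
    """Return the stored token, or None if nothing has been persisted yet."""
    try:
        with open(path) as f:
            return json.load(f)["next_batch"]
    except FileNotFoundError:
        return None


def handle_extension_response(path, ext, process_event):
    # Process every event before persisting the token: a crash in between
    # causes re-delivery rather than silent loss.
    for event in ext.get("events", []):
        process_event(event)
    save_token(path, ext["next_batch"])
```

A real client would more likely store the token in the same database transaction as the results of processing, which avoids the duplicate-processing window entirely; the file-based version here only illustrates the ordering.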
Sounds more like it is the client's fault then, if it gets that wrong. I was a bit surprised to see that called out so explicitly, but I guess that is fine.
| { | ||
| "enabled": true, // sticky | ||
| "limit": 100, // sticky, max number of events to return, server can override | ||
| "since": "some_token" // optional, can be omitted on initial sync / when extension is enabled |
Uh?
| "since": "some_token" // optional, can be omitted on initial sync / when extension is enabled | |
| "since": "some_token" // optional, can be omitted on initial sync / when extension is disabled |
probably means when the extension is just enabled
Based on:
- MSC3575: Sliding Sync (aka Sync v3): matrix-org/matrix-spec-proposals#3575
- MSC3885: Sliding Sync Extension: To-Device messages: matrix-org/matrix-spec-proposals#3885
- MSC3884: Sliding Sync Extension: E2EE: matrix-org/matrix-spec-proposals#3884
This is being introduced as part of Sliding Sync but doesn't have any sliding window component. It's just a way to get E2EE events without having to sit through a big initial sync (`/sync` v2). And we can avoid encryption events being backed up by the main sync response or vice-versa.
Part of some Sliding Sync simplification/experimentation. See [this discussion](#17167 (comment)) for why it may not be as useful as we thought.
Based on:
- matrix-org/matrix-spec-proposals#3575
- matrix-org/matrix-spec-proposals#3885
- matrix-org/matrix-spec-proposals#3884
…t-hq#17167)
This is being introduced as part of Sliding Sync but doesn't have any sliding window component. It's just a way to get E2EE events without having to sit through a big initial sync (`/sync` v2). And we can avoid encryption events being backed up by the main sync response or vice-versa.
Part of some Sliding Sync simplification/experimentation. See [this discussion](element-hq#17167 (comment)) for why it may not be as useful as we thought.
Based on:
- matrix-org/matrix-spec-proposals#3575
- matrix-org/matrix-spec-proposals#3885
- matrix-org/matrix-spec-proposals#3884
| { | ||
| "enabled": true, // sticky | ||
| "limit": 100, // sticky, max number of events to return, server can override | ||
| "since": "some_token" // optional, can be omitted on initial sync / when extension is enabled |
We could get away with not having a separate token.
For example, in Synapse, the `pos` sync token is a vector clock and already has the to-device stream position in it. Since the Sliding Sync extension interface allows selectively enabling an extension, you can choose to acknowledge to-device messages by whether you enable the `to_device` extension (`"enabled": true`).
We have since discussed several times in Synapse that partially stopping the vector clock from advancing is one of the (anti-)patterns that made old-school sync brittle and hard to reason about, which may explain why the extra token was added to this MSC.
| the `to_device` extension response: | ||
| ```js | ||
| { | ||
| "next_batch": "some_token", // REQUIRED: even if no changes |
concern: this being REQUIRED even with no changes seems like a mistake.
The client can surely be trusted to remember what it sent in `since`, and we could have it just preserve that.
By requiring this field, we also have to always serialise the response extension, even when nothing happens. This is maybe fine in isolation, but it sets a bloaty pattern as other extensions will want to be consistent with this pattern.
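A sketch of the client-side behaviour this comment proposes, assuming `next_batch` (or the whole extension response) could be omitted when nothing changed. This is the reviewer's suggested alternative, not what the MSC text currently specifies, and `advance_since` is a hypothetical helper name.

```python
def advance_since(current_since, extension_response):
    """Pick the `since` value for the next request.

    Keeps the previous value when the server omitted the extension
    response or its `next_batch` field (the "no changes" case).
    """
    if not extension_response:
        return current_since
    return extension_response.get("next_batch", current_since)


since = "t1"
since = advance_since(since, None)                    # server sent nothing: keep "t1"
since = advance_since(since, {"next_batch": "t2"})    # new token: advance to "t2"
```

With this rule the client resends the same `since` until the server hands out a new token, which acknowledges nothing new and so should be harmless under the acknowledgement semantics discussed earlier in the thread.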
| ## Dependencies | ||
| This MSC builds on MSC3575, which at the time of writing has not yet been accepted into the spec. |
Despite being in production, this MSC still depends on an obsolete MSC. This MSC ought to be updated to meet MSC4186.
Indeed, some of the 'sticky' aspects don't appear to make sense anymore and don't match the production implementation in Synapse.
Rendered
Client Implementation
Server Implementation