Skip to content

MSC3885: Sliding Sync Extension: To-Device messages#3885

Open
kegsay wants to merge 3 commits intomainfrom
kegan/msc3885
Open

MSC3885: Sliding Sync Extension: To-Device messages#3885
kegsay wants to merge 3 commits intomainfrom
kegan/msc3885

Conversation

@kegsay
Copy link
Member

@kegsay kegsay commented Sep 7, 2022

@turt2live turt2live added proposal A matrix spec change proposal client-server Client-Server API kind:core MSC which is critical to the protocol's success needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. and removed needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. labels Sep 7, 2022
The alternative is to include to-device events like normal events in a different section, without
making use of dedicated `since` and `next_batch` tokens, instead relying on the `pos` value. This
would revert to-device events to be implicitly acknowledged, which has caused numerous [issues](https://github.com/vector-im/element-ios/issues/3817) in
the past.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand how this setup differs, in a meaningful way, with regards to the linked issue. Not that the issue defines the problem completely. Matter of fact the following definition sounds exactly the same as for sync v2:

The events are treated as "acknowledged" when the server receives a new request with the since value set to the previous response's next_batch value. When this occurs, acknowledged events are permanently deleted from the server, and MUST NOT be returned to the client should another request with an older since value be sent.

We're still going to have two processes that will try to fetch the to-device events. One of them might advance further then the other, resulting in the events being acknowledged and thus removed from the server.

If we have a bunch of tokens denoted by Tₙ. Where n is used to indicate the order in our sync token sequence. Once we use the token Tₙ₊₁ for a sync request, to-device events, that were part of the previous Tₙ sync response, will be deleted from the server.

This means if process A uses token Tₙ₊₁ before process B manages to use token Tₙ, then process B will miss all events that were part of the sync response for which token Tₙ was used by process A. This leads to a discrepancy in state of cryptographic objects of process A and process B. In the linked issue process B overwrites the newer state inserted by process A with an older and incomplete version of it.

Doesn't this scenario happen with this MSC as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes of course, but that has never been the point it has been trying to solve.

This MSC solves the problem by allowing a process to choose to opt in to to-device messages, and explicitly decide when to acknowledge said messages whilst still getting other unrelated events. Processes will need to agree which one will be in charge of processing them. You cannot safely have multiple sync streams at all in Sliding Sync.

I consulted the iOS folks when designing this MSC and this was all they needed from their end when I asked.

Comment on lines +37 to +38
The client MUST persist the `next_batch` value to persistent storage between requests in case the client is
killed by the OS.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why MUST? In the worst case the server will just send the last 100 or so to_device messages again, wouldn't it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Server-side, yes this is the worst case.

Cilent-side it's more pernicious, as the client may have processed some but not all the to-device events before being killed, resulting in processing duplicate to-device events, the effects of which will depend on the kind of event.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds more like it is the clients fault then, if it gets that wrong. I was a bit surprised to see that called out as explicitly, but I guess that is fine.

@turt2live turt2live added the matrix-2.0 Required for Matrix 2.0 label Feb 27, 2023
@turt2live turt2live requested review from a team and turt2live February 28, 2023 02:38
{
"enabled": true, // sticky
"limit": 100, // sticky, max number of events to return, server can override
"since": "some_token" // optional, can be omitted on initial sync / when extension is enabled
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh?

Suggested change
"since": "some_token" // optional, can be omitted on initial sync / when extension is enabled
"since": "some_token" // optional, can be omitted on initial sync / when extension is disabled

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably means when the extension is just enabled

@turt2live turt2live removed request for a team and turt2live June 6, 2023 19:48
MadLittleMods added a commit to element-hq/synapse that referenced this pull request May 7, 2024
Based on:

 - MSC3575: Sliding Sync (aka Sync v3): matrix-org/matrix-spec-proposals#3575
 - MSC3885: Sliding Sync Extension: To-Device messages: matrix-org/matrix-spec-proposals#3885
 - MSC3884: Sliding Sync Extension: E2EE: matrix-org/matrix-spec-proposals#3884
MadLittleMods added a commit to element-hq/synapse that referenced this pull request May 23, 2024
This is being introduced as part of Sliding Sync but doesn't have any sliding window component. It's just a way to get E2EE events without having to sit through a big initial sync  (`/sync` v2). And we can avoid encryption events being backed up by the main sync response or vice-versa.

Part of some Sliding Sync simplification/experimentation. See [this discussion](#17167 (comment)) for why it may not be as useful as we thought.

Based on:

 - matrix-org/matrix-spec-proposals#3575
 - matrix-org/matrix-spec-proposals#3885
 - matrix-org/matrix-spec-proposals#3884
H-Shay pushed a commit to H-Shay/hq_synapse that referenced this pull request May 31, 2024
…t-hq#17167)

This is being introduced as part of Sliding Sync but doesn't have any sliding window component. It's just a way to get E2EE events without having to sit through a big initial sync  (`/sync` v2). And we can avoid encryption events being backed up by the main sync response or vice-versa.

Part of some Sliding Sync simplification/experimentation. See [this discussion](element-hq#17167 (comment)) for why it may not be as useful as we thought.

Based on:

 - matrix-org/matrix-spec-proposals#3575
 - matrix-org/matrix-spec-proposals#3885
 - matrix-org/matrix-spec-proposals#3884
Mic92 pushed a commit to Mic92/synapse that referenced this pull request Jun 14, 2024
…t-hq#17167)

This is being introduced as part of Sliding Sync but doesn't have any sliding window component. It's just a way to get E2EE events without having to sit through a big initial sync  (`/sync` v2). And we can avoid encryption events being backed up by the main sync response or vice-versa.

Part of some Sliding Sync simplification/experimentation. See [this discussion](element-hq#17167 (comment)) for why it may not be as useful as we thought.

Based on:

 - matrix-org/matrix-spec-proposals#3575
 - matrix-org/matrix-spec-proposals#3885
 - matrix-org/matrix-spec-proposals#3884
{
"enabled": true, // sticky
"limit": 100, // sticky, max number of events to return, server can override
"since": "some_token" // optional, can be omitted on initial sync / when extension is enabled
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could get away with not having a separate token.

For example, in Synapse, the pos sync token is a vector clock and already has the to-device stream position in it. Since the Sliding Sync extension interface allows selectively enabling an extension, you can choose to acknowledge to-device messages by whether you enable the to_device extension ("enabled": true).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have since discussed some times in Synapse that stopping the vector clock partially from advancing is one of the (anti-)patterns that made oldschool sync brittle and hard to reason about, which may explain why the extra token was added to this MSC.

@turt2live turt2live removed the matrix-2.0 Required for Matrix 2.0 label Jul 8, 2025
@richvdh richvdh added the matrix-2.0 Required for Matrix 2.0 label Nov 27, 2025
@github-project-automation github-project-automation bot moved this to Tracking for review in Spec Core Team Workflow Nov 27, 2025
the `to_device` extension response:
```js
{
"next_batch": "some_token", // REQUIRED: even if no changes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

concern: this being REQUIRED even with no changes seems like a mistake.
The client can surely be trusted to remember what it sent in since and we could have it just preserve that.

By requiring this field, we also have to always serialise the response extension, even when nothing happens. This is maybe fine in isolation, but it sets a bloaty pattern as other extensions will want to be consistent with this pattern.


## Dependencies

This MSC builds on MSC3575, which at the time of writing has not yet been accepted into the spec.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Despite being in production, this MSC still depends on an obsolete MSC. This MSC ought to be updated to meet MSC4186.

Indeed, some of the 'sticky' aspects don't appear to make sense anymore and don't match the production implementation in Synapse.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

client-server Client-Server API kind:core MSC which is critical to the protocol's success matrix-2.0 Required for Matrix 2.0 proposal A matrix spec change proposal

Projects

Status: Tracking for review

Development

Successfully merging this pull request may close these issues.

8 participants