POC: MZ reading from physical standby (DNM)#36923

Draft
martykulma wants to merge 1 commit into MaterializeInc:main from martykulma:maz-pg-cascading-repl

@martykulma (Contributor)

A proof of concept of Materialize (MZ) reading from a Postgres physical standby; PostgreSQL 16 added support for logical decoding on a physical standby. This does not account for what happens on a primary/secondary failover: my understanding is that the replication slot will be lost and the PG timeline will change, so the source would need to be recreated in MZ.
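A minimal sketch of the failover concern above, assuming the source records the timeline ID it started on and compares it with the server's current timeline before resuming (for example, the `timeline` column returned by the replication protocol's `IDENTIFY_SYSTEM` command). The names `TimelineChanged` and `check_timeline` are hypothetical and are not Materialize's API.

```python
class TimelineChanged(Exception):
    """Hypothetical error: the upstream timeline no longer matches the recorded one."""


def check_timeline(recorded_timeline: int, current_timeline: int) -> None:
    """Refuse to resume when the server's timeline differs from the recorded one.

    A promotion or failover bumps the timeline ID. Resuming would risk reading
    WAL from a divergent history, so the source must be recreated instead.
    """
    if current_timeline != recorded_timeline:
        raise TimelineChanged(
            f"timeline changed from {recorded_timeline} to {current_timeline}; "
            "recreate the source"
        )
```

`check_timeline(1, 1)` returns quietly, while `check_timeline(1, 2)` raises, which mirrors "the source would need to be recreated" rather than attempting to follow the new timeline.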

peterdukelarsen added a commit that referenced this pull request Jun 18, 2026
…#37020)

### Motivation
We would like to be able to set up replication against a physical
Postgres replica. This fails hard today because `pg_current_wal_lsn()`
cannot be called on a standby (it errors while recovery is in progress).

Relevant issue:
https://linear.app/materializeinc/issue/SS-187/replication-from-postgres-replica

This is adapted from
#36923.

### Description
- Record during purification whether the connection is to a physical
replica
- Use the appropriate method of loading the LSN depending on whether
it's a physical replica or not
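The two steps above can be sketched roughly as follows. This is an illustration, not the PR's code: `query` is a hypothetical callable that runs one SQL statement and returns its single scalar result, and using `pg_last_wal_replay_lsn()` on a standby is an assumption (the PR may load the LSN differently). `pg_is_in_recovery()`, `pg_current_wal_lsn()` and `pg_last_wal_replay_lsn()` are standard PostgreSQL functions.

```python
from typing import Callable

# Hypothetical: runs one SQL statement and returns the single scalar it yields.
Query = Callable[[str], object]


def is_physical_replica(query: Query) -> bool:
    """Purification-time check: a physical standby reports that recovery is in progress."""
    return bool(query("SELECT pg_is_in_recovery()"))


def lsn_query(physical_replica: bool) -> str:
    """Pick the LSN function based on the flag recorded during purification.

    pg_current_wal_lsn() errors on a standby, so fall back to the last replayed
    LSN there (an assumption about which LSN is appropriate).
    """
    if physical_replica:
        return "SELECT pg_last_wal_replay_lsn()"
    return "SELECT pg_current_wal_lsn()"


def current_lsn(query: Query, physical_replica: bool) -> str:
    """Load the LSN using the method appropriate for this connection."""
    return str(query(lsn_query(physical_replica)))
```

Recording the flag once during purification, rather than re-checking on every LSN load, keeps the choice stable for the lifetime of the source.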

### Tradeoffs to call out
- Replicating from a replica that does not have `hot_standby_feedback=on`
risks snapshots timing out (potentially quickly) and increases the risk
of the replication slot being terminated.
- In the test case where we stall, we'll hold a slot open on the
replica. With `hot_standby_feedback=on`, this risks bloating the disk of
the primary. I think setting `max_slot_wal_keep_size` to a reasonably
high level mitigates the scariest versions of this. This tradeoff is
similar to the existing tradeoffs around `max_slot_wal_keep_size`
settings.
- RTR (real-time recency) is no longer accurate relative to the primary
Postgres server. This should be OK and is similar to cascading logical
replication, which we support today.
- Materialize sources will require a fresh snapshot in the case of
failover; this is similar to logical replication today, where a
timeline ID change produces an error. This protects us from corruption.
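The first two tradeoffs can be turned into a pre-flight check, sketched below. This is a hypothetical helper, not something the PR says Materialize does: it takes `SHOW` output keyed by setting name and assumes that an unlimited `max_slot_wal_keep_size` (its default) shows as `-1`.

```python
def standby_warnings(settings: dict[str, str]) -> list[str]:
    """Return warnings for a physical-standby source (hypothetical pre-flight check).

    `settings` maps a setting name to its SHOW output on the replica.
    """
    warnings = []
    if settings.get("hot_standby_feedback") != "on":
        # Without feedback the primary may vacuum rows the standby still needs.
        warnings.append(
            "hot_standby_feedback is off: snapshots may be cancelled and "
            "the slot may be invalidated"
        )
    if settings.get("max_slot_wal_keep_size") == "-1":
        # Unlimited: a stalled slot is never invalidated, so it can retain WAL indefinitely.
        warnings.append(
            "max_slot_wal_keep_size is unlimited: a stalled slot can retain WAL indefinitely"
        )
    return warnings
```

A replica with feedback enabled and a finite limit (for example `100GB`) produces no warnings; the exact limit is a judgment call, in line with "a reasonably high level" above.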

### Verification
- Testdrive tests covering reading from a replica, basic RTR queries,
and ensuring we stall if the same replica is recovered from a more
recent backup