Fix remote_insert_lsn lost after crash recovery #332
Merged: mason-sharp merged 1 commit into main from task/SPOC-421/fix-remote-insert-lsn-crash-recovery (Feb 5, 2026)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,117 @@ | ||
| #!/usr/bin/perl | ||
| # ============================================================================= | ||
| # Test: 016_crash_recovery_progress.pl - Verify progress survives crash recovery | ||
| # ============================================================================= | ||
| # This test verifies that spock.progress.remote_insert_lsn is correctly | ||
| # persisted to WAL and survives crash recovery. | ||
| # | ||
| # Topology: | ||
| # n1 (provider) -> n2 (subscriber) | ||
| # | ||
| # Test scenario: | ||
| # 1. Create subscription, insert data, verify remote_insert_lsn > 0 | ||
| # 2. Crash n2 (SIGKILL the postmaster - no clean shutdown) | ||
| # 3. Restart n2 and verify remote_insert_lsn is still > 0 | ||
| # | ||
| # Without the fix, remote_insert_lsn would be 0 after crash recovery. | ||
| # ============================================================================= | ||
|
|
||
| use strict; | ||
| use warnings; | ||
| use Test::More tests => 13; | ||
| use lib '.'; | ||
| use SpockTest qw(create_cluster destroy_cluster system_or_bail system_maybe | ||
| command_ok get_test_config scalar_query psql_or_bail); | ||
|
|
||
| # ============================================================================= | ||
| # SETUP: Create 2-node cluster | ||
| # ============================================================================= | ||
|
|
||
| create_cluster(2, 'Create 2-node cluster'); | ||
|
|
||
| my $config = get_test_config(); | ||
| my $node_ports = $config->{node_ports}; | ||
| my $node_datadirs = $config->{node_datadirs}; | ||
| my $pg_bin = $config->{pg_bin}; | ||
| my $dbname = $config->{db_name}; | ||
| my $host = $config->{host}; | ||
|
|
||
| # Connection string for n1 | ||
| my $conn_n1 = "host=$host port=$node_ports->[0] dbname=$dbname"; | ||
|
|
||
| # ============================================================================= | ||
| # TEST: Setup replication and verify progress | ||
| # ============================================================================= | ||
|
|
||
| # Create test table on both nodes (auto-added to default repset via DDL replication) | ||
| psql_or_bail(1, "CREATE TABLE test_progress (id serial primary key, val text)"); | ||
| psql_or_bail(2, "CREATE TABLE test_progress (id serial primary key, val text)"); | ||
| pass('Created test table on both nodes'); | ||
|
|
||
| # Create subscription on n2 to n1 | ||
| psql_or_bail(2, "SELECT spock.sub_create('sub_n1_n2', '$conn_n1', ARRAY['default'], false, false)"); | ||
| pass('Created subscription n2->n1'); | ||
|
|
||
| # Wait for subscription to be ready | ||
| system_or_bail 'sleep', '5'; | ||
|
|
||
| my $sub_status = scalar_query(2, "SELECT 1 FROM spock.sub_show_status() WHERE subscription_name = 'sub_n1_n2' AND status = 'replicating'"); | ||
| is($sub_status, '1', 'Subscription is replicating'); | ||
|
|
||
| # Insert data on n1 | ||
| psql_or_bail(1, "INSERT INTO test_progress (val) SELECT 'row_' || g FROM generate_series(1, 100) g"); | ||
| system_or_bail 'sleep', '3'; | ||
|
|
||
| # Verify data reached n2 | ||
| my $count_n2 = scalar_query(2, "SELECT COUNT(*) FROM test_progress"); | ||
| is($count_n2, '100', 'Data replicated to n2'); | ||
|
|
||
| # ============================================================================= | ||
| # KEY TEST: Verify remote_insert_lsn before crash | ||
| # ============================================================================= | ||
|
|
||
| my $insert_lsn_before = scalar_query(2, "SELECT remote_insert_lsn FROM spock.progress WHERE remote_node_id = (SELECT node_id FROM spock.node WHERE node_name = 'n1')"); | ||
| diag("remote_insert_lsn before crash: $insert_lsn_before"); | ||
| ok($insert_lsn_before ne '0/0' && $insert_lsn_before ne '', 'remote_insert_lsn is valid before crash'); | ||
|
|
||
| # ============================================================================= | ||
| # CRASH: Kill n2 with SIGKILL (simulates crash - no cleanup, no resource.dat) | ||
| # ============================================================================= | ||
|
|
||
| # Get postmaster PID | ||
| my $pid_file = "$node_datadirs->[1]/postmaster.pid"; | ||
| open(my $fh, '<', $pid_file) or die "Cannot open $pid_file: $!"; | ||
| my $postmaster_pid = <$fh>; | ||
| chomp($postmaster_pid); | ||
| close($fh); | ||
|
|
||
| diag("Killing n2 (PID $postmaster_pid) with SIGKILL..."); | ||
| kill 'KILL', $postmaster_pid; | ||
| system_or_bail 'sleep', '2'; | ||
|
|
||
| # ============================================================================= | ||
| # RECOVERY: Restart n2 | ||
| # ============================================================================= | ||
|
|
||
| diag("Restarting n2..."); | ||
| system_or_bail "$pg_bin/pg_ctl", 'start', '-D', $node_datadirs->[1], '-l', "$node_datadirs->[1]/logfile", '-w'; | ||
| system_or_bail 'sleep', '3'; | ||
|
|
||
| # ============================================================================= | ||
| # VERIFY: remote_insert_lsn should survive crash recovery | ||
| # ============================================================================= | ||
|
|
||
| my $insert_lsn_after = scalar_query(2, "SELECT remote_insert_lsn FROM spock.progress WHERE remote_node_id = (SELECT node_id FROM spock.node WHERE node_name = 'n1')"); | ||
| diag("remote_insert_lsn after recovery: $insert_lsn_after"); | ||
|
|
||
| # The key assertion: remote_insert_lsn should NOT be 0 after crash recovery | ||
| ok($insert_lsn_after ne '0/0' && $insert_lsn_after ne '', 'remote_insert_lsn survives crash recovery'); | ||
|
|
||
| # Verify it exactly matches the value recorded before the crash | ||
| is($insert_lsn_after, $insert_lsn_before, 'remote_insert_lsn matches value before crash'); | ||
|
|
||
| # ============================================================================= | ||
| # CLEANUP | ||
| # ============================================================================= | ||
|
|
||
| destroy_cluster('Cleanup'); | ||
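
The test waits with fixed `sleep` calls (5, 3, 2 and 3 seconds) for the subscription to start replicating, for rows to arrive, and for n2 to come back up. A possible alternative, sketched here under assumptions, is to poll a condition until it holds or a timeout expires. `poll_until` is a hypothetical helper name and is not part of `SpockTest`; the timeout and interval defaults are illustrative only.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep time);

# Hypothetical helper (not part of SpockTest): call $check repeatedly
# until it returns true or $timeout seconds have elapsed.
# Returns 1 on success, 0 on timeout.
sub poll_until {
    my ($check, $timeout, $interval) = @_;
    $timeout  //= 30;     # assumed default, in seconds
    $interval //= 0.5;    # assumed default, in seconds

    my $deadline = time() + $timeout;
    while (1) {
        return 1 if $check->();
        return 0 if time() >= $deadline;
        sleep($interval);
    }
}

# Stand-alone demonstration: the condition becomes true on the third call.
my $calls = 0;
my $ok = poll_until(sub { ++$calls >= 3 }, 5, 0.01);
print "condition met after $calls calls\n" if $ok;

# In the test above, the fixed 5-second wait could then become
# (assuming scalar_query behaves as it is used in the diff):
#
#   ok(poll_until(sub {
#          scalar_query(2, "SELECT 1 FROM spock.sub_show_status() "
#              . "WHERE subscription_name = 'sub_n1_n2' "
#              . "AND status = 'replicating'") eq '1'
#      }, 30),
#      'Subscription is replicating');
```

Polling makes the test fail with a clear timeout on slow machines instead of a misleading assertion, and lets it finish sooner on fast ones. The `Test::More` plan count would need to be kept in step with any assertions added or removed this way.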