Merged
2 changes: 1 addition & 1 deletion .github/workflows/spockbench.yml
@@ -157,7 +157,7 @@ jobs:
 echo "TAP_CT_NAME=$TAP_CT_NAME" >> "$GITHUB_ENV"
 docker run --name "$TAP_CT_NAME" -e PGVER=${{ matrix.pgver }} --workdir=/home/pgedge/spock/tests/tap \
   spock /home/pgedge/spock/tests/tap/run_tests.sh
-timeout-minutes: 15
+timeout-minutes: 20
 
 - name: Collect TAP artifacts (from container)
   if: ${{ always() }}
33 changes: 33 additions & 0 deletions docs/spock_release_notes.md
@@ -13,6 +13,39 @@ Now no restriction exists. Spock will use memory until memory is exhausted (impr
 - rename last_received_lsn to commit_lsn to identify the underlying value more precisely.
 - introduce received_lsn, which points to the last LSN sent by the publisher, exactly as the pgoutput protocol does.
 - remote_insert_lsn is reported more frequently: on each incoming WAL record, not only on COMMIT as before.
+
+
+## Spock 5.0.6
+
+### New Features
+* New `spock.feedback_frequency` GUC that controls how often feedback is sent to the WAL sender. Feedback is sent every *n* messages, where *n* is the configured value. Note that feedback is also sent every `wal_sender_feedback / 2` seconds.
+* New `spock.log_origin_change` GUC to control logging of row origin changes to the PostgreSQL log. Origin changes caused by replication are no longer written to the `spock.resolutions` table, because they are informational rather than true conflicts. Three modes are available:
+  * `none`: do not log origin changes (default).
+  * `remote_only_differs`: log only when a row written by one remote publisher is updated by a different remote publisher.
+  * `since_sub_creation`: log origin changes only for tuples modified after the subscription was created (suppressing noise from data loaded with pg_restore).
+* New `sub_created_at` column on `spock.subscription` to help distinguish pre-existing data (e.g. from pg_restore) from data written after the subscription was created.
+* COPY TO is considered read-only and can now be run when a node is in read-only mode.
+
+### Performance Improvements
+* Deferred `spock.progress` catalog writes. The progress table was previously updated on every committed transaction and every keepalive, causing significant table bloat and I/O overhead. Progress catalog writes are now batched and flushed at most once per second from the main apply loop. The shared-memory copy of the progress state is still updated immediately, for correctness.
+
+### Bug Fixes
+* Fix initdb assertion failure in the attoptions patch for PG15/16/17.
+* Fix two bugs in the table re-sync routine: WAL sending is now switched off during truncate and re-sync to prevent data loss, and an infinite wait that occurred when no further DML was committed is fixed by using the last committed LSN instead of the last received LSN.
+* Use NULL for an unknown `local_origin` in `spock.resolutions` instead of an invalid origin ID when the origin cannot be determined (e.g. pg_dump, frozen transactions, truncated commit timestamps). Also fix off-by-one errors in `spock_conflict_row_to_json()` that were overwriting the `local_origin` NULL flag.
+* Fix Z0DAN initialization issue: `present_final_cluster_state` now executes a COMMIT so that newly created subscriptions can update their state, and the final cluster state check now covers all subscriptions across the cluster.
+* Suppress error messages about `hot_standby_feedback` being off when a read replica is used for reporting rather than failover (the setting is required only for failover).
+* Fix a possible apply worker crash (SIGSEGV) when applying changes to a table that has been dropped.
+
+### Operational Improvements
+* `add_node` is now restricted to run only on a new (uninitialized) node, preventing accidental misuse.
+
+## Spock 5.0.5 on Feb 12, 2026
+
+* Fix a segfault that occurs with new PostgreSQL minor releases such as 18.2.
+* Zero Downtime Add Node (Zodan) minor bug fixes and improvements.
+* Updated documentation.
+
 ## Spock 5.0.4 on Oct 8, 2025
 
 * Reduce memory usage for transactions with many inserts.
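The two cadence rules in these release notes can both be viewed as simple throttles. The first is sending feedback every *n* messages, with a time-based fallback. The second is flushing `spock.progress` to the catalog at most once per second while shared memory stays current. The sketch below illustrates this under stated assumptions. `ApplyCadence`, the helper functions, and the microsecond clock are hypothetical and do not reflect Spock's actual apply-loop code. The time-based feedback interval is a plain parameter here; the notes describe it as `wal_sender_feedback / 2` seconds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical apply-loop bookkeeping; not Spock's real data structures. */
typedef struct ApplyCadence
{
    int     feedback_frequency;     /* send feedback every n messages */
    int64_t feedback_interval_us;   /* time-based fallback, in microseconds */
    int     msgs_since_feedback;
    int64_t last_feedback_us;
    bool    progress_dirty;         /* progress changed since last catalog flush */
    int64_t last_progress_flush_us;
} ApplyCadence;

#define PROGRESS_FLUSH_INTERVAL_US 1000000  /* at most once per second */

/* Count one incoming message; return true when feedback should go out now. */
static bool
should_send_feedback(ApplyCadence *c, int64_t now_us)
{
    c->msgs_since_feedback++;
    if (c->msgs_since_feedback >= c->feedback_frequency ||
        now_us - c->last_feedback_us >= c->feedback_interval_us)
    {
        c->msgs_since_feedback = 0;
        c->last_feedback_us = now_us;
        return true;
    }
    return false;
}

/*
 * Record a progress update. In this sketch the shared-memory copy would be
 * updated here immediately; only the catalog write is deferred.
 */
static void
note_progress(ApplyCadence *c)
{
    c->progress_dirty = true;
}

/* Called from the main loop: write the catalog at most once per second. */
static bool
should_flush_progress(ApplyCadence *c, int64_t now_us)
{
    if (!c->progress_dirty ||
        now_us - c->last_progress_flush_us < PROGRESS_FLUSH_INTERVAL_US)
        return false;
    c->progress_dirty = false;
    c->last_progress_flush_us = now_us;
    return true;
}
```

Batching this way bounds catalog writes to roughly one per second, no matter how many transactions or keepalives arrive.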
20 changes: 20 additions & 0 deletions src/spock_apply.c
@@ -1226,6 +1226,13 @@ handle_insert(StringInfo s)
          */
         xact_had_exception = true;
         exception_command_counter++;
+
+        /*
+         * Clear the local tuple pointer if it was left over from a
+         * previous operation.
+         */
+        exception_log_ptr[my_exception_log_index].local_tuple = NULL;
+
         log_insert_exception(true, "Spock can't find relation", NULL,
                              NULL, NULL, "INSERT");
         end_replication_step();
@@ -3259,6 +3266,12 @@ static void
 execute_sql_command_error_cb(void *arg)
 {
     errcontext("during execution of queued SQL statement: %s", (char *) arg);
+    /*
+     * The errcontext above already includes the SQL statement, so clear
+     * debug_query_string to prevent it from appearing a second time in
+     * the LOG output.
+     */
+    debug_query_string = NULL;
 }
 
 /*
@@ -3779,6 +3792,13 @@ spock_apply_main(Datum main_arg)
     Assert(MySpockWorker->worker_type == SPOCK_WORKER_APPLY);
     MyApplyWorker = &MySpockWorker->worker.apply;
 
+    /*
+     * The apply worker is not a regular backend and has no client query
+     * string. Initialize debug_query_string to NULL so that LOG reports
+     * do not print arbitrary memory contents.
+     */
+    debug_query_string = NULL;
+
     /* Setup synchronous commit according to the user's wishes */
     SetConfigOption("synchronous_commit",
                     spock_synchronous_commit ? "local" : "off",
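The `handle_insert()` change applies a general rule: a long-lived record, here the exception-log slot, must not keep a pointer into a memory context that is reset after each transaction. The sketch below models that rule outside PostgreSQL. A toy bump arena stands in for `MessageContext`. `Arena`, `ExceptionSlot`, and all helpers are hypothetical names, not Spock or PostgreSQL APIs.

```c
#include <stddef.h>
#include <string.h>

/* Toy bump allocator standing in for a per-message memory context. */
typedef struct Arena
{
    char   buf[256];
    size_t used;
} Arena;

static char *
arena_strdup(Arena *a, const char *s)
{
    size_t n = strlen(s) + 1;

    if (a->used + n > sizeof(a->buf))
        return NULL;                        /* out of arena space */
    char *p = a->buf + a->used;
    memcpy(p, s, n);
    a->used += n;
    return p;
}

/* Like a context reset: every pointer into the arena becomes invalid. */
static void
arena_reset(Arena *a)
{
    memset(a->buf, 0xAB, sizeof(a->buf));   /* poison the old contents */
    a->used = 0;
}

/* Hypothetical long-lived slot, analogous to one exception-log entry. */
typedef struct ExceptionSlot
{
    const char *local_tuple;                /* valid only until the arena resets */
} ExceptionSlot;

/* Conflict path: remember the local tuple, copied into the arena. */
static void
record_conflict(ExceptionSlot *slot, Arena *a, const char *tuple)
{
    slot->local_tuple = arena_strdup(a, tuple);
}

/* Missing-relation path: drop any pointer left over from earlier work. */
static void
prepare_missing_relation_log(ExceptionSlot *slot)
{
    slot->local_tuple = NULL;
}

/* What a logger would print for the local tuple. */
static const char *
describe_local_tuple(const ExceptionSlot *slot)
{
    return slot->local_tuple ? slot->local_tuple : "(none)";
}
```

Without the `prepare_missing_relation_log()` step, `describe_local_tuple()` would read poisoned arena memory after the reset. This is the same class of bug that the new TAP test provokes as a SIGSEGV.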
8 changes: 5 additions & 3 deletions src/spock_conflict.c
@@ -431,9 +431,11 @@ spock_report_conflict(SpockConflictType conflict_type,
         strlcpy(local_tup_ts_str,
                 timestamptz_to_str(local_tuple_commit_ts),
                 MAXDATELEN);
-        if (local_tuple_origin != InvalidRepOriginId)
+        if (local_tuple_origin == InvalidRepOriginId)
+            strlcpy(local_origin_str, "local", sizeof(local_origin_str)); /* locally written */
+        else
             snprintf(local_origin_str, sizeof(local_origin_str), "%u",
-                    (unsigned int) local_tuple_origin);
+                     (unsigned int) local_tuple_origin);
     }
 
     initStringInfo(&remotetup);
@@ -623,7 +625,7 @@ spock_conflict_log_table(SpockConflictType conflict_type,
     /* conflict_resolution */
     values[6] = CStringGetTextDatum(conflict_resolution_to_string(resolution));
     /* local_origin */
-    if (found_local_origin && local_tuple_origin != InvalidRepOriginId)
+    if (found_local_origin)
         values[7] = Int32GetDatum((int) local_tuple_origin);
     else
         nulls[7] = true;
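Together, the two `spock_conflict.c` hunks give origin ID 0 (`InvalidRepOriginId`, which PostgreSQL uses for locally written rows) a visible meaning. The conflict log line prints `local`, and the `spock.resolutions` row stores 0. NULL is now reserved for an origin that could not be determined. The sketch below restates that logic outside PostgreSQL. The `OriginInfo` struct and both functions are hypothetical. `RepOriginId` and `InvalidRepOriginId` are redefined locally to match PostgreSQL: a 16-bit ID, with 0 meaning local.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint16_t RepOriginId;          /* PostgreSQL uses a 16-bit origin ID */
#define InvalidRepOriginId 0           /* origin of locally written rows */

/* Hypothetical summary of what the conflict code knows about the local row. */
typedef struct OriginInfo
{
    bool        found;                 /* could the origin be determined at all? */
    RepOriginId origin;
} OriginInfo;

/* Text for the conflict LOG line, e.g. "local" or "3". */
static void
format_local_origin(RepOriginId origin, char *buf, size_t len)
{
    if (origin == InvalidRepOriginId)
        snprintf(buf, len, "local");
    else
        snprintf(buf, len, "%u", (unsigned int) origin);
}

/*
 * Value for the local_origin column: *is_null is set when the origin is
 * unknown; otherwise the ID is returned, with 0 meaning "written locally".
 */
static int32_t
local_origin_column(OriginInfo info, bool *is_null)
{
    *is_null = !info.found;
    return info.found ? (int32_t) info.origin : 0;
}
```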
1 change: 1 addition & 0 deletions tests/tap/schedule
@@ -41,4 +41,5 @@ test: 013_exception_handling
 test: 015_skip_lsn
 test: 015_forward_origin_advance
 test: 016_crash_recovery_progress
+test: 016_sub_disable_missing_relation
 test: 017_zodan_3n_timeout
2 changes: 2 additions & 0 deletions tests/tap/t/013_origin_change_restore.pl
@@ -164,6 +164,8 @@ BEGIN
 $log_content = get_log_content_since($n2_logfile, $log_pos);
 like($log_content, qr/CONFLICT:.*remote update_origin_differs on relation public\.test_origin/,
     'since_sub_creation: conflict logged for post-subscription data');
+like($log_content, qr/origin=local/,
+    'since_sub_creation: locally-modified row shows origin=local in log');
 
 # =============================================================================
 # Test 3: mode 'none' — suppresses all origin-change logging
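The three `spock.log_origin_change` modes can be read as a small decision function. This is a hedged sketch: the enum, the parameters, and the comparison of the local row's commit timestamp with `sub_created_at` are assumptions about how such a check could work, not Spock's implementation. It is consistent with the new `origin=local` assertion in this test hunk. Under `since_sub_creation`, a locally modified row written after subscription creation is logged.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t RepOriginId;
#define InvalidRepOriginId 0            /* locally written row */

/* Hypothetical mirror of the spock.log_origin_change settings. */
typedef enum LogOriginChangeMode
{
    LOG_ORIGIN_NONE,
    LOG_ORIGIN_REMOTE_ONLY_DIFFERS,
    LOG_ORIGIN_SINCE_SUB_CREATION
} LogOriginChangeMode;

/*
 * Decide whether an incoming remote change that overwrites a row from
 * another origin should be logged. Timestamps are assumed to be in
 * microseconds; pass have_commit_ts = false when the local row's commit
 * time is unknown (e.g. frozen or restored data).
 */
static bool
should_log_origin_change(LogOriginChangeMode mode,
                         RepOriginId local_origin, RepOriginId remote_origin,
                         bool have_commit_ts, int64_t local_commit_ts,
                         int64_t sub_created_at)
{
    if (local_origin == remote_origin)
        return false;                   /* same origin: not an origin change */

    switch (mode)
    {
        case LOG_ORIGIN_NONE:
            return false;
        case LOG_ORIGIN_REMOTE_ONLY_DIFFERS:
            /* both writers must be (different) remote publishers */
            return local_origin != InvalidRepOriginId;
        case LOG_ORIGIN_SINCE_SUB_CREATION:
            /* skip rows that predate the subscription, e.g. pg_restore data */
            return have_commit_ts && local_commit_ts >= sub_created_at;
    }
    return false;
}
```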
231 changes: 231 additions & 0 deletions tests/tap/t/016_sub_disable_missing_relation.pl
@@ -0,0 +1,231 @@
use strict;
use warnings;
use Test::More;
use lib '.';
use SpockTest qw(
    create_cluster destroy_cluster
    system_or_bail system_maybe command_ok
    get_test_config scalar_query psql_or_bail
    wait_for_sub_status wait_for_exception_log wait_for_pg_ready
);

# =============================================================================
# Test 016: SUB_DISABLE on missing relation + skip_lsn recovery
# + regression: dangling local_tuple after conflict
# =============================================================================
#
# Phase 1: Table dropped on n2 → INSERT on n1 → "Spock can't find relation"
# → SUB_DISABLE → skip_lsn recovery.
#
# Phase 2: INSERT conflict sets exception_log->local_tuple in MessageContext.
# After commit MessageContext is reset (local_tuple dangling).
# Table then dropped on n2. Next INSERT triggers missing-relation
# exception; without the fix handle_insert() would dereference the
# dangling pointer → SIGSEGV. Verifies graceful handling.

# ---------------------------------------------------------------------------
# Cluster setup
# ---------------------------------------------------------------------------

create_cluster(2, 'Create 2-node cluster for sub_disable missing relation test');

my $config = get_test_config();
my $node_ports = $config->{node_ports};
my $host = $config->{host};
my $dbname = $config->{db_name};
my $db_user = $config->{db_user};
my $db_password = $config->{db_password};
my $pg_bin = $config->{pg_bin};

my $p1 = $node_ports->[0]; # n1 — provider
my $p2 = $node_ports->[1]; # n2 — subscriber

my $conn_n1 = "host=$host dbname=$dbname port=$p1 user=$db_user password=$db_password";

# PG log file for n2 (set by SpockTest create_postgresql_conf).
my $pg_log_n2 = "../logs/00${p2}.log";

# ===========================================================================
# Phase 1: Basic SUB_DISABLE + skip_lsn recovery
# ===========================================================================

# Create the test table on both nodes before subscribing (no DDL replication).
psql_or_bail(1, "CREATE TABLE test_missing_rel (id SERIAL PRIMARY KEY, val TEXT)");
psql_or_bail(2, "CREATE TABLE test_missing_rel (id SERIAL PRIMARY KEY, val TEXT)");

# Create one-way subscription n1 → n2.
# synchronize_structure/data both false — table already exists on both sides.
psql_or_bail(2,
    "SELECT spock.sub_create('sub_n1_n2', '$conn_n1', " .
    "ARRAY['default', 'default_insert_only', 'ddl_sql'], false, false)");

ok(wait_for_sub_status(2, 'sub_n1_n2', 'replicating', 30),
    'P1: sub_n1_n2 reaches replicating state');

# Baseline: verify replication is working before we break it.
psql_or_bail(1, "INSERT INTO test_missing_rel (val) VALUES ('baseline_row')");
sleep(3);

my $n2_baseline = scalar_query(2, "SELECT count(*) FROM test_missing_rel");
is($n2_baseline, '1', 'P1: baseline row replicates from n1 to n2');

# Drop the table on n2 only — simulates the missing-relation scenario.
psql_or_bail(2, "DROP TABLE test_missing_rel");

# Trigger the exception: insert on n1 into the table that n2 no longer has.
psql_or_bail(1, "INSERT INTO test_missing_rel (val) VALUES ('trigger_exception_1')");
psql_or_bail(1, "INSERT INTO test_missing_rel (val) VALUES ('trigger_exception_2')");

# Wait for the subscription to auto-disable.
ok(wait_for_sub_status(2, 'sub_n1_n2', 'disabled', 30),
    'P1: sub_n1_n2 becomes disabled after "Spock can\'t find relation" exception');

ok(wait_for_exception_log(2, "error_message LIKE '%can''t find relation%'", 10),
    'P1: exception_log has "Spock can\'t find relation" entry');

my $disable_cnt = scalar_query(2,
    "SELECT count(*) FROM spock.exception_log WHERE operation = 'SUB_DISABLE'");
cmp_ok($disable_cnt, '>=', '1', 'P1: exception_log has SUB_DISABLE entry');

my $skip_lsn = scalar_query(2,
    "SELECT regexp_replace(error_message, " .
    "'.* skip_lsn = ([0-9A-Fa-f/]+).*', '\\1') " .
    "FROM spock.exception_log " .
    "WHERE operation = 'SUB_DISABLE' " .
    "ORDER BY retry_errored_at DESC LIMIT 1");
isnt($skip_lsn, '', 'P1: skip_lsn extracted from SUB_DISABLE exception_log entry');
diag("Phase 1 skip_lsn = $skip_lsn");

# Recovery: recreate the table on n2, set skip_lsn, re-enable.
psql_or_bail(2, "CREATE TABLE test_missing_rel (id SERIAL PRIMARY KEY, val TEXT)");
psql_or_bail(2, "SELECT spock.sub_alter_skiplsn('sub_n1_n2', '$skip_lsn')");
psql_or_bail(2, "SELECT spock.sub_enable('sub_n1_n2')");
pass('P1: skip_lsn set and subscription re-enabled');

ok(wait_for_sub_status(2, 'sub_n1_n2', 'replicating', 30),
    'P1: sub_n1_n2 returns to replicating state after recovery');

psql_or_bail(1, "INSERT INTO test_missing_rel (val) VALUES ('post_recovery_row')");
sleep(5);
my $post_count = scalar_query(2, "SELECT count(*) FROM test_missing_rel");
cmp_ok($post_count, '>=', '1', 'P1: post-recovery INSERT replicates to n2');

sleep(3);
my $worker_alive = scalar_query(2,
    "SELECT count(*) FROM pg_stat_activity WHERE application_name LIKE 'spock apply%'");
cmp_ok($worker_alive, '>=', '1', 'P1: apply worker still running after re-enable');

my $still_enabled = scalar_query(2,
    "SELECT sub_enabled FROM spock.subscription WHERE sub_name = 'sub_n1_n2'");
is($still_enabled, 't', 'P1: subscription remains enabled (basic survival check)');

# ===========================================================================
# Phase 2: Regression — dangling local_tuple after conflict + missing relation
# ===========================================================================

diag("Phase 2: regression test for dangling local_tuple after conflict");

# Record log byte offset so we search only new entries from this point on.
my $log_offset_p2 = -s $pg_log_n2 // 0;

# Create crash-test table on BOTH nodes — N2 gets a conflict row first so
# the first replication cycle triggers conflict detection (setting local_tuple).
psql_or_bail(1,
    "SET spock.enable_ddl_replication = off; " .
    "CREATE TABLE test_crash_target (id INT PRIMARY KEY, val TEXT); " .
    "SELECT spock.repset_add_table('default', 'test_crash_target')");

psql_or_bail(2, "CREATE TABLE test_crash_target (id INT PRIMARY KEY, val TEXT)");

# Plant a conflicting row on N2.
psql_or_bail(2, "INSERT INTO test_crash_target VALUES (1, 'n2_conflict_row')");

# Insert the SAME primary key on N1 — triggers conflict resolution on N2,
# setting exception_log->local_tuple in MessageContext.
psql_or_bail(1, "INSERT INTO test_crash_target VALUES (1, 'n1_row')");

# Wait for T1 (conflict transaction) to be applied.
# After commit, MemoryContextReset(MessageContext) frees local_tuple;
# the pointer in shared memory is now dangling.
my $t1_applied = 0;
for (1..30) {
    sleep(1);
    my $v = scalar_query(2, "SELECT val FROM test_crash_target WHERE id = 1");
    if (defined $v && $v ne '') {
        $t1_applied = 1;
        diag("P2: T1 conflict applied; exception_log->local_tuple is now dangling");
        last;
    }
}
ok($t1_applied, 'P2: conflict transaction T1 applied (local_tuple set then freed)');

# Drop the table on N2 only — missing-relation trigger for T2.
psql_or_bail(2, "DROP TABLE test_crash_target");
sleep(1);

# Insert T2 on N1: without the fix, "can't find relation" on N2 second pass
# would dereference the dangling local_tuple → SIGSEGV.
psql_or_bail(1, "INSERT INTO test_crash_target VALUES (2, 'crash_trigger_row')");
pass('P2: T2 insert sent (missing-relation trigger)');

# Wait for graceful exception handling (SUB_DISABLE).
# A SIGSEGV would kill the postmaster; clean handling just disables the sub.
my $graceful = 0;
for (1..20) {
    sleep(1);
    my $st = scalar_query(2,
        "SELECT status FROM spock.sub_show_status() " .
        "WHERE subscription_name = 'sub_n1_n2'");
    if (defined $st && ($st eq 'disabled' || $st eq 'replicating')) {
        $graceful = 1;
        diag("P2: exception handled gracefully — subscription status: $st");
        last;
    }
}

# Search only new log entries (since Phase 2 started) for signal 11.
my $new_log = '';
if (open(my $lf, '<', $pg_log_n2)) {
    seek($lf, $log_offset_p2, 0);
    local $/;
    $new_log = <$lf> // '';
    close($lf);
}
my $sigsegv_in_p2 = ($new_log =~ /terminated by signal 11/) ? 1 : 0;
my $crash_recovery_in_p2 =
    ($new_log =~ /all server processes terminated.*reinitializing/s ||
     $new_log =~ /terminating any other active server processes/) ? 1 : 0;

ok($graceful && !$sigsegv_in_p2,
    'P2: dangling local_tuple handled gracefully — no SIGSEGV');

if ($sigsegv_in_p2) {
    diag("REGRESSION: signal 11 detected in Phase 2 — fix in handle_insert() may be missing");
} elsif ($graceful) {
    diag("Fix confirmed: apply worker handled missing-relation without crashing");
} else {
    diag("WARNING: subscription did not reach expected state within timeout");
}

# ---------------------------------------------------------------------------
# Cleanup after Phase 2
# ---------------------------------------------------------------------------

# If crash recovery fired (regression), wait for pg to come back.
if ($crash_recovery_in_p2) {
    diag("Crash recovery detected — waiting for n2 to recover");
    ok(wait_for_pg_ready($host, $p2, $pg_bin, 30),
        'n2 postmaster accepts connections after crash recovery');
}

# Disable and drop the subscription to clean up.
system_maybe("$pg_bin/psql", '-h', $host, '-p', $p2, '-U', $db_user, '-d', $dbname,
    '-c', "SELECT spock.sub_disable('sub_n1_n2')");
sleep(3);

system_maybe("$pg_bin/psql", '-h', $host, '-p', $p2, '-U', $db_user, '-d', $dbname,
    '-c', "SELECT spock.sub_drop('sub_n1_n2')");

destroy_cluster('Destroy cluster after sub_disable crash reproduction test');

done_testing();
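In Phase 1, recovery depends on pulling the `skip_lsn = X/Y` value out of the `SUB_DISABLE` error message and passing it to `spock.sub_alter_skiplsn()`. The same extraction can be sketched in C. The sketch also converts PostgreSQL's textual `X/Y` LSN notation, which is two hexadecimal 32-bit halves, into a 64-bit position. The function names are hypothetical. The assumed message shape is only what the test's regular expression implies.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Parse "X/Y" (two hex halves) into a 64-bit LSN, as pg_lsn text is written. */
static bool
parse_lsn(const char *text, uint64_t *lsn)
{
    unsigned int hi, lo;

    if (sscanf(text, "%X/%X", &hi, &lo) != 2)
        return false;
    *lsn = ((uint64_t) hi << 32) | lo;
    return true;
}

/* Find "skip_lsn = X/Y" inside an exception_log error message. */
static bool
extract_skip_lsn(const char *message, uint64_t *lsn)
{
    const char *key = "skip_lsn = ";
    const char *p = strstr(message, key);

    return p != NULL && parse_lsn(p + strlen(key), lsn);
}
```

This parser is more lenient than PostgreSQL's own `pg_lsn` input function. For example, `sscanf` skips leading whitespace, so treat it as an illustration rather than a validator.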
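Phase 2 avoids false positives from earlier phases. It records the server log's size before the step, then searches only the bytes written afterwards for `terminated by signal 11`, as the Perl `seek($lf, $log_offset_p2, 0)` block does. A C version of that offset-based scan might look like the following. `log_contains_since()` is a hypothetical helper. It reads the remainder of the file into memory, which suits test-sized text logs; embedded NUL bytes would end the search early.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return true if `needle` occurs in the stream at or after byte `offset`. */
static bool
log_contains_since(FILE *f, long offset, const char *needle)
{
    if (offset < 0 || fseek(f, 0, SEEK_END) != 0)
        return false;
    long end = ftell(f);
    if (end < 0 || offset > end || fseek(f, offset, SEEK_SET) != 0)
        return false;

    size_t len = (size_t) (end - offset);
    char *buf = malloc(len + 1);
    if (buf == NULL)
        return false;

    size_t got = fread(buf, 1, len, f);
    buf[got] = '\0';                    /* strstr needs a terminated string */
    bool found = strstr(buf, needle) != NULL;
    free(buf);
    return found;
}
```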