
From v5_STABLE part 2 #382

Merged
mason-sharp merged 5 commits into main from from-v5_STABLE-part2
Mar 11, 2026

Conversation

@mason-sharp
Member

This is part 2 on top of #372.

Making it a draft for now. It currently has part 1 as a base. Once that is merged, this can use main as its target.

@mason-sharp mason-sharp requested a review from rasifr March 10, 2026 01:43
@coderabbitai

coderabbitai bot commented Mar 10, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 434d2715-a9a7-4b08-9b0f-0d1f1d79177c

📥 Commits

Reviewing files that changed from the base of the PR and between 506481f and b220182.

📒 Files selected for processing (1)
  • src/spock_functions.c
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/spock_functions.c

📝 Walkthrough

Adds a configurable GUC spock.feedback_frequency to control apply-worker feedback frequency, enforces subscriber read-only guards that block resync/write operations (allowing COPY TO, blocking COPY FROM), and adds TAP tests for resync/read-only behavior plus a log-suppression tweak for a performance test.

Changes

  • Feedback Frequency Feature (docs/configuring.md, include/spock.h, src/spock.c, src/spock_apply.c): Introduce spock.feedback_frequency (int GUC, default 200). Add a global declaration, register the GUC in _PG_init, and replace the hardcoded feedback threshold in maybe_send_feedback() with the new variable.
  • Read-Only Mode Enhancements (src/spock_functions.c, src/spock_readonly.c): Add check_readonly_for_resync() to block subscriber-side resync/truncate when spock.readonly is enabled. Update utility parsing to treat COPY TO as read-only and COPY FROM as a write (blocked in read-only mode).
  • Test Coverage & Test Tweaks (tests/tap/t/015_resync_readonly_origin.pl, tests/tap/t/100_progress_period.pl): Add an end-to-end TAP test exercising resync behavior under read-only provider/subscriber states. Adjust a performance test to reduce log noise by setting log_min_messages to warning before test actions.

Poem

🐇 I counted WALs by rhythm and beat,
Feedback tuned so timing's neat,
COPY TO nibble, COPY FROM must wait,
Resync pauses at the subscriber gate,
A tiny hop to keep replicas straight.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

  • Title check: ❓ Inconclusive. The title 'From v5_STABLE part 2' is vague and generic, using non-descriptive terms that don't convey meaningful information about the changeset's primary purpose. Resolution: use a more descriptive title that summarizes the main changes, such as 'Add configurable feedback_frequency and read-only resync guards' or similar.
✅ Passed checks (2 passed)
  • Description check: ✅ Passed. The description explains this is part 2 of a multi-part change based on PR #372, which provides adequate context despite minimal detail about the actual changes.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 83.33%, which meets the required threshold of 80.00%.


@mason-sharp
Member Author

@coderabbitai review

@coderabbitai

coderabbitai bot commented Mar 10, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/spock_functions.c`:
- Around line 1004-1019: The subscription-wide synchronize path in
spock_alter_subscription_synchronize currently skips the readonly precheck that
check_readonly_for_resync enforces (used by
spock_alter_subscription_resynchronize_table), so add a call to
check_readonly_for_resync(nspname, relname) (or a variant that checks the whole
subscription if necessary) early in spock_alter_subscription_synchronize before
any sync-state creation or table truncation; ensure you use the same error
semantics and message by invoking the existing check_readonly_for_resync
function (or refactor that function into a subscription-level check and call it
from both spock_alter_subscription_resynchronize_table and
spock_alter_subscription_synchronize) so the read-only guard is applied
consistently.

In `@tests/tap/t/015_resync_readonly_origin.pl`:
- Around line 52-53: Replace the fixed wall-clock sleeps (system_or_bail
'sleep', '5' and the other similar sleep calls) with state-based waits: remove
system_or_bail 'sleep' invocations and instead poll the subscription/table
status using the test-suite's existing sync helpers (e.g.,
wait_for_subscription_sync, wait_for_table_sync) or, if those helpers are
absent, implement a short polling loop that queries the subscription/table state
and exits when the desired state is reached (with a reasonable timeout and
failure message). Do this for the occurrences at the other noted sleep sites so
the test waits on observable cluster readiness rather than fixed delays.
- Around line 91-92: After each backtick psql invocation (e.g. the
$resync_origin_readonly capture) add an explicit exit-status assertion using
is($?, 0, ...) for expected-success calls and isnt($?, 0, ...) for
expected-failure calls, placed immediately after the command capture and before
the like(...) output checks; apply this change consistently to all four resync
invocations in the test so you first assert the shell exit code ($?) matches the
expected success/failure, then verify error or success text with like(...).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7aa9f8b7-14ca-4264-b87e-7d7955a530fd

📥 Commits

Reviewing files that changed from the base of the PR and between 2d2b568 and 55434b3.

📒 Files selected for processing (8)
  • docs/configuring.md
  • include/spock.h
  • src/spock.c
  • src/spock_apply.c
  • src/spock_functions.c
  • src/spock_readonly.c
  • tests/tap/t/015_resync_readonly_origin.pl
  • tests/tap/t/100_progress_period.pl

Base automatically changed from from-v5_STABLE-part1 to main March 10, 2026 15:39
mason-sharp and others added 4 commits March 10, 2026 18:50
Sends feedback to WAL sender every n messages, where n is the
configured value.

(cherry picked from commit 07b840f)
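The commit above replaces a hardcoded threshold with a configurable message count. A minimal sketch of that counting rule in plain C, assuming the threshold is read from spock.feedback_frequency; FeedbackCounter and feedback_counter_tick() are hypothetical names (the review diff shows Spock itself keeps the count in a static w_message_count inside maybe_send_feedback()). Only the count branch is modeled here; the wal_sender_timeout / 2 branch is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the apply worker's feedback state. */
typedef struct FeedbackCounter
{
    int w_message_count;    /* 'w' messages seen since the last feedback */
    int feedback_frequency; /* assumed to mirror spock.feedback_frequency */
} FeedbackCounter;

/*
 * Record one incoming 'w' message. Returns true when feedback should be
 * sent, which happens once every feedback_frequency messages; the count
 * is reset each time it returns true.
 */
static bool
feedback_counter_tick(FeedbackCounter *fc)
{
    assert(fc != 0 && fc->feedback_frequency > 0);

    fc->w_message_count++;
    if (fc->w_message_count >= fc->feedback_frequency)
    {
        fc->w_message_count = 0;
        return true;
    }
    return false;
}
```

With feedback_frequency set to 200 (the default given in the walkthrough), this returns true on every 200th message; the code it replaces compared against a hardcoded 10.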
When sub_resync_table() is called with truncate=true while the
subscriber has spock.readonly enabled, the table gets truncated
but COPY FROM fails, causing permanent data loss.

Fix: Check subscriber readonly status before truncating.

Also fix spock.readonly to allow COPY TO (read operation) while
blocking COPY FROM (write operation). This allows resync to work
when the origin is readonly, since syncing only reads from origin.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
(cherry picked from commit c9482a8)
(cherry picked from commit 2582fe3)
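The read-only rule in this commit comes down to one question: does the COPY read from a table or write into one? A minimal sketch, assuming a simplified statement struct. PostgreSQL's own CopyStmt parse node carries an is_from flag (true for COPY FROM), but CopyStmtSketch, copy_is_write(), and readonly_allows_copy() are hypothetical names, not the code in src/spock_readonly.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a parsed COPY statement (hypothetical). */
typedef struct CopyStmtSketch
{
    const char *relname;
    bool        is_from;   /* true: COPY ... FROM, which writes into relname */
} CopyStmtSketch;

/* COPY TO only reads the table; COPY FROM inserts rows, so it is a write. */
static bool
copy_is_write(const CopyStmtSketch *stmt)
{
    assert(stmt != NULL);
    return stmt->is_from;
}

/* Returns true if the COPY may run under the given read-only setting. */
static bool
readonly_allows_copy(bool readonly_enabled, const CopyStmtSketch *stmt)
{
    if (!readonly_enabled)
        return true;
    return !copy_is_write(stmt);
}
```

Under this rule a resync against a read-only origin can proceed, because the origin side only runs COPY TO; the COPY FROM happens on the subscriber.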
@mason-sharp mason-sharp force-pushed the from-v5_STABLE-part2 branch from 55434b3 to 506481f March 11, 2026 01:52
@mason-sharp mason-sharp marked this pull request as ready for review March 11, 2026 02:57

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
tests/tap/t/015_resync_readonly_origin.pl (1)

52-53: ⚠️ Potential issue | 🟠 Major

Replace the wall-clock sleeps with state-based waits.

These checkpoints are waiting on async replication, config reload, and resync completion, none of which is guaranteed to settle within 2/3/5/10 seconds. That makes the TAP test flaky on slow CI and needlessly slow on fast CI; poll the observable state instead.

Representative pattern
+sub wait_until {
+    my ($desc, $timeout, $predicate) = @_;
+    for (1 .. $timeout) {
+        return if $predicate->();
+        sleep 1;
+    }
+    BAIL_OUT("$desc not reached before timeout");
+}
...
-system_or_bail 'sleep', '2';
+wait_until('subscriber readonly applied', 15, sub {
+    scalar_query(2, "SHOW spock.readonly") eq 'all';
+});

Also applies to: 60-61, 83-84, 94-95, 104-104, 129-129, 161-161, 175-176

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/tap/t/015_resync_readonly_origin.pl` around lines 52 - 53, The test
currently uses a hard sleep via system_or_bail 'sleep', '5' to wait for DDL
replication/table sync, which makes the TAP flaky; replace this wall-clock sleep
with a state-based polling wait that repeatedly checks the actual observable
condition (e.g., a helper like wait_for_replication_completion,
wait_for_resync_complete, or a loop calling
check_table_synced/check_config_reload) until it returns success or a timeout
occurs; update the call site that uses system_or_bail 'sleep' to call the
polling helper (or inline a short loop with a small sleep and a deadline) so the
test proceeds as soon as replication/config/resync is done rather than waiting a
fixed 5 seconds.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/spock_apply.c`:
- Around line 3941-3945: The timeout check is using last_receive_timestamp
(updated on every PQgetCopyData) which prevents the wal_sender_timeout branch
from firing; change the condition to compare now against last_feedback_timestamp
instead (i.e. use TimestampDifferenceExceeds(*last_feedback_timestamp, now,
wal_sender_timeout / 2) || w_message_count >= spock_feedback_frequency), and
ensure last_feedback_timestamp is updated whenever feedback is actually sent (in
the helper that sends feedback) so the time-based branch reflects time since
last feedback rather than last receive.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 77f1b899-9cdf-4911-a610-35f557e6bcea

📥 Commits

Reviewing files that changed from the base of the PR and between 55434b3 and 506481f.

📒 Files selected for processing (8)
  • docs/configuring.md
  • include/spock.h
  • src/spock.c
  • src/spock_apply.c
  • src/spock_functions.c
  • src/spock_readonly.c
  • tests/tap/t/015_resync_readonly_origin.pl
  • tests/tap/t/100_progress_period.pl
🚧 Files skipped from review as they are similar to previous changes (3)
  • docs/configuring.md
  • src/spock_functions.c
  • include/spock.h

Comment on lines +3941 to +3945
* Send feedback if wal_sender_timeout/2 has passed or after
* spock.feedback_frequency 'w' messages.
*/
if (TimestampDifferenceExceeds(*last_receive_timestamp, now, wal_sender_timeout / 2) ||
-		w_message_count >= 10)
+		w_message_count >= spock_feedback_frequency)


⚠️ Potential issue | 🟠 Major

Track time since the last feedback, not since the last receive.

Lines 2882-2883 refresh last_receive_timestamp immediately after every PQgetCopyData(), and Line 2957 calls this helper afterwards. That makes the timeout branch here effectively dead during normal streaming, so increasing the threshold to spock_feedback_frequency changes the behavior to “count-based feedback plus separate keepalive traffic” rather than the documented “count or wal_sender_timeout / 2” behavior.

Representative fix
-	TimestampTz last_receive_timestamp = GetCurrentTimestamp();
+	TimestampTz last_receive_timestamp = GetCurrentTimestamp();
+	TimestampTz last_feedback_timestamp = last_receive_timestamp;
...
-					maybe_send_feedback(applyconn, last_received,
-										&last_receive_timestamp);
+					maybe_send_feedback(applyconn, last_received,
+										&last_feedback_timestamp);
...
-static void
-maybe_send_feedback(PGconn *applyconn, XLogRecPtr lsn_to_send,
-					TimestampTz *last_receive_timestamp)
+static void
+maybe_send_feedback(PGconn *applyconn, XLogRecPtr lsn_to_send,
+					TimestampTz *last_feedback_timestamp)
 {
 	static int	w_message_count = 0;
 	TimestampTz now = GetCurrentTimestamp();
...
-	if (TimestampDifferenceExceeds(*last_receive_timestamp, now, wal_sender_timeout / 2) ||
+	if (TimestampDifferenceExceeds(*last_feedback_timestamp, now, wal_sender_timeout / 2) ||
 		w_message_count >= spock_feedback_frequency)
 	{
 		send_feedback(applyconn, lsn_to_send, now, true);
-		*last_receive_timestamp = now;
+		*last_feedback_timestamp = now;
 		w_message_count = 0;
 	}
 }
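The reviewer's claim that the time-based branch is effectively dead can be checked with a small standalone simulation. It assumes millisecond timestamps, one message every 100 ms, and a 60 s wal_sender_timeout (illustrative numbers, not from the PR), leaves out the count branch entirely, and compares a reference timestamp refreshed on every receive with one refreshed only when feedback is sent. count_time_based_feedback() and TimestampMs are hypothetical names; the >= comparison matches how PostgreSQL's TimestampDifferenceExceeds() treats the boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t TimestampMs;   /* hypothetical: milliseconds since start */

/*
 * Simulate n_messages arriving every interval_ms and count how often the
 * time-based feedback branch (timeout_ms / 2) fires.
 * reset_on_receive = true refreshes the reference timestamp on every message
 * before the check (the reported bug); false refreshes it only when feedback
 * is actually sent (the suggested fix).
 */
static int
count_time_based_feedback(int n_messages, TimestampMs interval_ms,
                          TimestampMs timeout_ms, bool reset_on_receive)
{
    TimestampMs last_ts = 0;
    int         sent = 0;

    assert(interval_ms > 0 && timeout_ms > 0);

    for (int i = 1; i <= n_messages; i++)
    {
        TimestampMs now = i * interval_ms;

        if (reset_on_receive)
            last_ts = now;      /* refreshed right after each receive */

        if (now - last_ts >= timeout_ms / 2)
        {
            sent++;
            last_ts = now;      /* refreshed when feedback is sent */
        }
    }
    return sent;
}
```

Over 120 s of steady traffic (1200 messages), the first variant never fires, while the second fires at 30, 60, 90, and 120 s, i.e. every wal_sender_timeout / 2, which is the behavior the comment expects after the fix.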

spock_alter_subscription_synchronize() iterates remote tables and calls
truncate_table() when truncate=true, but unlike spock_resync_table() it
was missing the check_readonly_for_resync() guard introduced in
SPOC-440. A readonly subscriber would have its tables truncated with no
way to refill them.

Add the guard and fix the missing forward declaration for
check_readonly_for_resync() (static function defined after its first
call site).

Co-Authored-By: Asif Rehman <asifr@pgedge.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
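The ordering this commit enforces (check read-only status first, truncate only if the check passes) and the forward-declaration fix can be sketched in plain C. Everything here is a hypothetical model: check_readonly_for_resync() reuses the name from the PR, but the real function takes nspname and relname and, per the review's reference to its "error semantics and message", is expected to raise an error rather than return a status. truncate_table_sketch() only counts calls, and this sketch checks once before the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Forward declaration: the guard is defined below its first call site.
 * Without this prototype the call would have no prior declaration, which
 * modern C compilers reject or warn about.
 */
static bool check_readonly_for_resync(bool subscriber_readonly);

static int truncate_calls = 0;   /* hypothetical: counts truncations */

static void
truncate_table_sketch(const char *relname)
{
    (void) relname;
    truncate_calls++;
}

/*
 * Synchronize path: refuse before any destructive step when the subscriber
 * is read-only, so no table is emptied that COPY FROM could not refill.
 * Returns false if the resync was refused.
 */
static bool
synchronize_tables(const char **relnames, int n, bool truncate,
                   bool subscriber_readonly)
{
    assert(n >= 0 && (n == 0 || relnames != NULL));

    if (!check_readonly_for_resync(subscriber_readonly))
        return false;

    for (int i = 0; i < n; i++)
    {
        if (truncate)
            truncate_table_sketch(relnames[i]);
        /* ... COPY FROM the origin would follow here ... */
    }
    return true;
}

static bool
check_readonly_for_resync(bool subscriber_readonly)
{
    if (subscriber_readonly)
    {
        fprintf(stderr, "resync refused: subscriber is read-only\n");
        return false;
    }
    return true;
}
```

Placing the guard ahead of the loop is what prevents the data-loss case described above: a refused resync leaves every table untouched instead of truncated.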
@rasifr rasifr force-pushed the from-v5_STABLE-part2 branch from 506481f to b220182 March 11, 2026 10:53
@mason-sharp mason-sharp merged commit 33a26be into main Mar 11, 2026
10 checks passed
@mason-sharp mason-sharp deleted the from-v5_STABLE-part2 branch March 11, 2026 13:06