Fix forward origin advance for cascade replication#328
Conversation
rasifr
commented
Feb 3, 2026
ae2acce to
ffb3e75
Compare
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
📝 WalkthroughWalkthroughAdds optional origin names to the replication protocol and output API, exposes the slot-name generator, advances forwarded origins with caching in the apply path, changes get_node to accept a missing_ok flag, and adds a TAP test for forward-origin advancement. Changes
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/spock_output_plugin.c`:
- Around line 619-629: The current code calls get_node(txn->origin_id) and then
checks for NULL, but get_node never returns NULL and will ERROR on a bad id;
change the logic to avoid calling get_node with arbitrary txn->origin_id: first
verify that txn->origin_id maps to a valid Spock node using a safe lookup (or a
new helper like node_exists_for_reporigin/lookup_node_by_reporigin that returns
NULL on miss), and only call get_node (or retrieve SpockNode->name) when that
verification succeeds; if no mapping is found, set origin_name = NULL and call
data->api->write_origin(ctx->out, txn->origin_id, txn->origin_lsn, origin_name)
so forwarded transactions don’t crash the process.
In `@tests/tap/t/015_forward_origin_advance.pl`:
- Line 25: The test file currently declares an explicit plan with "use
Test::More tests => 22;" but also calls done_testing(), causing a plan vs
done_testing conflict; remove the explicit plan and change the import to simply
"use Test::More;" (or otherwise eliminate the "tests => 22" part) so the file
relies on done_testing() at the end (refer to the "use Test::More tests => 22;"
line and the done_testing() call).
|
|
||
| use strict; | ||
| use warnings; | ||
| use Test::More tests => 22; |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's read the actual file to see its current state
head -30 tests/tap/t/015_forward_origin_advance.plRepository: pgEdge/spock
Length of output: 1366
🏁 Script executed:
# Check the file size to understand how much to read
wc -l tests/tap/t/015_forward_origin_advance.plRepository: pgEdge/spock
Length of output: 101
🏁 Script executed:
# Look for all test-related calls
grep -n "ok\|is\|done_testing\|Test::More" tests/tap/t/015_forward_origin_advance.pl | head -50Repository: pgEdge/spock
Length of output: 2101
🌐 Web query:
Test::More explicit plan tests => N with done_testing() conflict
💡 Result:
In Test::More/Test::Builder you may only declare a plan once per test process. If you do:
use Test::More tests => N; # (or: plan tests => N)
...
done_testing(); # <- second plan attempt / possible mismatchyou’ll get a “plan twice”/planning conflict because done_testing() will emit a plan only if one hasn’t already been set, and if you pass a number it will be checked against any existing numbered plan (and can fail if it contradicts). [1][2]
Fix options:
- Use an explicit plan (recommended when you know N): keep
use Test::More tests => N;(orplan tests => N;) and removedone_testing(). [1][2] - Use
done_testingas the plan (when N is unknown or computed late): douse Test::More;(notests => N), run tests, then calldone_testing();(ordone_testing($N)if you want to assert the expected count). [1][2]
Sources: [1] (Test::More docs), [2] (Test::Builder docs).
🏁 Script executed:
# Count all test assertion calls comprehensively
grep -cE "^\s*(ok|is|isnt|like|unlike|cmp_ok|can_ok|isa_ok|new_ok|subtest|pass|fail)" tests/tap/t/015_forward_origin_advance.plRepository: pgEdge/spock
Length of output: 58
🏁 Script executed:
# Get a full list with line numbers of all test assertions
grep -nE "^\s*(ok|is|isnt|like|unlike|cmp_ok|can_ok|isa_ok|new_ok|subtest|pass|fail)" tests/tap/t/015_forward_origin_advance.plRepository: pgEdge/spock
Length of output: 1003
Fix the Test::More plan vs done_testing conflict.
Line 25 sets an explicit plan (22 tests) while Line 184 calls done_testing(). Test::More cannot have both an explicit plan and done_testing() in the same test file—this creates a TAP validation failure. The file contains only 14 actual test assertions, so the plan count is incorrect. Remove the explicit plan to use done_testing() instead.
🛠️ Suggested fix
-use Test::More tests => 22;
+use Test::More;Also applies to: 184-184
🤖 Prompt for AI Agents
In `@tests/tap/t/015_forward_origin_advance.pl` at line 25, The test file
currently declares an explicit plan with "use Test::More tests => 22;" but also
calls done_testing(), causing a plan vs done_testing conflict; remove the
explicit plan and change the import to simply "use Test::More;" (or otherwise
eliminate the "tests => 22" part) so the file relies on done_testing() at the
end (refer to the "use Test::More tests => 22;" line and the done_testing()
call).
ffb3e75 to
b0097ae
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/spock_apply.c`:
- Around line 3945-4028: maybe_advance_forwarded_origin can throw after the main
transaction commits which causes apply_work to catch and replay an
already-committed transaction; wrap the body that does
replorigin_by_name/replorigin_create/replorigin_advance (the parts that call
StartTransactionCommand/CommitTransactionCommand and the calls to
replorigin_create and replorigin_advance) in a PG_TRY/PG_CATCH block so any
error is caught locally, log the error with elog (including context like
MySubscription->name and remote_origin_name), ensure any open transaction is
cleaned up in the PG_CATCH (call AbortCurrentTransaction or appropriate abort
helper if a transaction was started) and restore MemoryContext to
MessageContext, and do not rethrow the error so the apply loop is not disrupted.
🧹 Nitpick comments (1)
src/spock_apply.c (1)
3960-3975: Single-entry cache may thrash with multiple forwarded origins.If node C receives forwarded transactions from multiple origin nodes (e.g., A and D both forwarded through B), each origin switch causes a cache miss with a full transaction + origin lookup. Consider using a small hash table if multi-origin cascade topologies are expected.
This is fine for the initial implementation if the typical topology involves only one forwarded origin.
This test validates that when node B forwards transactions from node A to node C (with forward_origins='all'), node C properly creates and advances the replication origin for the original source (node A). The test: - Sets up a 3-node cascade topology: A -> B -> C - Creates data on node A that flows through B to C - Verifies the forwarded origin exists on C using slot name format - Checks that the origin LSN is properly advanced Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
8b9e4f7 to
b10ff0f
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@include/spock_proto_native.h`:
- Around line 41-42: spock_write_origin and spock_read_origin currently
unconditionally write/read the flags byte causing protocol deserialization
errors for protocol 4; update both functions to gate emitting/consuming the
flags byte behind the protocol check by calling spock_get_proto_version() and
only writing/reading the flags when spock_get_proto_version() >= 5 (or
alternatively bump SPOCK_PROTO_MIN_VERSION_NUM if flags are required for all
supported versions), ensuring the flags handling is colocated in the functions
and tested against the protocol version.
🧹 Nitpick comments (3)
include/spock.h (1)
58-60: Parameter names differ between declaration and definition.The declaration uses
provider_name/subscriber_name, but the definition insrc/spock_functions.c(line 2878) usesprovider_node/subscription_name. While this compiles fine in C, it creates unnecessary confusion for anyone reading the header.♻️ Suggested fix: align names with the definition
-extern void gen_slot_name(Name slot_name, char *dbname, - const char *provider_name, - const char *subscriber_name); +extern void gen_slot_name(Name slot_name, char *dbname, + const char *provider_node, + const char *subscription_name);src/spock_functions.c (1)
194-196: Redundantexternforward declaration.This forward declaration at line 194 is redundant since
gen_slot_nameis already declaredexternininclude/spock.h(line 58), which is included at line 101 of this file. Consider removing the local forward declaration to avoid maintaining duplicate signatures.src/spock_apply.c (1)
4001-4013: Consider usingNoLockintable_closeto hold the lock until transaction commit.
table_close(replorigin_rel, RowExclusiveLock)releases the lock immediately. Between the lock release (line 4013) andCommitTransactionCommand()(line 4014), a concurrent session could attempt to create the same origin name. UsingNoLockwould hold the lock until the transaction commits, closing this TOCTOU window.Suggested fix
- table_close(replorigin_rel, RowExclusiveLock); + table_close(replorigin_rel, NoLock);
ibrarahmad
left a comment
There was a problem hiding this comment.
The PR looks ok to me.
1 - I have not done any testing, just reviewed the code.
2 - Please take a look at the comment made by @coderabbitai.
When forwarding changes in cascade replication, include the origin name in ORIGIN messages so downstream subscribers can track the true origin. Changes: - Add origin_name parameter to spock_write_origin() and spock_read_origin() - Use ORIGIN_FLAG_HAS_NAME flag (0x01) to indicate presence of origin_name - Flags byte is always sent/read (backward compatible with protocol v4) - Origin name only included when protocol >= 5 and name is provided - Pass origin_name through handle_origin() to the output plugin Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
b10ff0f to
b6e6ddb
Compare