Add regression tests for read-only mode (spock.readonly GUC)#407
Add regression tests for read-only mode (spock.readonly GUC)#407mason-sharp merged 3 commits intomainfrom
Conversation
📝 WalkthroughWalkthroughAdded Changes
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Up to standards ✅🟢 Issues
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/regress/sql/read_only.sql (1)
115-116: Prefer bounded polling over fixed sleep afterpg_reload_conf().Line 115 and Line 134 can be timing-sensitive in slower environments; a fixed 0.5s sleep may cause flaky regressions.
♻️ Proposed reliability refactor
ALTER SYSTEM SET spock.readonly = 'local'; SELECT pg_reload_conf(); -SELECT pg_sleep(0.5); +DO $$ +DECLARE + i int; +BEGIN + FOR i IN 1..50 LOOP + EXIT WHEN current_setting('spock.readonly') = 'local'; + PERFORM pg_sleep(0.1); + END LOOP; +END $$; @@ ALTER SYSTEM RESET spock.readonly; SELECT pg_reload_conf(); -SELECT pg_sleep(0.5); +DO $$ +DECLARE + i int; +BEGIN + FOR i IN 1..50 LOOP + EXIT WHEN current_setting('spock.readonly') = 'off'; + PERFORM pg_sleep(0.1); + END LOOP; +END $$;Also applies to: 133-134
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/regress/sql/read_only.sql` around lines 115 - 116, The test uses a fixed delay (SELECT pg_sleep(0.5);) after calling pg_reload_conf(), which is timing-sensitive; replace that sleep with bounded polling that repeatedly checks a deterministic indicator that the config change took effect (for example query pg_settings or SHOW the specific parameter changed, or call a small validation query) until it reports the expected value or a timeout elapses. Locate the instances of SELECT pg_sleep(0.5); (appearing after pg_reload_conf()) and replace them with a loop that issues the checking query (e.g. SELECT current_setting('your_parameter') or SELECT count(*) FROM pg_stat_activity WHERE ... as appropriate) with short sleeps and a total timeout to avoid infinite waits. Ensure the polling loop fails the test if the timeout is reached so flakes become deterministic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/regress/sql/read_only.sql`:
- Around line 115-116: The test uses a fixed delay (SELECT pg_sleep(0.5);) after
calling pg_reload_conf(), which is timing-sensitive; replace that sleep with
bounded polling that repeatedly checks a deterministic indicator that the config
change took effect (for example query pg_settings or SHOW the specific parameter
changed, or call a small validation query) until it reports the expected value
or a timeout elapses. Locate the instances of SELECT pg_sleep(0.5); (appearing
after pg_reload_conf()) and replace them with a loop that issues the checking
query (e.g. SELECT current_setting('your_parameter') or SELECT count(*) FROM
pg_stat_activity WHERE ... as appropriate) with short sleeps and a total timeout
to avoid infinite waits. Ensure the polling loop fails the test if the timeout
is reached so flakes become deterministic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 91e7cee8-caa0-44c5-9c03-18668f569cc1
⛔ Files ignored due to path filters (1)
tests/regress/expected/read_only.outis excluded by!**/*.out
📒 Files selected for processing (2)
Makefiletests/regress/sql/read_only.sql
61bcdc5 to
39603fc
Compare
Cover all three modes (off, local, all), superuser exemption, backward-compatible 'user' alias, user-imposed transaction_read_only preservation, and ALTER SYSTEM with separate non-superuser connection. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
39603fc to
0444961
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/regress/sql/read_only.sql (1)
114-115: Consider potential timing sensitivity.The
pg_sleep(0.5)afterpg_reload_conf()is a reasonable heuristic for waiting for configuration reload, but could be fragile in slow CI environments or under heavy load. This pattern is common in PostgreSQL regression tests, so it's acceptable. If flaky test failures occur, increasing the sleep duration may help.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/regress/sql/read_only.sql` around lines 114 - 115, The test uses a fixed short wait after reloading Postgres config which can be timing-sensitive; update the statements that call pg_reload_conf() and pg_sleep(0.5) to be more robust—either increase the sleep duration (e.g., to 1s or 2s) or replace the single pg_sleep(0.5) with a small retry/wait loop that checks a known config-dependent side effect until it appears, ensuring the test waits deterministically for pg_reload_conf() to take effect.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/regress/sql/read_only.sql`:
- Around line 114-115: The test uses a fixed short wait after reloading Postgres
config which can be timing-sensitive; update the statements that call
pg_reload_conf() and pg_sleep(0.5) to be more robust—either increase the sleep
duration (e.g., to 1s or 2s) or replace the single pg_sleep(0.5) with a small
retry/wait loop that checks a known config-dependent side effect until it
appears, ensuring the test waits deterministically for pg_reload_conf() to take
effect.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f6e52135-d676-43ac-b9d0-1c685ccb4994
⛔ Files ignored due to path filters (1)
tests/regress/expected/read_only.outis excluded by!**/*.out
📒 Files selected for processing (2)
Makefiletests/regress/sql/read_only.sql
🚧 Files skipped from review as they are similar to previous changes (1)
- Makefile
tests/regress/expected/read_only.out
Outdated
| t | ||
| (1 row) | ||
|
|
||
| SELECT pg_sleep(0.5); |
There was a problem hiding this comment.
Why do we need this delay?
There was a problem hiding this comment.
You're right, I removed it.
| @@ -0,0 +1,223 @@ | |||
| -- Tests for read-only mode (spock.readonly GUC) | |||
There was a problem hiding this comment.
It seems basic coverage looks ok. What I missed here:
- DDL versus general UTILITY commands - show what spock limits and what allows execution (vacuum, SET, etc).
- SECURITY DEFINER function - something like pgaudit still writes to the database. We should pin the current behaviour in a test.
- The most important one: what about replication? DML, SQL, internal Spock commands? How does a read-only system react to the attempt to apply - does it crash, flood logs or gently wait?
- Utility commands: verify VACUUM and ANALYZE pass through (native Postgres behavior, no PreventCommandIfReadOnly for maintenance ops) - SECURITY DEFINER/INVOKER: both blocked — XactReadOnly is a transaction-level flag, SECURITY DEFINER only switches effective user for permission checks (also native Postgres behavior) - Replication: apply worker continues in 'local' mode, pauses gracefully in 'all' mode, resumes after reset with no data loss - Auto DDL: blocked DDL not replicated when read-only is active - repair_mode(): superuser can make local-only writes that are not replicated to subscribers - Remove pg_sleep after pg_reload_conf in ALTER SYSTEM tests - Change test table ownership to nonsuper for VACUUM/ANALYZE tests Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
tests/regress/expected/read_only.out
Outdated
| --- | ||
| --- Test: utility commands allowed in 'local' mode (still as nonsuper) | ||
| --- Spock sets XactReadOnly = true and delegates enforcement entirely to | ||
| --- Postgres. These commands are allowed because Postgres does not call |
There was a problem hiding this comment.
It would be better to show off the reasoning why such commands are allowed (doesn't impact the schema and doesn't interfere with logical replication either). There are no problems guarding the instance against other commands, too, if necessary.
tests/regress/expected/read_only.out
Outdated
| --- PreventCommandIfReadOnly() for maintenance operations. | ||
| --- | ||
| VACUUM ro_test; | ||
| ANALYZE ro_test; |
There was a problem hiding this comment.
The VACUUM ANALYZE command looks more compact than two separate calls
| ------- | ||
| 0 | ||
| (1 row) | ||
|
|
There was a problem hiding this comment.
It would be nice to reproduce the case where read-only blocks replication, if such a mode exists, just to see how it actually works and which ERROR messages bubble up.
- ALTER SYSTEM for 'all' mode now runs on subscriber
- Add walsender PID check to prove apply worker survives readonly
(wait, not crash-and-restart)
- Combine VACUUM and ANALYZE into single VACUUM ANALYZE command
- Reword utility command comment to explain reasoning (no schema change,
no WAL that affects replication) rather than implementation details
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Thanks for approving @danolivo . I decided to do one (hopefully) last commit to address recent feedback. |
There was a problem hiding this comment.
🧹 Nitpick comments (4)
tests/regress/sql/read_only.sql (4)
319-321: Redundant ALTER SYSTEM RESET.Line 307 already reset
spock.readonlyon the subscriber. This second reset is harmless but unnecessary.♻️ Remove redundant reset
-- Cleanup replicated table and subscriber readonly state -ALTER SYSTEM RESET spock.readonly; -SELECT pg_reload_conf(); \c :provider_dsn🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/regress/sql/read_only.sql` around lines 319 - 321, The ALTER SYSTEM RESET spock.readonly; followed by SELECT pg_reload_conf(); at the end of the script is redundant because spock.readonly was already reset earlier (line ~307); remove this duplicate ALTER SYSTEM RESET spock.readonly; (and the immediate pg_reload_conf() if it's only for that reset) so the script only performs a single reset of spock.readonly and one reload, leaving any remaining cleanup statements intact.
283-284: Hardcoded sleep may be flaky under load.The 2-second delay assumes the apply worker will process
ConfigReloadPendingwithin that window. While reasonable for typical CI, heavily loaded environments may require longer. Consider documenting this assumption or increasing the margin if flakiness is observed.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/regress/sql/read_only.sql` around lines 283 - 284, Replace the hardcoded short wait (SELECT pg_sleep(2);) that assumes the apply worker will flip ConfigReloadPending within 2s: either increase the margin (e.g., to 5–10s) or, better, replace the fixed sleep with a polling loop that queries the apply worker/config state until ConfigReloadPending clears (or a test-time configurable timeout is reached); reference pg_sleep and the ConfigReloadPending/apply worker state in your change and make the timeout configurable for CI vs. local runs.
277-278: Non-deterministic row selection without ORDER BY.
LIMIT 1withoutORDER BYmay return different rows across executions if multiple replication connections exist. In a controlled single-subscription test this works, but addingORDER BY pidwould be more robust.♻️ Add ORDER BY for determinism
-SELECT pid AS repl_pid FROM pg_stat_replication LIMIT 1 +SELECT pid AS repl_pid FROM pg_stat_replication ORDER BY pid LIMIT 1 \gset-SELECT :repl_pid = pid AS worker_survived FROM pg_stat_replication LIMIT 1; +SELECT :repl_pid = pid AS worker_survived FROM pg_stat_replication ORDER BY pid LIMIT 1;Also applies to: 298-298
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/regress/sql/read_only.sql` around lines 277 - 278, Replace the non-deterministic SELECT that uses LIMIT 1 without ordering by adding an explicit ORDER BY pid to the query (the statement that sets repl_pid: "SELECT pid AS repl_pid FROM pg_stat_replication LIMIT 1 \gset") so the chosen replication PID is deterministic; make the same change to the other occurrence referenced (the similar SELECT around the other instance).
251-255: Consider using\o /dev/nullor semicolon for clearer intent.The
\gsetmeta-command stores query results into psql variables, but here you're discarding the return value. While this works, it may confuse readers expecting variable usage.♻️ Alternative: explicit output suppression
BEGIN; -SELECT spock.repair_mode(true) \gset +SELECT spock.repair_mode(true); INSERT INTO ro_repl_test VALUES (99, 'repair data'); -SELECT spock.repair_mode(false) \gset +SELECT spock.repair_mode(false); COMMIT;Or if you want to suppress output completely:
\o /dev/null SELECT spock.repair_mode(true); \o🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/regress/sql/read_only.sql` around lines 251 - 255, Replace the use of psql's `\gset` when calling `SELECT spock.repair_mode(true) \gset` and `SELECT spock.repair_mode(false) \gset` with an explicit no-output approach: either terminate the statements with a normal semicolon (`SELECT spock.repair_mode(true);`) if you want to ignore the result, or wrap the call with psql output suppression (`\o /dev/null` before and `\o` after) to make the intent clear and avoid creating unused psql variables; update the two `spock.repair_mode` calls accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/regress/sql/read_only.sql`:
- Around line 319-321: The ALTER SYSTEM RESET spock.readonly; followed by SELECT
pg_reload_conf(); at the end of the script is redundant because spock.readonly
was already reset earlier (line ~307); remove this duplicate ALTER SYSTEM RESET
spock.readonly; (and the immediate pg_reload_conf() if it's only for that reset)
so the script only performs a single reset of spock.readonly and one reload,
leaving any remaining cleanup statements intact.
- Around line 283-284: Replace the hardcoded short wait (SELECT pg_sleep(2);)
that assumes the apply worker will flip ConfigReloadPending within 2s: either
increase the margin (e.g., to 5–10s) or, better, replace the fixed sleep with a
polling loop that queries the apply worker/config state until
ConfigReloadPending clears (or a test-time configurable timeout is reached);
reference pg_sleep and the ConfigReloadPending/apply worker state in your change
and make the timeout configurable for CI vs. local runs.
- Around line 277-278: Replace the non-deterministic SELECT that uses LIMIT 1
without ordering by adding an explicit ORDER BY pid to the query (the statement
that sets repl_pid: "SELECT pid AS repl_pid FROM pg_stat_replication LIMIT 1
\gset") so the chosen replication PID is deterministic; make the same change to
the other occurrence referenced (the similar SELECT around the other instance).
- Around line 251-255: Replace the use of psql's `\gset` when calling `SELECT
spock.repair_mode(true) \gset` and `SELECT spock.repair_mode(false) \gset` with
an explicit no-output approach: either terminate the statements with a normal
semicolon (`SELECT spock.repair_mode(true);`) if you want to ignore the result,
or wrap the call with psql output suppression (`\o /dev/null` before and `\o`
after) to make the intent clear and avoid creating unused psql variables; update
the two `spock.repair_mode` calls accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e6774e0d-cd14-4a48-8a5d-38c40c6ac72c
⛔ Files ignored due to path filters (1)
tests/regress/expected/read_only.outis excluded by!**/*.out
📒 Files selected for processing (1)
tests/regress/sql/read_only.sql
Cover all three modes (off, local, all), superuser exemption, backward-compatible 'user' alias, user-imposed transaction_read_only preservation, and ALTER SYSTEM with separate non-superuser connection.