
feat: scheduled database snapshots #37

Open
pthmas wants to merge 8 commits into main from issue-23

Conversation


@pthmas pthmas commented Mar 19, 2026

Summary

  • Adds opt-in daily pg_dump snapshots as a background tokio task, following the same spawn pattern as the indexer and metadata fetcher
  • Configurable via SNAPSHOT_ENABLED, SNAPSHOT_TIME (HH:MM UTC), SNAPSHOT_RETENTION (max files kept), and SNAPSHOT_DIR
  • Automatic cleanup of old snapshots beyond the retention count
  • Docker image updated with postgresql16-client for pg_dump; compose file wired with env vars and volume mount

Closes #23
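The environment-driven configuration described above could be sketched as follows. This is a hypothetical mirror of the PR's `SnapshotConfig`, not its actual code: the field names, the `"03:00"`/retention-7 defaults, and the `String` error type are all assumptions; only the env var names (`SNAPSHOT_ENABLED`, `SNAPSHOT_TIME`, `SNAPSHOT_RETENTION`, `SNAPSHOT_DIR`) come from the PR description.

```rust
use std::env;

// Hypothetical stand-in for the PR's SnapshotConfig; fields and defaults are assumed.
#[derive(Clone)]
pub struct SnapshotConfig {
    pub enabled: bool,
    pub time: (u32, u32), // (hour, minute), interpreted as UTC
    pub retention: usize,
    pub dir: String,
    pub database_url: String,
}

// Validate an HH:MM string without pulling in chrono.
fn parse_hhmm(s: &str) -> Result<(u32, u32), String> {
    let (h, m) = s
        .split_once(':')
        .ok_or_else(|| format!("invalid SNAPSHOT_TIME: {s}"))?;
    let h: u32 = h.parse().map_err(|_| format!("invalid hour in {s}"))?;
    let m: u32 = m.parse().map_err(|_| format!("invalid minute in {s}"))?;
    if h > 23 || m > 59 {
        return Err(format!("SNAPSHOT_TIME out of range: {s}"));
    }
    Ok((h, m))
}

impl SnapshotConfig {
    pub fn from_env(database_url: &str) -> Result<Self, String> {
        let enabled = env::var("SNAPSHOT_ENABLED")
            .map(|v| v == "true")
            .unwrap_or(false);
        // Defaults below are illustrative, not taken from the PR.
        let time = parse_hhmm(&env::var("SNAPSHOT_TIME").unwrap_or_else(|_| "03:00".into()))?;
        let retention: usize = env::var("SNAPSHOT_RETENTION")
            .unwrap_or_else(|_| "7".into())
            .parse()
            .map_err(|_| "invalid SNAPSHOT_RETENTION".to_string())?;
        let dir = env::var("SNAPSHOT_DIR").unwrap_or_else(|_| "/snapshots".into());
        if enabled && dir.trim().is_empty() {
            return Err("SNAPSHOT_DIR must not be empty when snapshots are enabled".into());
        }
        Ok(Self { enabled, time, retention, dir, database_url: database_url.to_string() })
    }
}
```

Failing fast in `from_env` keeps bad values (an empty `SNAPSHOT_DIR`, a `25:00` time) from reaching the scheduler loop at 3 a.m.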

Summary by CodeRabbit

  • New Features

    • Optional daily PostgreSQL snapshot scheduler with UTC scheduling, retention, and configurable storage when enabled.
  • Documentation

    • New environment variables to configure snapshots (enable, time, retention, container/host paths, UID/GID).
  • Chores

    • Container image and compose defaults updated for snapshot storage and non-root mount permissions; local snapshot dir added to .gitignore.
  • Tests

    • Unit and integration tests for config, scheduler, retention, and a dump/restore round-trip (ignored by default).


coderabbitai bot commented Mar 19, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a configurable, time-based PostgreSQL snapshot scheduler and retention cleanup to the Atlas server, with environment and Docker support, unit/integration tests (including an ignored dump/restore integration), and small build/test tooling updates.

Changes

  • Environment & Compose (.env.example, docker-compose.yml): Added snapshot env vars (SNAPSHOT_ENABLED, SNAPSHOT_TIME, SNAPSHOT_RETENTION, SNAPSHOT_DIR, SNAPSHOT_HOST_DIR), UID:GID user config, and a host mount for snapshot storage.
  • Docker image (backend/Dockerfile): Installs postgresql16-client; creates /snapshots and sets atlas:atlas ownership (UID/GID 1000) before switching to USER atlas.
  • Server config (backend/crates/atlas-server/src/config.rs): New pub struct SnapshotConfig with from_env(database_url) parsing/validation and a redacted Debug impl; adds unit tests and env cleanup in tests.
  • Snapshot module (backend/crates/atlas-server/src/snapshot.rs): New pub async fn run_snapshot_loop implementing UTC scheduling, backoff retries, pg_dump to a temp file with atomic rename, and retention cleanup; includes unit tests for scheduling, backoff, and cleanup.
  • Server integration & PG helpers (backend/crates/atlas-server/src/main.rs): Loads SnapshotConfig and conditionally spawns the background snapshot task; promotes PostgresConnectionConfig and helpers to pub(crate) and adds postgres_command_async with consistent env handling.
  • Tests & tooling (backend/crates/atlas-server/Cargo.toml, backend/crates/atlas-server/tests/integration/common.rs, .../tests/integration/main.rs, .../tests/integration/snapshots.rs): Adds tempfile dev-dependency; TestEnv stores/exposes database_url(); adds the ignored integration test snapshot_dump_and_restore_round_trip, which gates on pg_dump/pg_restore being available.
  • Misc (.gitignore): Adds snapshots/ to ignore local snapshot artifacts.

Sequence Diagram

sequenceDiagram
    participant Scheduler as Scheduler (tokio sleep)
    participant SnapLoop as run_snapshot_loop
    participant PGDump as pg_dump process
    participant FS as File System
    participant Cleanup as cleanup_old_snapshots
    participant Backoff as Retry Backoff

    loop daily schedule
        Scheduler->>SnapLoop: Wake at scheduled UTC time
        SnapLoop->>FS: Ensure snapshot dir exists
        SnapLoop->>PGDump: Spawn pg_dump -> write to `.tmp`
        PGDump->>FS: Write dump data to temp file
        alt pg_dump succeeds
            FS->>FS: Rename `.tmp` -> `.dump` (atomic)
            SnapLoop->>Cleanup: Prune old dumps (keep retention, ignore `.tmp`)
            Cleanup->>FS: List/sort/remove old files
            SnapLoop->>Backoff: Reset retry state
        else pg_dump fails
            SnapLoop->>FS: Remove `.tmp` if present
            SnapLoop->>Backoff: Wait backoff interval (5s,10s,20s,30s,60s)
            Backoff->>SnapLoop: Retry
        end
    end
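The "wake at scheduled UTC time" step in the diagram boils down to computing how long to sleep until the next occurrence of a fixed time of day. A minimal sketch of that arithmetic, working in seconds past UTC midnight (the PR's `run_snapshot_loop` presumably uses chrono rather than raw seconds; this helper is illustrative):

```rust
const DAY: u64 = 24 * 60 * 60;

// Seconds to sleep until the next occurrence of `target` (seconds past UTC
// midnight), given the current time `now` in the same units. When now ==
// target, the next run is a full day away, which avoids double-firing.
fn secs_until_next(now: u64, target: u64) -> u64 {
    if target > now {
        target - now
    } else {
        DAY - (now - target)
    }
}
```

The loop would then `tokio::time::sleep(Duration::from_secs(secs_until_next(now, target)))` before each snapshot attempt.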

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • tac0turtle

Poem

🐇 I hop at dawn with quiet cheer,
I call pg_dump and hold it near.
Timed little files in tidy rows,
I trim old crumbs where moonlight glows.
Hooray for backups—soft and clear!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 45.45%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Description Check ✅: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅: The title 'feat: scheduled database snapshots' accurately describes the main change: implementing opt-in automated daily database snapshots as a new feature.
  • Linked Issues Check ✅: The pull request implements all core objectives from issue #23: creating snapshots via automated daily pg_dump, storing them with retention management, and loading them back to restore database state.
  • Out of Scope Changes Check ✅: All changes are directly aligned with the snapshot feature requirements; the refactoring of postgres_connection_config and the Docker configuration are necessary supporting changes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@pthmas pthmas marked this pull request as ready for review March 19, 2026 16:35

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (5)
backend/crates/atlas-server/Cargo.toml (1)

48-48: Use workspace-managed versioning for tempfile.

Please avoid pinning tempfile in the crate manifest and reference it via workspace instead, so dependency versions stay centralized.

Suggested change
 [dev-dependencies]
 testcontainers = { workspace = true }
 testcontainers-modules = { workspace = true }
 tokio = { workspace = true }
 tower = { workspace = true, features = ["util"] }
 serde_json = { workspace = true }
 sqlx = { workspace = true }
-tempfile = "3"
+tempfile = { workspace = true }

Based on learnings: Keep all Rust workspace dependency versions in the root Cargo.toml.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/crates/atlas-server/Cargo.toml` at line 48, Replace the hard-pinned
tempfile = "3" entry in this crate's Cargo.toml with a workspace reference by
removing the version and using the workspace-managed dependency (i.e., change
the tempfile entry to reference the workspace dependency without a version
string), so the crate relies on the version declared in the workspace root; look
for the tempfile entry in backend/crates/atlas-server/Cargo.toml and remove the
pinned version to defer to workspace-managed versions.
backend/crates/atlas-server/tests/integration/snapshots.rs (2)

57-57: Fragile database URL substitution.

The replace("/postgres", "/test_restore") pattern assumes the source database is always named postgres and that /postgres doesn't appear elsewhere in the URL (e.g., in a password or hostname). Consider parsing the URL properly or using a more targeted replacement at the path end.

♻️ Suggested fix using URL parsing
-        let restore_url = db_url.replace("/postgres", "/test_restore");
+        let mut parsed = url::Url::parse(db_url).expect("parse DATABASE_URL");
+        parsed.set_path("/test_restore");
+        let restore_url = parsed.to_string();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/crates/atlas-server/tests/integration/snapshots.rs` at line 57, The
db URL substitution using db_url.replace("/postgres", "/test_restore") is
fragile; instead parse db_url (e.g., with the Url type from the url crate or a
PostgreSQL connection string parser) and replace only the path component to
"test_restore" before rebuilding the string so you don't accidentally modify
passwords/hosts; update the code that constructs restore_url (reference variable
restore_url and source db_url) to parse, set url.path_segments_mut() or
equivalent to the single segment "test_restore", and then serialize the URL back
to a string for use in the test.
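As a std-only alternative to the `url` crate approach the review suggests, swapping only the last path segment (the database name) can be done by splitting off any query/fragment and then replacing everything after the final `/`. This is a sketch under the usual assumption that user info and host contain no unencoded `/`; `with_db_name` is a hypothetical helper name:

```rust
// Replace the database name (last path segment) of a Postgres URL without
// touching user info or host. Assumes any '/' in credentials is percent-encoded.
fn with_db_name(db_url: &str, db: &str) -> String {
    // Split off query/fragment so the last '/' we find is in the path.
    let (base, rest) = match db_url.find(|c| c == '?' || c == '#') {
        Some(i) => (&db_url[..i], &db_url[i..]),
        None => (db_url, ""),
    };
    // Index just past the "//" of the scheme, so we never split inside it.
    let authority_start = base.find("://").map_or(0, |j| j + 2);
    match base.rfind('/') {
        Some(i) if i > authority_start => format!("{}/{}{}", &base[..i], db, rest),
        _ => format!("{}/{}{}", base, db, rest), // no path yet: append one
    }
}
```

Unlike `replace("/postgres", ...)`, this cannot corrupt a password or hostname that happens to contain the substring `/postgres`.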

88-93: Cleanup may be skipped if assertions fail.

If any assertion fails before line 90, the test_restore database won't be dropped, potentially leaving artifacts that could cause subsequent test runs to fail (e.g., CREATE DATABASE test_restore would error). Consider wrapping cleanup in a guard or using a drop-based cleanup pattern.

♻️ Consider a cleanup guard or defer-style pattern

One approach is to use defer semantics via scopeguard or a custom struct that drops the database in Drop. Alternatively, add a DROP DATABASE IF EXISTS test_restore at the start of the test to handle leftover state from previous failed runs:

+        // Clean up any leftover database from previous failed runs
+        let _ = sqlx::query("DROP DATABASE IF EXISTS test_restore")
+            .execute(pool)
+            .await;
+
         // Create a separate database for restore
         sqlx::query("CREATE DATABASE test_restore")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/crates/atlas-server/tests/integration/snapshots.rs` around lines 88 -
93, The test currently drops the test database with sqlx::query("DROP DATABASE
test_restore").execute(pool).await after calling restore_pool.close().await, but
this cleanup is skipped if earlier assertions panic; refactor to guarantee
cleanup by introducing a guard (e.g., a small struct with Drop impl or using
scopeguard) that holds the pool/connection and runs sqlx::query("DROP DATABASE
IF EXISTS test_restore").execute(...) in Drop, or alternatively run
sqlx::query("DROP DATABASE IF EXISTS test_restore").execute(pool).await at the
start of the test to ensure idempotent cleanup; update references to
restore_pool.close().await and the existing sqlx::query(...) invocation to be
performed via that guard so the database is removed even on test failure.
backend/crates/atlas-server/src/snapshot.rs (2)

127-129: Use idiomatic descending sort.

Per coding guidelines, prefer idiomatic Rust patterns. The two-step sort() then reverse() can be combined into a single descending sort.

♻️ Suggested fix
-    files.sort();
-    files.reverse();
+    files.sort_by(|a, b| b.cmp(a));

As per coding guidelines: "Use idiomatic Rust patterns: prefer .min(), .max(), |=, += over manual if/assign statements"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/crates/atlas-server/src/snapshot.rs` around lines 127 - 129, Replace
the two-step ascending-then-reverse sequence operating on the `files` collection
(calls to `files.sort()` followed by `files.reverse()`) with a single idiomatic
descending sort; update the code to call a single sort variant that orders
descending (e.g., use `sort_by`/`sort_unstable_by` with `b.cmp(a)` or an
appropriate key-based descending comparator) so the collection is sorted
newest-first in one pass.
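Putting the single descending sort together with the retention cut, the whole pruning decision fits in a few lines. A sketch (the `prune` helper is hypothetical; it assumes, as the PR's comment notes, that the timestamp in the filename makes lexicographic order match chronological order):

```rust
// Decide which snapshot files to keep and which to delete. Timestamped
// names sort lexicographically in chronological order, so one descending
// sort puts the newest first; everything past `retention` is pruned.
fn prune(mut files: Vec<String>, retention: usize) -> (Vec<String>, Vec<String>) {
    files.sort_by(|a, b| b.cmp(a)); // descending: newest first, in one pass
    let removed = files.split_off(retention.min(files.len()));
    (files, removed)
}
```

The real `cleanup_old_snapshots` would then call `tokio::fs::remove_file` on the removed list.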

47-54: Consider adding a timeout to pg_dump execution.

If pg_dump hangs (e.g., database unresponsive, network issues), the snapshot loop will block indefinitely, preventing future snapshots. Consider wrapping the command with tokio::time::timeout.

♻️ Suggested fix
+    let pg_dump_timeout = Duration::from_secs(3600); // 1 hour max
+    let status = tokio::time::timeout(
+        pg_dump_timeout,
+        tokio::process::Command::new("pg_dump")
             .arg("--dbname")
             .arg(&config.database_url)
             .arg("-Fc")
             .arg("-f")
             .arg(&tmp_path)
             .status()
-            .await?;
+    )
+    .await
+    .map_err(|_| anyhow::anyhow!("pg_dump timed out after {} seconds", pg_dump_timeout.as_secs()))??;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/crates/atlas-server/src/snapshot.rs` around lines 47 - 54, Wrap the
blocking pg_dump invocation with tokio::time::timeout to avoid hanging the
snapshot loop: instead of awaiting .status() directly on
tokio::process::Command, spawn the child with Command::spawn() (reference
Command::spawn and the tmp_path/status variables), then await child.wait()
inside tokio::time::timeout(Duration::from_secs(...)). If the timeout returns
Err(elapsed) call child.kill().await (or child.kill() and then
child.wait().await) and return or log a timeout error; on Ok, inspect the
ExitStatus as before. Make the timeout duration configurable or a constant and
handle both timeout and command errors consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/crates/atlas-server/src/config.rs`:
- Around line 255-262: SnapshotConfig currently derives Debug and may log
sensitive DB credentials via the database_url field; remove #[derive(Debug)] (or
stop deriving Debug) for SnapshotConfig and implement a manual Debug impl for
the SnapshotConfig struct that formats all fields except database_url (or prints
database_url as a redacted placeholder like "<redacted>") so logs cannot leak
secrets; update any uses that relied on the derived Debug (e.g., fmt::Debug
formatting or debug! calls referencing SnapshotConfig) to continue working with
the new manual impl.
- Around line 293-294: The current env::var call that sets let dir =
env::var("SNAPSHOT_DIR").unwrap_or_else(|_| "/snapshots".to_string()) can
produce an empty string; update the code that reads SNAPSHOT_DIR (the let dir
variable in config.rs) to validate that the resulting string is non-empty when
snapshots are enabled: after obtaining the value (either from env::var or the
default), check if dir.trim().is_empty() and return/configuration-error (or fall
back to "/snapshots") so the configuration fails fast instead of allowing an
empty SNAPSHOT_DIR to propagate into the scheduler loop; ensure the check is
applied in the same function that constructs the snapshot configuration so
callers that enable snapshots receive an explicit error or valid path.

In `@docker-compose.yml`:
- Around line 53-54: The snapshot bind mount using the volume entry "-
${SNAPSHOT_HOST_DIR:-./snapshots}:${SNAPSHOT_DIR:-/snapshots}" can make
/snapshots unwritable for the non-root atlas user because host directory
permissions override container ownership; to fix, either switch to a named
Docker volume (remove SNAPSHOT_HOST_DIR bind and use a named volume) or update
deployment docs and startup to ensure SNAPSHOT_HOST_DIR is pre-created with
correct owner/permissions (matching the atlas UID/GID), or add an init step that
chowns the mounted path to the atlas user before snapshot jobs run; reference
the volume line using SNAPSHOT_HOST_DIR and SNAPSHOT_DIR and update
docker-compose and docs accordingly.
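The redacted `Debug` impl requested in the first inline comment above could look like the following sketch. The field set here is assumed, not the PR's actual struct; the point is the hand-written `fmt` that never prints the connection string:

```rust
use std::fmt;

// Illustrative subset of SnapshotConfig's fields; names are assumptions.
struct SnapshotConfig {
    dir: String,
    retention: usize,
    database_url: String,
}

// Manual Debug instead of #[derive(Debug)]: every field prints normally
// except database_url, which is replaced with a fixed placeholder so
// credentials embedded in the URL can never reach logs.
impl fmt::Debug for SnapshotConfig {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("SnapshotConfig")
            .field("dir", &self.dir)
            .field("retention", &self.retention)
            .field("database_url", &"<redacted>")
            .finish()
    }
}
```

Existing `{:?}` and `debug!` call sites keep working unchanged, since the manual impl satisfies the same `fmt::Debug` trait the derive did.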

---

Nitpick comments:
In `@backend/crates/atlas-server/Cargo.toml`:
- Line 48: Replace the hard-pinned tempfile = "3" entry in this crate's
Cargo.toml with a workspace reference by removing the version and using the
workspace-managed dependency (i.e., change the tempfile entry to reference the
workspace dependency without a version string), so the crate relies on the
version declared in the workspace root; look for the tempfile entry in
backend/crates/atlas-server/Cargo.toml and remove the pinned version to defer to
workspace-managed versions.

In `@backend/crates/atlas-server/src/snapshot.rs`:
- Around line 127-129: Replace the two-step ascending-then-reverse sequence
operating on the `files` collection (calls to `files.sort()` followed by
`files.reverse()`) with a single idiomatic descending sort; update the code to
call a single sort variant that orders descending (e.g., use
`sort_by`/`sort_unstable_by` with `b.cmp(a)` or an appropriate key-based
descending comparator) so the collection is sorted newest-first in one pass.
- Around line 47-54: Wrap the blocking pg_dump invocation with
tokio::time::timeout to avoid hanging the snapshot loop: instead of awaiting
.status() directly on tokio::process::Command, spawn the child with
Command::spawn() (reference Command::spawn and the tmp_path/status variables),
then await child.wait() inside tokio::time::timeout(Duration::from_secs(...)).
If the timeout returns Err(elapsed) call child.kill().await (or child.kill() and
then child.wait().await) and return or log a timeout error; on Ok, inspect the
ExitStatus as before. Make the timeout duration configurable or a constant and
handle both timeout and command errors consistently.

In `@backend/crates/atlas-server/tests/integration/snapshots.rs`:
- Line 57: The db URL substitution using db_url.replace("/postgres",
"/test_restore") is fragile; instead parse db_url (e.g., with the Url type from
the url crate or a PostgreSQL connection string parser) and replace only the
path component to "test_restore" before rebuilding the string so you don't
accidentally modify passwords/hosts; update the code that constructs restore_url
(reference variable restore_url and source db_url) to parse, set
url.path_segments_mut() or equivalent to the single segment "test_restore", and
then serialize the URL back to a string for use in the test.
- Around line 88-93: The test currently drops the test database with
sqlx::query("DROP DATABASE test_restore").execute(pool).await after calling
restore_pool.close().await, but this cleanup is skipped if earlier assertions
panic; refactor to guarantee cleanup by introducing a guard (e.g., a small
struct with Drop impl or using scopeguard) that holds the pool/connection and
runs sqlx::query("DROP DATABASE IF EXISTS test_restore").execute(...) in Drop,
or alternatively run sqlx::query("DROP DATABASE IF EXISTS
test_restore").execute(pool).await at the start of the test to ensure idempotent
cleanup; update references to restore_pool.close().await and the existing
sqlx::query(...) invocation to be performed via that guard so the database is
removed even on test failure.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f400324c-4e91-4be0-b747-26e65caf32eb

📥 Commits

Reviewing files that changed from the base of the PR and between 2fb2426 and 5255005.

📒 Files selected for processing (10)
  • .env.example
  • backend/Dockerfile
  • backend/crates/atlas-server/Cargo.toml
  • backend/crates/atlas-server/src/config.rs
  • backend/crates/atlas-server/src/main.rs
  • backend/crates/atlas-server/src/snapshot.rs
  • backend/crates/atlas-server/tests/integration/common.rs
  • backend/crates/atlas-server/tests/integration/main.rs
  • backend/crates/atlas-server/tests/integration/snapshots.rs
  • docker-compose.yml

@pthmas pthmas self-assigned this Mar 20, 2026

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
backend/crates/atlas-server/src/config.rs (1)

293-295: Consider documenting that SNAPSHOT_TIME is interpreted as UTC.

Based on context from snapshot.rs, the NaiveTime value is interpreted as UTC (via and_utc()). Operators in non-UTC timezones may expect local time. A brief comment here or in .env.example would clarify.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/crates/atlas-server/src/config.rs` around lines 293 - 295, Document
that SNAPSHOT_TIME is interpreted as UTC: update the comment near the NaiveTime
parsing in config.rs (around the time_str/NaiveTime::parse_from_str code) to
state explicitly that the parsed NaiveTime will be treated as UTC (snapshot.rs
uses .and_utc()), and add a matching note in .env.example or README to warn
operators in non-UTC timezones that times are UTC. Ensure the comment references
SNAPSHOT_TIME, NaiveTime::parse_from_str, and snapshot.rs's .and_utc() behavior
so readers can find the related code paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@backend/crates/atlas-server/src/config.rs`:
- Around line 293-295: Document that SNAPSHOT_TIME is interpreted as UTC: update
the comment near the NaiveTime parsing in config.rs (around the
time_str/NaiveTime::parse_from_str code) to state explicitly that the parsed
NaiveTime will be treated as UTC (snapshot.rs uses .and_utc()), and add a
matching note in .env.example or README to warn operators in non-UTC timezones
that times are UTC. Ensure the comment references SNAPSHOT_TIME,
NaiveTime::parse_from_str, and snapshot.rs's .and_utc() behavior so readers can
find the related code paths.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ae8d11ee-2bc2-47c8-9c61-44b90aed9b3f

📥 Commits

Reviewing files that changed from the base of the PR and between 5255005 and 79529b4.

📒 Files selected for processing (3)
  • .env.example
  • backend/crates/atlas-server/src/config.rs
  • docker-compose.yml
✅ Files skipped from review due to trivial changes (1)
  • .env.example
🚧 Files skipped from review as they are similar to previous changes (1)
  • docker-compose.yml

@pthmas pthmas requested a review from tac0turtle March 23, 2026 15:34
pthmas added 5 commits March 31, 2026 11:45
Add opt-in daily pg_dump snapshots to protect against data loss.
Runs as a background tokio task with configurable time, retention,
and output directory. Includes integration test for dump/restore
round-trip.

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
backend/crates/atlas-server/src/snapshot.rs (2)

127-129: Consider using sort_by for descending sort.

Per coding guidelines for idiomatic Rust, a single sort_by call with reverse comparison is slightly more efficient and clearer than sort() followed by reverse().

♻️ Proposed change
     // Sort descending (newest first) — timestamp in filename gives lexicographic order
-    files.sort();
-    files.reverse();
+    files.sort_by(|a, b| b.cmp(a));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/crates/atlas-server/src/snapshot.rs` around lines 127 - 129, The code
currently sorts the vector of filenames with files.sort() and then
files.reverse(), which is less idiomatic and slightly less efficient; replace
that pair with a single descending sort call (e.g., use files.sort_by or
files.sort_unstable_by with a reversed comparison like comparing b to a) so the
filenames are sorted newest-first in one step, updating the sorting of the
variable files where files.sort() and files.reverse() are used.

47-64: Consider capturing pg_dump stderr for better error diagnostics.

When pg_dump fails, only the exit status is logged. Capturing stderr would provide more actionable error messages (e.g., connection refused, permission denied, disk full).

♻️ Proposed enhancement to capture stderr
-    let status = tokio::process::Command::new("pg_dump")
+    let output = tokio::process::Command::new("pg_dump")
         .arg("--dbname")
         .arg(&config.database_url)
         .arg("-Fc")
         .arg("-f")
         .arg(&tmp_path)
-        .status()
+        .output()
         .await?;
 
-    if status.success() {
+    if output.status.success() {
         tokio::fs::rename(&tmp_path, &final_path).await?;
         tracing::info!(%filename, "Snapshot complete");
         cleanup_old_snapshots(&config.dir, config.retention).await;
         Ok(())
     } else {
         let _ = tokio::fs::remove_file(&tmp_path).await;
-        bail!("pg_dump failed with status: {status}");
+        let stderr = String::from_utf8_lossy(&output.stderr);
+        bail!("pg_dump failed with status {}: {}", output.status, stderr.trim());
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/crates/atlas-server/src/snapshot.rs` around lines 47 - 64, The
pg_dump invocation currently only checks exit status, losing stderr details;
change from tokio::process::Command::status() to .output().await to capture
Output { stdout, stderr }, then on success proceed with
tokio::fs::rename(&tmp_path, &final_path) and tracing::info!(%filename,
"Snapshot complete") and cleanup_old_snapshots(&config.dir,
config.retention).await; on failure remove the tmp file as before but include
the captured stderr (convert bytes to string safely) in the error/bail message
(and log it via tracing::error or include in bail!("pg_dump failed: {}: {}",
status, stderr_text)), referencing the existing variables tmp_path, final_path,
status/output, and cleanup_old_snapshots to locate where to modify behavior.
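The status-vs-output distinction the comment hinges on can be shown with the blocking `std::process::Command` API, which mirrors the tokio one. A sketch with a hypothetical `run_capturing` helper standing in for the real pg_dump invocation:

```rust
use std::process::Command;

// Run a command capturing stderr, so a failure reports *why* it failed
// (connection refused, permission denied, disk full) rather than only the
// exit code. tokio::process::Command::output() has the same shape, async.
fn run_capturing(cmd: &str, args: &[&str]) -> Result<(), String> {
    let output = Command::new(cmd)
        .args(args)
        .output()
        .map_err(|e| format!("failed to spawn {cmd}: {e}"))?;
    if output.status.success() {
        Ok(())
    } else {
        // stderr is raw bytes; from_utf8_lossy handles non-UTF-8 safely.
        let stderr = String::from_utf8_lossy(&output.stderr);
        Err(format!("{cmd} failed with {}: {}", output.status, stderr.trim()))
    }
}
```

One trade-off worth noting: `.output()` buffers the child's stdout and stderr in memory, which is fine for pg_dump's diagnostics since the dump itself goes to `-f <file>`, not stdout.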
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@backend/crates/atlas-server/src/snapshot.rs`:
- Around line 127-129: The code currently sorts the vector of filenames with
files.sort() and then files.reverse(), which is less idiomatic and slightly less
efficient; replace that pair with a single descending sort call (e.g., use
files.sort_by or files.sort_unstable_by with a reversed comparison like
comparing b to a) so the filenames are sorted newest-first in one step, updating
the sorting of the variable files where files.sort() and files.reverse() are
used.
- Around line 47-64: The pg_dump invocation currently only checks exit status,
losing stderr details; change from tokio::process::Command::status() to
.output().await to capture Output { stdout, stderr }, then on success proceed
with tokio::fs::rename(&tmp_path, &final_path) and tracing::info!(%filename,
"Snapshot complete") and cleanup_old_snapshots(&config.dir,
config.retention).await; on failure remove the tmp file as before but include
the captured stderr (convert bytes to string safely) in the error/bail message
(and log it via tracing::error or include in bail!("pg_dump failed: {}: {}",
status, stderr_text)), referencing the existing variables tmp_path, final_path,
status/output, and cleanup_old_snapshots to locate where to modify behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 40200aef-ea90-4e71-aa1d-ba07c7bba016

📥 Commits

Reviewing files that changed from the base of the PR and between dd3c981 and 1aec619.

📒 Files selected for processing (11)
  • .env.example
  • .gitignore
  • backend/Dockerfile
  • backend/crates/atlas-server/Cargo.toml
  • backend/crates/atlas-server/src/config.rs
  • backend/crates/atlas-server/src/main.rs
  • backend/crates/atlas-server/src/snapshot.rs
  • backend/crates/atlas-server/tests/integration/common.rs
  • backend/crates/atlas-server/tests/integration/main.rs
  • backend/crates/atlas-server/tests/integration/snapshots.rs
  • docker-compose.yml
✅ Files skipped from review due to trivial changes (5)
  • .gitignore
  • backend/crates/atlas-server/tests/integration/main.rs
  • .env.example
  • backend/crates/atlas-server/src/main.rs
  • backend/crates/atlas-server/Cargo.toml
🚧 Files skipped from review as they are similar to previous changes (3)
  • docker-compose.yml
  • backend/Dockerfile
  • backend/crates/atlas-server/tests/integration/snapshots.rs


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
backend/crates/atlas-server/src/snapshot.rs (1)

124-126: Consider using sort_by for descending order in a single call.

This is a minor style suggestion. The current approach works correctly.

♻️ Optional refactor
-    // Sort descending (newest first) — timestamp in filename gives lexicographic order
-    files.sort();
-    files.reverse();
+    // Sort descending (newest first) — timestamp in filename gives lexicographic order
+    files.sort_by(|a, b| b.cmp(a));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/crates/atlas-server/src/snapshot.rs` around lines 124 - 126, Replace
the two-step descending sort (calling files.sort() then files.reverse()) with a
single descending sort call on the files collection; use files.sort_by with a
comparator that compares b to a (or files.sort_unstable_by for a faster unstable
sort) to achieve descending order in one call, updating the snapshot.rs location
where the files variable is sorted.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/crates/atlas-server/src/main.rs`:
- Around line 359-371: The CI formatting failed due to an extra blank line after
the snapshot scheduler block; edit the block that uses snapshot_config and
tokio::spawn with run_with_retry calling snapshot::run_snapshot_loop and remove
the stray blank line (ensure the closing brace of the if block is directly
followed by the next code or EOF), then re-run rustfmt/cargo fmt to verify
formatting passes.

---

Nitpick comments:
In `@backend/crates/atlas-server/src/snapshot.rs`:
- Around line 124-126: Replace the two-step descending sort (calling
files.sort() then files.reverse()) with a single descending sort call on the
files collection; use files.sort_by with a comparator that compares b to a (or
files.sort_unstable_by for a faster unstable sort) to achieve descending order
in one call, updating the snapshot.rs location where the files variable is
sorted.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5c114d8d-c004-4b6e-8d3a-4b6b1d4e004b

📥 Commits

Reviewing files that changed from the base of the PR and between 1aec619 and 93dc63c.

📒 Files selected for processing (2)
  • backend/crates/atlas-server/src/main.rs
  • backend/crates/atlas-server/src/snapshot.rs

Comment on lines +359 to +371
// Spawn snapshot scheduler if enabled
if snapshot_config.enabled {
tracing::info!("Snapshot scheduler enabled");
tokio::spawn(async move {
if let Err(e) =
run_with_retry(|| snapshot::run_snapshot_loop(snapshot_config.clone())).await
{
tracing::error!("Snapshot scheduler terminated with error: {}", e);
}
});
}




⚠️ Potential issue | 🟡 Minor

Fix formatting issue flagged by CI pipeline.

The CI pipeline failed due to a formatting check. There appears to be an extra blank line at line 371.

🔧 Remove extra blank line
         }
     }
-
 
     let app = api::build_router(state, config.cors_origin.clone());
🧰 Tools
🪛 GitHub Actions: CI

[error] 368-368: cargo fmt --all --check failed: formatting diff detected in the file (unexpected whitespace/formatting change around the line containing let app = api::build_router(...)).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/crates/atlas-server/src/main.rs` around lines 359 - 371, The CI
formatting failed due to an extra blank line after the snapshot scheduler block;
edit the block that uses snapshot_config and tokio::spawn with run_with_retry
calling snapshot::run_snapshot_loop and remove the stray blank line (ensure the
closing brace of the if block is directly followed by the next code or EOF),
then re-run rustfmt/cargo fmt to verify formatting passes.



Development

Successfully merging this pull request may close these issues.

Feature: snapshots

1 participant