
Add --clean flag to sync-full for fresh re-sync after bulk deletions#123

Open
robelkin wants to merge 5 commits into wesm:main from robelkin:resync-clean

Conversation

@robelkin
Contributor

Background

I'm aware that msgvault is primarily an archiving tool. However, once I was able to use it with an LLM via MCP, I could very efficiently cut out a lot of the newsletter, promotional, and other spam that I'd accumulated over the last 20-25 years.

It seemed crazy to keep all of that information in my local database when I had happily deleted it from Gmail. So I added a flag that lets you re-sync the database for a particular account, pulling down a fresh copy once you've completed a round of mass deletions.

It's effectively the same as wiping the database and running sync-full, but that isn't viable if you have multiple accounts.

Usage

# Delete local data for one account and re-sync from Gmail
msgvault sync-full you@gmail.com --clean

# Skip confirmation prompt (for scripting)
msgvault sync-full you@gmail.com --clean --yes

What it does

  1. Shows what will be deleted (message/conversation/label counts)
  2. Requires confirmation before proceeding
  3. Deletes all local data for the specified account only
  4. Immediately runs a full sync to re-download from Gmail

Other accounts in the database are completely untouched.

Technical details

Performance optimization

The initial implementation used a single transaction with CASCADE deletes, which took ~2.5 hours for 125k messages. The optimized version completes in ~2-5 minutes by:

  • Disabling FK checks during bulk delete (PRAGMA foreign_keys = OFF)
  • Explicit child table deletion - deletes from message_bodies, message_raw, message_recipients, message_labels, attachments, reactions directly instead of relying on CASCADE triggers
  • Batched deletes - 5000 rows per batch to keep WAL size manageable
  • Progress reporting - shows current table and deletion percentage
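The batching arithmetic behind those last two bullets is simple; here is an illustrative Go sketch (the `batchSize` constant name and the integer-percentage reporting are assumptions, not the PR's actual code):

```go
package main

import "fmt"

// batchSize mirrors the 5000-row batches described above (assumed constant name).
const batchSize = 5000

// numBatches returns how many LIMIT-ed DELETE statements are needed
// to remove `total` rows in chunks of batchSize.
func numBatches(total int) int {
	if total <= 0 {
		return 0
	}
	return (total + batchSize - 1) / batchSize
}

// progressPercent reports deletion progress after `deleted` of `total` rows.
func progressPercent(deleted, total int) int {
	if total == 0 {
		return 100
	}
	return deleted * 100 / total
}

func main() {
	fmt.Println(numBatches(125000))             // 25 batches for the 125k-message case
	fmt.Println(progressPercent(62500, 125000)) // 50
}
```

Keeping each batch at 5000 rows bounds the WAL growth per transaction while still letting the percentage counter advance smoothly.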

Safety features

  • Requires --yes flag or interactive confirmation
  • Only operates on the specified account (isolation verified by tests)
  • Double Ctrl+C pattern: first saves checkpoint, second force quits
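The double Ctrl+C behavior can be modeled as a tiny state machine; a minimal sketch (the real CLI presumably wires this to `os/signal.Notify`, which is not shown here):

```go
package main

import "fmt"

// interruptHandler models the double Ctrl+C pattern described above:
// the first interrupt requests a checkpoint save, any subsequent one
// forces a quit. Illustrative only; names are assumptions.
type interruptHandler struct{ count int }

// onInterrupt returns the action to take for this interrupt.
func (h *interruptHandler) onInterrupt() string {
	h.count++
	if h.count == 1 {
		return "save-checkpoint"
	}
	return "force-quit"
}

func main() {
	h := &interruptHandler{}
	fmt.Println(h.onInterrupt()) // save-checkpoint
	fmt.Println(h.onInterrupt()) // force-quit
}
```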

New store methods

  • ResetSourceData(sourceID) - simple wrapper for backwards compatibility
  • ResetSourceDataWithProgress(sourceID, callback) - batched delete with progress reporting
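The relationship between the two methods is presumably a thin delegation; a sketch under assumed signatures (the actual callback shape in the PR may differ):

```go
package main

import "fmt"

// ProgressFunc is an assumed shape for the callback passed to
// ResetSourceDataWithProgress: the table being cleared plus rows
// deleted so far out of the total.
type ProgressFunc func(table string, deleted, total int64)

// resetSourceData sketches the backwards-compatible wrapper described
// above: it delegates to the progress-aware variant with a no-op callback.
func resetSourceData(sourceID int64, withProgress func(int64, ProgressFunc) error) error {
	return withProgress(sourceID, func(string, int64, int64) {})
}

func main() {
	err := resetSourceData(42, func(id int64, cb ProgressFunc) error {
		cb("messages", 5000, 125000) // simulated progress event
		fmt.Println("resetting source", id)
		return nil
	})
	fmt.Println(err == nil) // true
}
```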

Tests added

  • TestResetSourceData - verifies basic functionality
  • TestResetSourceData_IsolatesAccounts - critical test ensuring one account's reset doesn't affect other accounts' messages, conversations, labels, or sync state

🤖 Generated with Claude Code

Adds a --clean option that deletes all local data for an account
(messages, conversations, labels, sync history) and re-syncs from
scratch. Useful for recovering from corrupted state or resetting
staged/deleted items.

Features:
- ResetSourceData() in store with transaction and CASCADE deletes
- Confirmation prompt showing what will be deleted (use --yes to skip)
- Double Ctrl+C pattern: first saves checkpoint, second force quits
- Account isolation tests to verify other accounts are untouched

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Major performance improvements to ResetSourceData:

1. Disable foreign key checks during bulk delete (avoids validation overhead)
2. Delete child tables explicitly (bypasses CASCADE trigger overhead)
3. Batch deletes with LIMIT 5000 (keeps transactions small)
4. Add progress callback for real-time status updates

Expected speedup: 2.5 hours -> ~2-5 minutes for 125k messages.

The CLI now shows per-table progress and message deletion percentage.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Local MCP configuration should not be tracked.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@roborev-ci

roborev-ci bot commented Feb 10, 2026

roborev: Combined Review

Verdict: Changes introduce two High-severity issues and one Medium-severity issue; no Critical findings.

High

  1. internal/store/sync.go:425, 428, 584
    PRAGMA foreign_keys is connection-local in SQLite, but s.db.Exec uses pooled connections. This risks returning a connection with foreign keys disabled to the pool and makes the intended PRAGMA ineffective for subsequent deletes.
    Fix: Acquire a dedicated *sql.Conn with s.db.Conn(ctx) and run PRAGMA + all batched deletes on that connection, ensuring foreign_keys is re-enabled before returning it to the pool.

  2. internal/store/sync.go (function deleteChildTableBatched)
    The batching loop never advances past the first LIMIT of message IDs because it repeatedly selects the same first batch and stops when RowsAffected() is 0. This can leave child rows for later messages untouched and creates orphans after parent deletes with foreign keys off.
    Fix: Iterate deterministically through message IDs (e.g., ORDER BY id with lastID tracking), or prefetch IDs per batch and delete children for that batch before advancing.
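The fix for finding 1 amounts to pinning a single connection for the whole sequence. A sketch of the statement ordering the reviewers describe (statement text and the helper name are assumptions; the real fix would run these via `s.db.Conn(ctx)` and `Conn.ExecContext`):

```go
package main

import "fmt"

// pragmaSequence lists, in order, the statements to run on one dedicated
// connection so foreign_keys is disabled, the child-table deletes run,
// and the flag is restored on that same connection before it returns to
// the pool. The DELETE bodies are placeholders.
func pragmaSequence(childTables []string) []string {
	stmts := []string{"PRAGMA foreign_keys = OFF"}
	for _, t := range childTables {
		stmts = append(stmts, fmt.Sprintf("DELETE FROM %s WHERE ...", t))
	}
	return append(stmts, "PRAGMA foreign_keys = ON")
}

func main() {
	for _, s := range pragmaSequence([]string{"message_bodies", "message_labels"}) {
		fmt.Println(s)
	}
}
```

Because every statement goes through the same `*sql.Conn`, the PRAGMA provably applies to the deletes, and re-enabling on that connection prevents a tainted connection from re-entering the pool.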

Medium

  1. internal/store/sync.go:409
    Long-running reset loop does not accept context.Context, so Ctrl+C cancellation from the CLI does not propagate to DB calls or loop control.
    Fix: Update signature to ResetSourceDataWithProgress(ctx context.Context, ...), use ExecContext, and check ctx.Err() inside loops.

Low
None.


Synthesized from 4 reviews (agents: codex, gemini | types: review, security)

Wrap deferred s.db.Exec in anonymous function with explicit
blank identifier assignment to satisfy errcheck linter.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@roborev-ci

roborev-ci bot commented Feb 10, 2026

roborev: Combined Review

Overall: Needs changes — two high-risk data integrity issues, plus test and cleanup gaps.

Critical
None.

High

  1. internal/store/sync.go:424-427
    PRAGMA foreign_keys = OFF is executed on pooled *sql.DB, so it may apply to a different connection than the deletes, and a tainted connection can return to the pool with FKs disabled. This risks inconsistent delete behavior and future integrity issues.
    Fix: Use a dedicated connection (s.db.Conn(ctx)) or a single-connection transaction; run PRAGMA + deletes + re-enable on that same connection.

  2. internal/store/sync.go:597-618
    deleteChildTableBatched repeatedly selects the same first LIMIT ? message IDs because parents aren’t deleted until later; after the first batch, RowsAffected becomes 0 and the loop exits, leaving orphaned child rows.
    Fix: Select only message IDs that still have child rows (join to child table), or batch delete by child table row IDs.

Medium

  1. internal/store/sync_test.go:210
    Tests use small message counts (< batchSize) and don’t validate child table cleanup, so the batching bug is unexercised.
    Fix: Add a test with >5000 messages (or make batch size configurable for tests) and assert all child rows are removed.

Low

  1. internal/store/sync.go:427-428
    Error from re-enabling FKs is ignored (_ = ...), risking silent failure.
    Fix: Log the error (or propagate if possible) when re-enabling fails.

Synthesized from 4 reviews (agents: codex, gemini | types: security, review)

1. PRAGMA on pooled connection: PRAGMA foreign_keys = OFF was executed
   on *sql.DB which is a connection pool. The PRAGMA might apply to a
   different connection than the subsequent deletes, and a connection
   could return to the pool with FKs disabled.

   Fix: Use sql.Conn to get a dedicated connection for all operations.

2. Batching logic bug: deleteChildTableBatched repeatedly selected the
   same first LIMIT N message IDs because messages weren't deleted yet.
   After the first batch deleted those children, subsequent batches
   found 0 rows and exited early, leaving orphaned child rows.

   Fix: Delete by child table rowid with a JOIN to messages, ensuring
   each batch finds actual existing child rows to delete.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@roborev-ci

roborev-ci bot commented Feb 10, 2026

roborev: Combined Review

Summary verdict: 1 high, 2 low findings; no critical issues.

High

  1. internal/store/sync.go (lines 430-575): PRAGMA foreign_keys = OFF on a pooled connection is only re-enabled on the success path. Any early return leaves FKs disabled when the connection returns to the pool, risking integrity corruption. Add a defer to restore FKs (ideally with context.Background()), optionally keeping an explicit re-enable on the happy path.

Low

  1. cmd/msgvault/cmd/syncfull.go (lines 144-146 in diff): Errors from COUNT(*) Scan calls are ignored; failures show 0 counts and can mislead before a destructive action. Handle and abort on error.
  2. internal/store/sync_test.go: Missing test for error paths in ResetSourceDataWithProgress to ensure FK checks are restored on the same pooled connection after failure. Add a failure-injection test.

Synthesized from 4 reviews (agents: codex, gemini | types: security, review)

@wesm
Owner

wesm commented Feb 11, 2026

Sounds like a good idea to me. I have deleted over a million emails from my Gmail account but I don't want to delete them from my archive, so I wouldn't use this myself, though I can understand why someone would want to. I had actually planned to add a permadelete option to the CLI and TUI to delete groups of emails from the local archive using the same UX as the remote delete-staged API (maybe that should be called delete-remote for clarity). If you delete locally, then we probably also need a msgvault gc or something that will vacuum the SQLite database.

I'll take a close look at this and work on it when I can, so bear with me; I have a lot going on right now!

@robelkin
Contributor Author

No worries at all. I appreciate that this repo has taken off a little, and it's great to hear you see the value here! Agreed on the vacuuming. I'll hold off on playing with all that, since it sounds like you might fold this into a wider consideration and I don't want to get in the way, but it sounds sensible!
