Add --clean flag to sync-full for fresh re-sync after bulk deletions #123
Adds a --clean option that deletes all local data for an account (messages, conversations, labels, sync history) and re-syncs from scratch. Useful for recovering from corrupted state or resetting staged/deleted items.

Features:
- ResetSourceData() in store with transaction and CASCADE deletes
- Confirmation prompt showing what will be deleted (use --yes to skip)
- Double Ctrl+C pattern: first saves checkpoint, second force quits
- Account isolation tests to verify other accounts are untouched

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Major performance improvements to ResetSourceData:
1. Disable foreign key checks during bulk delete (avoids validation overhead)
2. Delete child tables explicitly (bypasses CASCADE trigger overhead)
3. Batch deletes with LIMIT 5000 (keeps transactions small)
4. Add progress callback for real-time status updates

Expected speedup: 2.5 hours -> ~2-5 minutes for 125k messages. The CLI now shows per-table progress and message deletion percentage.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Local MCP configuration should not be tracked. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
roborev: Combined Review
Verdict: Changes introduce two High-severity issues and one Medium-severity issue; no Critical findings.
High
Medium
Low
Synthesized from 4 reviews (agents: codex, gemini | types: review, security)
Wrap deferred s.db.Exec in anonymous function with explicit blank identifier assignment to satisfy errcheck linter. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
roborev: Combined Review
Overall: Needs changes. Two high-risk data integrity issues, plus test and cleanup gaps.
Critical
High
Medium
Low
Synthesized from 4 reviews (agents: codex, gemini | types: security, review)
1. PRAGMA on pooled connection: PRAGMA foreign_keys = OFF was executed on *sql.DB, which is a connection pool. The PRAGMA might apply to a different connection than the subsequent deletes, and a connection could return to the pool with FKs disabled.
   Fix: Use sql.Conn to get a dedicated connection for all operations.
2. Batching logic bug: deleteChildTableBatched repeatedly selected the same first LIMIT N message IDs because messages weren't deleted yet. After the first batch deleted those children, subsequent batches found 0 rows and exited early, leaving orphaned child rows.
   Fix: Delete by child table rowid with a JOIN to messages, ensuring each batch finds actual existing child rows to delete.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
roborev: Combined Review
Summary verdict: 1 high, 2 low findings; no critical issues.
High
Low
Synthesized from 4 reviews (agents: codex, gemini | types: security, review)
Sounds like a good idea to me. I have deleted over a million emails from my Gmail account but I don't want to delete them from my archive, so I wouldn't use this, but I can understand why someone would want to. I had actually planned to add a permadelete option to the CLI and TUI to be able to delete groups of emails from the local archive using the same UX as the remote delete. I'll take a close look at this and work on it when I can, so bear with me, I have a lot going on right now!
No worries at all, appreciate that this repo took off a little. Great to hear you see the value here! Agree on the vacuum'ing. I won't play around with all that, as it sounds like you might fold this into a wider consideration and I don't want to get in the way there, but it sounds sensible!
Background
I'm aware that msgvault is primarily an archiving tool. However, once I was able to use it with an LLM via the MCP, I found I could very efficiently cut out a lot of the newsletters, promotions, and other spam that I've accumulated over the last 20-25 years.
It seemed crazy to me that I would then keep all of that information in my local database even though I was very happy to have deleted it from Gmail. So I added this flag that allows you to re-sync the database for a particular account to get a fresh copy of your account down once you've completed a lot of mass deletions.
It's effectively the same as wiping the database and running sync-full, but if you have multiple accounts, that's not very viable.
Usage
What it does
Other accounts in the database are completely untouched.
Technical details
Performance optimization
Initial implementation used a single transaction with CASCADE deletes, which took ~2.5 hours for 125k messages. The optimized version completes in ~2-5 minutes by:
- Disabling foreign key checks during the bulk delete (PRAGMA foreign_keys = OFF)
- Deleting from message_bodies, message_raw, message_recipients, message_labels, attachments, reactions directly instead of relying on CASCADE triggers
Safety features
- --yes flag or interactive confirmation
New store methods
- ResetSourceData(sourceID) - simple wrapper for backwards compatibility
- ResetSourceDataWithProgress(sourceID, callback) - batched delete with progress reporting
Tests added
- TestResetSourceData - verifies basic functionality
- TestResetSourceData_IsolatesAccounts - critical test ensuring one account's reset doesn't affect other accounts' messages, conversations, labels, or sync state
🤖 Generated with Claude Code