
Fix CPU pinning and UX issues in batch deletion#111

Merged
wesm merged 5 commits into main from deletion-cpu-pinning
Feb 9, 2026

Conversation

@wesm
Owner

@wesm wesm commented Feb 9, 2026

Summary

  • Fix CPU pinning during batch deletion: ExecuteBatch was calling MarkMessageDeletedByGmailID 1000 times in a tight loop after each batch API call. Each call did a full table scan because source_message_id lacked a standalone index (only existed as the second column in UNIQUE(source_id, source_message_id), unusable for source_message_id-only lookups). Fixed by batching DB writes with IN (...) clauses and adding a standalone index.
  • Fix Ctrl+C spamming hundreds of warnings: The retry-failed-IDs loop and batch-fallback loop in ExecuteBatch had no ctx.Done() check, so cancellation caused every remaining ID to fail at the rate limiter and log a warning line.
  • Show resumed position on restart: OnStart now accepts alreadyProcessed so the progress bar immediately shows the checkpointed position instead of 0/N.
  • Add InitSchema to delete-staged: Was the only command missing it, so the new index wouldn't be created for existing databases.

Test plan

  • All existing tests pass (make test)
  • Run delete-staged on a large batch, Ctrl+C, rebuild, restart — verify progress jumps to resumed position and no warning spam on cancel

🤖 Generated with Claude Code

wesm and others added 5 commits February 8, 2026 19:30
ExecuteBatch was calling MarkMessageDeletedByGmailID 1000 times in a
tight loop after each batch API call. Each call did a full table scan
because source_message_id lacked a standalone index (only existed as
the second column in a composite UNIQUE(source_id, source_message_id)
index, which SQLite can't use for source_message_id-only lookups).

Fix: batch the DB writes using IN (...) clauses (1 UPDATE per 500-row
chunk instead of 1000 individual UPDATEs) and add a standalone index
on source_message_id for efficient lookups.

Also add InitSchema call to delete-staged command (the only command
that was missing it) so the new index is created automatically.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three fixes:
- OnStart now accepts alreadyProcessed so the progress bar immediately
  shows the resumed position instead of "0/420928"
- Add ctx.Done() check to the retry-failed-IDs loop in ExecuteBatch;
  without it, Ctrl+C caused every remaining retry ID to fail at the
  rate limiter and log a warning
- Add ctx.Done() check to the batch-fallback individual delete loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- MarkMessagesDeletedByGmailIDBatch now continues on chunk errors,
  falling back to individual updates for failed chunks (preserves
  best-effort semantics from the old per-message path)
- Add TestStore_MarkMessagesDeletedByGmailIDBatch with >500 IDs
  exercising multi-chunk behavior
- Add TestExecutor_ExecuteBatch_MarksDBRows asserting DB rows are
  actually marked deleted after successful batch API call
- Add TestExecutor_ExecuteBatch_CancelDuringRetry and
  CancelDuringFallback testing mid-loop cancellation checkpoints
- Add TestExecutor_OnStartAlreadyProcessed verifying resumed
  position flows through to progress reporter
- Store alreadyProcessed in trackingProgress for test assertions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When resuming with LastProcessedIndex == total and pending retries,
OnStart received startIndex (== total) as alreadyProcessed, making
the progress bar show 100% while retry work was still running. Now
uses succeeded count instead when retries are pending.
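
The position-picking logic this commit describes could be sketched as (hypothetical helper; the actual code lives in the executor):

```go
package main

import "fmt"

// resumedPosition picks what the progress bar shows on restart. When the
// checkpoint says every index was processed but retries are still pending,
// reporting startIndex would show 100% too early, so fall back to the
// count of confirmed successes.
func resumedPosition(startIndex, succeeded, total int, retriesPending bool) int {
	if retriesPending && startIndex == total {
		return succeeded
	}
	return startIndex
}

func main() {
	fmt.Println(resumedPosition(420928, 420100, 420928, true))
}
```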

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The rate calculation used elapsed/processed, but after resume
'processed' includes prior-run work while 'elapsed' only counts
time since restart, producing a wildly inflated rate and near-zero
ETA. Now tracks resumeOffset and computes rate from work done in
the current run only.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions

github-actions bot commented Feb 9, 2026

Security Review: 1 High/Medium Issue Found

Claude's automated security review identified potential security concerns. Please review each finding below.


🚨 SQL injection via string concatenation in batch query (high)

Location: internal/store/messages.go:499

The batch update query uses fmt.Sprintf to build a dynamic IN clause with placeholders, but this pattern is vulnerable if chunk length validation fails or if the placeholder construction is compromised. While args are parameterized, the query structure itself is dynamically built. Use a prepared statement or ensure robust validation of chunk size before query construction to prevent potential SQL injection vectors.


Powered by Claude 4.5 Sonnet — this is an automated review, false positives are possible.

@wesm wesm merged commit 7bb9e0b into main Feb 9, 2026
4 checks passed
@roborev-ci

roborev-ci bot commented Feb 9, 2026

roborev: Combined Review

Verdict: Security review found 1 High and 1 Medium issue; otherwise clean.

Critical
None.

High

  • SQL Injection in Batch Deletion (internal/store/messages.go:472, Finding ID G101). MarkMessagesDeletedByGmailIDBatch builds an IN (%s) query via string formatting through execInChunks, with gmailIDs sourced from a manifest file. Use parameterized queries and generated ? placeholders instead.

Medium

  • Path Traversal on Deletion Checkpoint (internal/deletion/executor.go:330, 381, Finding ID G102). On cancellation, e.saveCheckpoint writes to a user-provided manifestPath without sanitization, allowing traversal and arbitrary file overwrite. Clean/abs the path and constrain it to a safe directory, or write checkpoints to a dedicated safe location.

Low
None.


Synthesized from 4 reviews (agents: codex, gemini | types: review, security)

@roborev-ci

roborev-ci bot commented Feb 9, 2026

roborev: Combined Review

Summary verdict: One medium-severity behavioral regression risk around permanent deletion semantics, plus a low-severity test coverage gap.

Medium

  • internal/deletion/executor.go and internal/store/messages.go:471: Batch marking uses MarkMessagesDeletedByGmailIDBatch which only sets deleted_from_source_at, while prior per-message paths (and fallback paths) use MarkMessageDeletedByGmailID(..., permanent=...). This makes “permanent” semantics inconsistent depending on the code path. Suggested fix: add a permanent parameter to the batch function (and propagate it through fallback paths), or make the batch update set the same fields as the per-message path; document if the permanent flag is intentionally deprecated.

Low

  • internal/deletion/executor_test.go:943: TestExecutor_ExecuteBatch_MarksDBRows only asserts deleted_from_source_at and doesn’t verify the “permanent” flag. Suggested fix: extend the test to assert permanent deletion semantics for the batch success path and, ideally, the fallback path to prevent regressions.

Synthesized from 4 reviews (agents: codex, gemini | types: security, review)

@hughdbrown
Contributor

The events I see are:

  • one commit 7bb9e0b
  • two roborev reviews with warnings
  • merge of the pull request

Roborev identified a number of warnings:

  • High: SQL Injection in Batch Deletion
  • Medium: Path Traversal on Deletion Checkpoint
  • a medium and a low that I don't quite get.

Does this mean that Roborev did these reviews and the branch was merged without fixing the identified warnings?

I'd just like to know how the project manages code reviews and whether they are considered blocking.

@wesm
Owner Author

wesm commented Feb 9, 2026

The SQL injection finding was a false positive. The reviews are not blocking because the LLMs make mistakes. We will have to work on adding more validation and internal checking on the reviews to weed out the false positives.
