Fix CPU pinning and UX issues in batch deletion #111
Conversation
ExecuteBatch was calling MarkMessageDeletedByGmailID 1000 times in a tight loop after each batch API call. Each call did a full table scan because source_message_id lacked a standalone index (only existed as the second column in a composite UNIQUE(source_id, source_message_id) index, which SQLite can't use for source_message_id-only lookups). Fix: batch the DB writes using IN (...) clauses (1 UPDATE per 500-row chunk instead of 1000 individual UPDATEs) and add a standalone index on source_message_id for efficient lookups. Also add InitSchema call to delete-staged command (the only command that was missing it) so the new index is created automatically. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
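The chunked-update fix described above, one parameterized UPDATE per 500-row chunk instead of 1000 individual statements, can be sketched roughly as follows. This is an illustrative sketch, not the project's actual code: the table and column names and the helper names are assumptions. The key point for the security discussion later in this thread is that only `?` placeholders are concatenated into the SQL text; the IDs themselves always travel as bound arguments.

```go
package main

import (
	"fmt"
	"strings"
)

// chunk splits ids into slices of at most size elements, so each slice can
// feed one IN (...) clause (SQLite limits bound parameters per statement).
func chunk(ids []string, size int) [][]string {
	var out [][]string
	for len(ids) > size {
		out = append(out, ids[:size])
		ids = ids[size:]
	}
	if len(ids) > 0 {
		out = append(out, ids)
	}
	return out
}

// buildMarkDeletedQuery returns a parameterized UPDATE for one chunk.
// Only "?" placeholders are interpolated into the SQL string; the IDs are
// returned separately as bound args, so no user data enters the query text.
func buildMarkDeletedQuery(ids []string) (string, []any) {
	if len(ids) == 0 {
		return "", nil
	}
	placeholders := strings.TrimSuffix(strings.Repeat("?,", len(ids)), ",")
	query := fmt.Sprintf(
		"UPDATE messages SET deleted = 1 WHERE source_message_id IN (%s)",
		placeholders)
	args := make([]any, len(ids))
	for i, id := range ids {
		args[i] = id
	}
	return query, args
}

func main() {
	q, args := buildMarkDeletedQuery([]string{"a", "b", "c"})
	fmt.Println(q)
	fmt.Println(len(chunk(make([]string, 1001), 500)), "chunks,", len(args), "args")
}
```

With 1000 IDs per batch API call, this reduces the write path to two UPDATE statements (500 + 500) instead of 1000, each of which can now also use the new standalone index on `source_message_id`.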
Three fixes:
- OnStart now accepts alreadyProcessed so the progress bar immediately shows the resumed position instead of "0/420928"
- Add ctx.Done() check to the retry-failed-IDs loop in ExecuteBatch; without it, Ctrl+C caused every remaining retry ID to fail at the rate limiter and log a warning
- Add ctx.Done() check to the batch-fallback individual delete loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- MarkMessagesDeletedByGmailIDBatch now continues on chunk errors, falling back to individual updates for failed chunks (preserves best-effort semantics from the old per-message path)
- Add TestStore_MarkMessagesDeletedByGmailIDBatch with >500 IDs exercising multi-chunk behavior
- Add TestExecutor_ExecuteBatch_MarksDBRows asserting DB rows are actually marked deleted after a successful batch API call
- Add TestExecutor_ExecuteBatch_CancelDuringRetry and CancelDuringFallback testing mid-loop cancellation checkpoints
- Add TestExecutor_OnStartAlreadyProcessed verifying the resumed position flows through to the progress reporter
- Store alreadyProcessed in trackingProgress for test assertions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When resuming with LastProcessedIndex == total and pending retries, OnStart received startIndex (== total) as alreadyProcessed, making the progress bar show 100% while retry work was still running. Now uses succeeded count instead when retries are pending. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
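The selection logic described above is small enough to state directly. This is a hedged sketch with an invented name (`reportedStart`), not the project's actual identifier: when the checkpoint claims every index was processed but retries remain, reporting the checkpointed index would render 100%, so the succeeded count is reported instead.

```go
package main

import "fmt"

// reportedStart picks the alreadyProcessed value handed to OnStart.
// If LastProcessedIndex == total but retries are pending, startIndex would
// show a full bar while work remains; fall back to the succeeded count.
func reportedStart(startIndex, total, succeeded, pendingRetries int) int {
	if pendingRetries > 0 && startIndex == total {
		return succeeded
	}
	return startIndex
}

func main() {
	fmt.Println(reportedStart(420928, 420928, 420100, 828)) // retries pending
	fmt.Println(reportedStart(1000, 420928, 1000, 0))       // ordinary resume
}
```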
The rate calculation used elapsed/processed, but after resume 'processed' includes prior-run work while 'elapsed' only counts time since restart, producing a wildly inflated rate and near-zero ETA. Now tracks resumeOffset and computes rate from work done in the current run only. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Security Review: 1 High/Medium Issue Found

Claude's automated security review identified potential security concerns. Please review each finding below.

🚨 SQL injection via string concatenation in batch query (high)

Location: The batch update query uses fmt.Sprintf to build a dynamic IN clause with placeholders, but this pattern is vulnerable if chunk length validation fails or if the placeholder construction is compromised. While args are parameterized, the query structure itself is dynamically built. Use a prepared statement or ensure robust validation of chunk size before query construction to prevent potential SQL injection vectors.

Powered by Claude 4.5 Sonnet — this is an automated review; false positives are possible.
roborev: Combined Review

Verdict: Security review found 1 High and 1 Medium issue; otherwise clean.

Synthesized from 4 reviews (agents: codex, gemini | types: review, security)
roborev: Combined Review

Summary verdict: One medium-severity behavioral regression risk around permanent deletion semantics, plus a low-severity test coverage gap.

Synthesized from 4 reviews (agents: codex, gemini | types: security, review)
The events I see are:
Roborev identified a number of warnings:
Does this mean that Roborev did these reviews and the branch was merged without fixing the identified warnings? I'd just like to know how the project manages code reviews and whether they are considered blocking.
The SQL injection finding was a false positive. The reviews are not blocking because the LLMs make mistakes. We will have to work on adding more validation and internal checking on the reviews to weed out the false positives.
Summary
- `ExecuteBatch` was calling `MarkMessageDeletedByGmailID` 1000 times in a tight loop after each batch API call. Each call did a full table scan because `source_message_id` lacked a standalone index (it existed only as the second column in `UNIQUE(source_id, source_message_id)`, unusable for `source_message_id`-only lookups). Fixed by batching DB writes with `IN (...)` clauses and adding a standalone index.
- `ExecuteBatch` had no `ctx.Done()` check, so cancellation caused every remaining ID to fail at the rate limiter and log a warning line.
- `OnStart` now accepts `alreadyProcessed` so the progress bar immediately shows the checkpointed position instead of `0/N`.
- Added `InitSchema` to `delete-staged`: it was the only command missing it, so the new index wouldn't be created for existing databases.

Test plan
- `make test`
- Run `delete-staged` on a large batch, Ctrl+C, rebuild, restart — verify progress jumps to the resumed position and no warning spam on cancel

🤖 Generated with Claude Code