
Fix CPU pinning and UX issues in batch deletion#111

Merged
wesm merged 5 commits into main from deletion-cpu-pinning
Feb 9, 2026

Conversation

@wesm
Owner

@wesm wesm commented Feb 9, 2026

Summary

  • Fix CPU pinning during batch deletion: ExecuteBatch was calling MarkMessageDeletedByGmailID 1000 times in a tight loop after each batch API call. Each call did a full table scan because source_message_id lacked a standalone index (only existed as the second column in UNIQUE(source_id, source_message_id), unusable for source_message_id-only lookups). Fixed by batching DB writes with IN (...) clauses and adding a standalone index.
  • Fix Ctrl+C spamming hundreds of warnings: The retry-failed-IDs loop and batch-fallback loop in ExecuteBatch had no ctx.Done() check, so cancellation caused every remaining ID to fail at the rate limiter and log a warning line.
  • Show resumed position on restart: OnStart now accepts alreadyProcessed so the progress bar immediately shows the checkpointed position instead of 0/N.
  • Add InitSchema to delete-staged: Was the only command missing it, so the new index wouldn't be created for existing databases.

Test plan

  • All existing tests pass (make test)
  • Run delete-staged on a large batch, Ctrl+C, rebuild, restart — verify progress jumps to resumed position and no warning spam on cancel

🤖 Generated with Claude Code

wesm and others added 5 commits February 8, 2026 19:30
ExecuteBatch was calling MarkMessageDeletedByGmailID 1000 times in a
tight loop after each batch API call. Each call did a full table scan
because source_message_id lacked a standalone index (only existed as
the second column in a composite UNIQUE(source_id, source_message_id)
index, which SQLite can't use for source_message_id-only lookups).

Fix: batch the DB writes using IN (...) clauses (1 UPDATE per 500-row
chunk instead of 1000 individual UPDATEs) and add a standalone index
on source_message_id for efficient lookups.

Also add InitSchema call to delete-staged command (the only command
that was missing it) so the new index is created automatically.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three fixes:
- OnStart now accepts alreadyProcessed so the progress bar immediately
  shows the resumed position instead of "0/420928"
- Add ctx.Done() check to the retry-failed-IDs loop in ExecuteBatch;
  without it, Ctrl+C caused every remaining retry ID to fail at the
  rate limiter and log a warning
- Add ctx.Done() check to the batch-fallback individual delete loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- MarkMessagesDeletedByGmailIDBatch now continues on chunk errors,
  falling back to individual updates for failed chunks (preserves
  best-effort semantics from the old per-message path)
- Add TestStore_MarkMessagesDeletedByGmailIDBatch with >500 IDs
  exercising multi-chunk behavior
- Add TestExecutor_ExecuteBatch_MarksDBRows asserting DB rows are
  actually marked deleted after successful batch API call
- Add TestExecutor_ExecuteBatch_CancelDuringRetry and
  CancelDuringFallback testing mid-loop cancellation checkpoints
- Add TestExecutor_OnStartAlreadyProcessed verifying resumed
  position flows through to progress reporter
- Store alreadyProcessed in trackingProgress for test assertions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When resuming with LastProcessedIndex == total and pending retries,
OnStart received startIndex (== total) as alreadyProcessed, making
the progress bar show 100% while retry work was still running. Now
uses succeeded count instead when retries are pending.
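
The position-picking logic this commit describes could be sketched as (hypothetical helper; the actual code lives in the executor):

```go
package main

import "fmt"

// resumedPosition picks what the progress bar shows on restart. When the
// checkpoint says every index was processed but retries are still pending,
// reporting startIndex would show 100% too early, so fall back to the
// count of confirmed successes.
func resumedPosition(startIndex, succeeded, total int, retriesPending bool) int {
	if retriesPending && startIndex == total {
		return succeeded
	}
	return startIndex
}

func main() {
	fmt.Println(resumedPosition(420928, 420100, 420928, true))
}
```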

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The rate calculation used elapsed/processed, but after resume
'processed' includes prior-run work while 'elapsed' only counts
time since restart, producing a wildly inflated rate and near-zero
ETA. Now tracks resumeOffset and computes rate from work done in
the current run only.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions

github-actions bot commented Feb 9, 2026

Security Review: 1 High/Medium Issue Found

Claude's automated security review identified potential security concerns. Please review each finding below.


🚨 SQL injection via string concatenation in batch query (high)

Location: internal/store/messages.go:499

The batch update query uses fmt.Sprintf to build a dynamic IN clause with placeholders, but this pattern is vulnerable if chunk length validation fails or if the placeholder construction is compromised. While args are parameterized, the query structure itself is dynamically built. Use a prepared statement or ensure robust validation of chunk size before query construction to prevent potential SQL injection vectors.


Powered by Claude 4.5 Sonnet — this is an automated review, false positives are possible.

@wesm wesm merged commit 7bb9e0b into main Feb 9, 2026
4 checks passed
@roborev-ci

roborev-ci bot commented Feb 9, 2026

roborev: Combined Review

Verdict: Security review found 1 High and 1 Medium issue; otherwise clean.

Critical
None.

High

  • SQL Injection in Batch Deletion (internal/store/messages.go:472, Finding ID G101). MarkMessagesDeletedByGmailIDBatch builds an IN (%s) query via string formatting through execInChunks, with gmailIDs sourced from a manifest file. Use parameterized queries and generated ? placeholders instead.

Medium

  • Path Traversal on Deletion Checkpoint (internal/deletion/executor.go:330, 381, Finding ID G102). On cancellation, e.saveCheckpoint writes to a user-provided manifestPath without sanitization, allowing traversal and arbitrary file overwrite. Clean/abs the path and constrain it to a safe directory, or write checkpoints to a dedicated safe location.

Low
None.


Synthesized from 4 reviews (agents: codex, gemini | types: review, security)

@roborev-ci

roborev-ci bot commented Feb 9, 2026

roborev: Combined Review

Summary verdict: One medium-severity behavioral regression risk around permanent deletion semantics, plus a low-severity test coverage gap.

Medium

  • internal/deletion/executor.go and internal/store/messages.go:471: Batch marking uses MarkMessagesDeletedByGmailIDBatch which only sets deleted_from_source_at, while prior per-message paths (and fallback paths) use MarkMessageDeletedByGmailID(..., permanent=...). This makes “permanent” semantics inconsistent depending on the code path. Suggested fix: add a permanent parameter to the batch function (and propagate it through fallback paths), or make the batch update set the same fields as the per-message path; document if the permanent flag is intentionally deprecated.

Low

  • internal/deletion/executor_test.go:943: TestExecutor_ExecuteBatch_MarksDBRows only asserts deleted_from_source_at and doesn’t verify the “permanent” flag. Suggested fix: extend the test to assert permanent deletion semantics for the batch success path and, ideally, the fallback path to prevent regressions.

Synthesized from 4 reviews (agents: codex, gemini | types: security, review)

@hughdbrown
Contributor

The events I see are:

  • one commit 7bb9e0b
  • two roborev reviews with warnings
  • merge of the pull request

Roborev identified a number of warnings:

  • High: SQL Injection in Batch Deletion
  • Medium: Path Traversal on Deletion Checkpoint
  • a medium and a low that I don't quite get.

Does this mean that Roborev did these reviews and the branch was merged without fixing the identified warnings?

I'd just like to know how the project manages code reviews and whether they are considered blocking.

@wesm
Owner Author

wesm commented Feb 9, 2026

The SQL injection finding was a false positive. The reviews are not blocking because the LLMs make mistakes. We will have to work on adding more validation and internal checking on the reviews to weed out the false positives.
