Populate FTS5 index during sync and backfill existing databases#94

Merged: wesm merged 9 commits into main from cli-search on Feb 6, 2026
wesm (Owner) commented Feb 6, 2026

Summary

  • Populate FTS5 full-text search index during sync (UpsertFTS per message) and backfill existing databases on first search/TUI/MCP launch
  • BackfillFTS processes in batches of 5000 by ID range with a live progress bar, avoiding multi-minute blocking on large archives (1.7M+ messages)
  • TUI and MCP run backfill in background goroutine so they launch instantly
  • CLI search shows Searching... immediately and a progress bar during one-time backfill
  • Use MAX(id) B-tree lookups instead of COUNT(*) table scans for instant backfill detection

Details

The messages_fts FTS5 table was created during InitSchema but never populated, causing search and TUI deep search to always return zero results. The FTS JOIN against the empty table matched nothing, with no fallback to LIKE.

Store layer (internal/store/):

  • UpsertFTS() — inserts/replaces a single FTS row (called per message during sync)
  • BackfillFTS(progress) — batched bulk population from existing data with progress callback
  • NeedsFTSBackfill() — fast MAX(id) check to detect empty/incomplete FTS
  • FTS5Available() — accessor for FTS5 availability

Sync (internal/sync/):

  • persistMessage now calls UpsertFTS after storing recipients, keeping FTS current
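
As a rough illustration of the per-message upsert, here is a minimal Go sketch. It is not the code from this PR: the `ftsRow` fields, the `subject` and `body` column names, and the delete-then-insert strategy are assumptions for illustration (only `messages_fts` and `from_addr` appear in the PR text). A fake `recorder` stands in for the database so the sketch runs without a SQLite driver.

```go
package main

import (
	"database/sql"
	"fmt"
)

// execer is the subset of *sql.DB / *sql.Tx that this sketch needs.
type execer interface {
	Exec(query string, args ...any) (sql.Result, error)
}

// ftsRow holds the searchable text for one message.
// The field set is hypothetical; the real schema may differ.
type ftsRow struct {
	MessageID int64
	Subject   string
	Body      string
	FromAddr  string
}

// upsertFTS replaces the FTS row for one message by deleting any existing
// row with the same rowid and inserting a fresh one (assumed strategy).
func upsertFTS(db execer, r ftsRow) error {
	if _, err := db.Exec(`DELETE FROM messages_fts WHERE rowid = ?`, r.MessageID); err != nil {
		return fmt.Errorf("delete fts row: %w", err)
	}
	_, err := db.Exec(
		`INSERT INTO messages_fts (rowid, subject, body, from_addr) VALUES (?, ?, ?, ?)`,
		r.MessageID, r.Subject, r.Body, r.FromAddr,
	)
	if err != nil {
		return fmt.Errorf("insert fts row: %w", err)
	}
	return nil
}

// recorder is a fake execer that only records the statements it receives.
type recorder struct{ queries []string }

func (r *recorder) Exec(q string, args ...any) (sql.Result, error) {
	r.queries = append(r.queries, q)
	return nil, nil
}

func main() {
	rec := &recorder{}
	row := ftsRow{MessageID: 42, Subject: "hi", Body: "thank you", FromAddr: "a@example.com"}
	if err := upsertFTS(rec, row); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(rec.queries), "statements executed")
}
```

In a real sync path both statements would presumably run inside the same transaction that stores the message, so a failure cannot leave the message without its FTS row.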

CLI commands (cmd/msgvault/cmd/):

  • search, tui, mcp call InitSchema + backfill check on startup
  • search blocks with progress bar; tui/mcp backfill in background
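
The difference between the blocking and background startup paths can be sketched as follows. This is a hedged illustration, not the PR's code: `backfillFunc`, `runBlocking`, `runInBackground`, and `fakeBackfill` are hypothetical names, and the fake backfill stands in for the store's `BackfillFTS(progress)`.

```go
package main

import "fmt"

// backfillFunc stands in for the store's BackfillFTS(progress) method.
// The exact signature is an assumption.
type backfillFunc func(progress func(done, total int64)) (int64, error)

// runBlocking mirrors the `search` behaviour: wait for the backfill and
// report progress while it runs.
func runBlocking(backfill backfillFunc) (int64, error) {
	return backfill(func(done, total int64) {
		fmt.Printf("\rIndexed %d / %d...", done, total)
	})
}

// runInBackground mirrors the `tui`/`mcp` behaviour: start the backfill in a
// goroutine and return immediately. The buffered channel lets the goroutine
// finish even if nobody reads the result.
func runInBackground(backfill backfillFunc) <-chan error {
	errc := make(chan error, 1)
	go func() {
		_, err := backfill(nil)
		errc <- err
	}()
	return errc
}

// fakeBackfill pretends to index three messages.
func fakeBackfill(progress func(done, total int64)) (int64, error) {
	const total = 3
	for i := int64(1); i <= total; i++ {
		if progress != nil {
			progress(i, total)
		}
	}
	return total, nil
}

func main() {
	n, err := runBlocking(fakeBackfill)
	fmt.Printf("\nindexed %d messages, err=%v\n", n, err)

	errc := runInBackground(fakeBackfill)
	fmt.Println("UI is usable while the index builds")
	if err := <-errc; err != nil {
		fmt.Println("backfill failed:", err)
	}
}
```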

Fixes #64. Fixes #29.

Test plan

  • go test -tags fts5 ./internal/store/ -run FTS — UpsertFTS, BackfillFTS, NeedsFTSBackfill tests
  • make test — full suite passes
  • make lint — clean
  • Manual: ./msgvault search "thank" returns results after backfill

🤖 Generated with Claude Code

wesm and others added 9 commits February 6, 2026 09:15
…29)

The messages_fts table was created but never populated, causing CLI search
and TUI deep search to always return zero results. This adds FTS population
at two points: incrementally during sync (UpsertFTS) and as a one-time
backfill for existing databases (BackfillFTS in InitSchema).

Fixes #29.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… test

- Wrap BackfillFTS delete+insert in a transaction so a mid-operation
  failure doesn't leave the FTS table empty
- Return errors from COUNT(*) queries in InitSchema instead of silently
  swallowing them
- Add TestStore_InitSchema_AutoBackfillFTS to cover the auto-backfill
  path (empty FTS + existing messages triggers backfill on InitSchema)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These commands open the database but never called InitSchema, so the
auto-backfill for existing databases with empty FTS tables never ran.
This was the actual cause of "No messages found" — hasFTSTable() returned
true (table exists), but the JOIN against the empty FTS table yielded
zero rows.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move the auto-backfill out of InitSchema (which should be pure schema
setup) into a NeedsFTSBackfill() check + explicit BackfillFTS() call
in the CLI commands. This lets search/tui/mcp print a status message
("Building search index... indexed N messages.") so users know what's
happening during the one-time backfill on large archives.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- BackfillFTS now processes in batches of 5000 by ID range instead of
  one massive query, with an optional progress callback
- TUI and MCP launch the backfill in a background goroutine so they
  start instantly — FTS is only needed for deep search, not aggregates
- CLI search blocks with live progress: "Indexed 50000 / 1700000..."
- NeedsFTSBackfill detects partial backfills (interrupted) via 90%
  threshold so they resume on next run
- Fix CLAUDE.md stale data reference

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
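
A minimal sketch of batching by ID range with an optional progress callback, assuming a hypothetical `backfillRange` helper and an `indexBatch` callback that would, in a real store, run one bounded query such as `WHERE id >= ? AND id <= ?` (that query shape is an assumption). Only the batch size of 5000 comes from the PR text.

```go
package main

import "fmt"

// batchSize matches the 5000-row batches described in the PR.
const batchSize = 5000

// backfillRange walks the inclusive ID range [minID, maxID] in fixed-size
// steps. indexBatch handles one sub-range; progress, if non-nil, receives the
// number of IDs covered so far and the size of the whole range.
func backfillRange(
	minID, maxID int64,
	indexBatch func(lo, hi int64) error,
	progress func(done, total int64),
) error {
	if maxID < minID {
		return nil // nothing to index
	}
	total := maxID - minID + 1
	for lo := minID; lo <= maxID; lo += batchSize {
		hi := lo + batchSize - 1
		if hi > maxID {
			hi = maxID // last batch may be short
		}
		if err := indexBatch(lo, hi); err != nil {
			return fmt.Errorf("batch %d-%d: %w", lo, hi, err)
		}
		if progress != nil {
			progress(hi-minID+1, total)
		}
	}
	return nil
}

func main() {
	err := backfillRange(1, 12000,
		func(lo, hi int64) error {
			fmt.Printf("indexing ids %d..%d\n", lo, hi)
			return nil
		},
		func(done, total int64) {
			fmt.Printf("Indexed %d / %d...\n", done, total)
		},
	)
	if err != nil {
		fmt.Println("backfill failed:", err)
	}
}
```

Because progress is measured against the ID range rather than a row count, gaps in the ID sequence make the reported numbers approximate, which is the trade-off for avoiding a full count.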
The search command now shows:
- A visual progress bar during one-time FTS index build:
    Building search index (one-time)...
    [==================            ]  60%  1020000 / 1700000
- A "Searching..." indicator while the query executes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
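
A bar in this format can be rendered with a few lines of Go. The sketch below is an assumption about how such a bar might be drawn, not the PR's implementation; `renderBar` is a hypothetical name. It clamps its inputs because `strings.Repeat` panics when given a negative count.

```go
package main

import (
	"fmt"
	"strings"
)

// renderBar draws a fixed-width text progress bar.
// It returns "" when total is not positive, and clamps done into [0, total]
// so the repeat counts passed to strings.Repeat are never negative.
func renderBar(done, total int64, width int) string {
	if total <= 0 || width <= 0 {
		return ""
	}
	if done < 0 {
		done = 0
	}
	if done > total {
		done = total
	}
	filled := int(done * int64(width) / total)
	pct := done * 100 / total
	return fmt.Sprintf("[%s%s] %3d%%  %d / %d",
		strings.Repeat("=", filled),
		strings.Repeat(" ", width-filled),
		pct, done, total)
}

func main() {
	fmt.Println("Building search index (one-time)...")
	fmt.Println(renderBar(1020000, 1700000, 30))
	// A cursor that overshoots the range is clamped to 100%.
	fmt.Println(renderBar(2000, 1000, 10))
}
```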
The InitSchema and NeedsFTSBackfill COUNT(*) queries on 1.7M rows
take seconds. Move the status message before store.Open so users see
feedback instantly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
COUNT(*) requires a full table scan in SQLite — several seconds on 1.7M
rows. MAX(id) and MIN(id) are instant B-tree lookups. NeedsFTSBackfill
now compares MAX(rowid) between tables, and BackfillFTS uses the ID
range for progress reporting instead of an exact count.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
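
The detection logic might look roughly like the sketch below. The two query strings use the table and column names mentioned in this PR, but the `needsBackfill` helper and the exact form of the 90% comparison are assumptions; the real `NeedsFTSBackfill` may compute its threshold differently.

```go
package main

import "fmt"

// Both queries are answered from the end of a B-tree, so neither scans the
// table the way COUNT(*) does.
const (
	maxMessageIDQuery = `SELECT MAX(id) FROM messages`
	maxFTSRowIDQuery  = `SELECT MAX(rowid) FROM messages_fts`
)

// needsBackfill decides whether the FTS index looks empty or incomplete.
// maxMsgID and maxFTSRowID are the results of the queries above, with 0
// standing in for NULL (an empty table). The 90% rule is a hypothetical
// reading of the threshold described in the commit log.
func needsBackfill(maxMsgID, maxFTSRowID int64) bool {
	if maxMsgID <= 0 {
		return false // no messages, nothing to index
	}
	// Integer form of: maxFTSRowID < 0.9 * maxMsgID
	return maxFTSRowID*10 < maxMsgID*9
}

func main() {
	fmt.Println(maxMessageIDQuery)
	fmt.Println(maxFTSRowIDQuery)
	fmt.Println(needsBackfill(1700000, 0))       // empty index
	fmt.Println(needsBackfill(1700000, 850000))  // interrupted backfill
	fmt.Println(needsBackfill(1700000, 1700000)) // fully indexed
}
```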
- Clamp progress done <= total and guard total > 0 to prevent
  strings.Repeat panic when cursor overshoots ID range
- Use GROUP_CONCAT for from_addr in backfill query to match sync path
  (was LIMIT 1, inconsistent with persistMessage which joins all froms)
- Fix stale comments: "total messages" → "ID range", correct threshold
  explanation in NeedsFTSBackfill

Addresses review findings from jobs #4665, #4669, #4671.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

wesm commented Feb 6, 2026

I confirmed that msgvault search now works on the CLI, and deep search works well too.

wesm merged commit 788c093 into main on Feb 6, 2026 (3 checks passed)
wesm added a commit to robelkin/msgvault that referenced this pull request Feb 7, 2026
Issues closed by merging this pull request:

  • Bare word search/phrase doesn't work for me
  • FTS triggers missing for automatic full-text search
