🐛 fix: review corrections — LMDB reopen 500, indexing & TUI hardening#120
Merged
Conversation
…ist (M6) C1 (CRITICAL): perform_incremental_refresh_with_stores hardcoded ModelType::default() for embedding while force_reindex wrote the --model override only into metadata.json. Choosing a non-384d model (bge-base/ bge-large/mxbai-large) recorded the new dimension in metadata but still produced 384d vectors -> dimension mismatch / corrupt index. Fix: resolve the embedding model from the short name recorded in metadata.json, fail fast on an unknown model or a dims/model mismatch, and thread the resolved ModelType into the embedding closure instead of the hardcoded default. M6: replace the two hand-maintained "valid models" lists (CLI add --model and serve POST /repos), which both omitted bge-large, with a single ModelType::valid_short_names() derived from all(). Tests: short_name round-trips through parse() for every model; valid_short_names() lists every model incl. bge-large. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…er path index_single_file (reached from the file-watcher loop and git branch-change refresh) still hardcoded ModelType::default() for embedding, re-introducing the exact dimension-mismatch corruption C1 set out to fix: a repo indexed with a non-default model would re-embed changed files at 384d on the next edit. Extract resolve_embed_model(db_path) -> (ModelType, usize) as the single source for model resolution (reads metadata, fail-fast on unknown model / dims mismatch) and use it in BOTH perform_incremental_refresh_with_stores and index_single_file. Removes the duplicated inline resolution block and the redundant closure rebind. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tant-time auth (M2) M1: the generated post-checkout hook embedded $(pwd) directly into the JSON request body. A repo path containing a double quote or backslash broke out of the JSON string literal (malformed body / injection). Now JSON-escape the path in pure bash (backslashes then quotes) before embedding — no jq dependency, handles Windows paths. M2: API key was compared with raw string `==`, which short-circuits on the first mismatched byte — a timing side-channel on the network-exposed require_auth_for_network path (whose "localhost-only, timing impractical" justification was actually copied from the admin middleware). Add api_key_matches() using SHA-256 digest + non-short-circuiting byte compare, and a shared request_has_valid_api_key() helper so both middlewares use the constant-time path from one place. Fix the misleading doc comment. Test: api_key_matches covers match/mismatch/length/empty/case. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… msys bash
The previous M1 fix used bare-backslash parameter expansion
(${REPO_PATH//\/\\}); empirically this does NOT double backslashes in
Git Bash / msys 5.2 (the surrounding double-quotes halve them and a bare
\ pattern fails to match a literal backslash), so a path with a backslash
still produced invalid JSON.
Use the variable-based idiom instead — quoted variables as the search and
replace operands force LITERAL matching:
BS='\'; DQ='"'
REPO_PATH=${REPO_PATH//"$BS"/"$BS$BS"}
REPO_PATH=${REPO_PATH//"$DQ"/"$BS$DQ"}
Verified with node JSON.parse round-trip for paths containing both " and \
(e.g. /c/Users/a"b\c -> {"path":"/c/Users/a\"b\c"} -> parses back exactly).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…5) + fsync (minor) M5: force_reindex_with_stores step 4 still wrote metadata.json via a plain std::fs::write(to_string_pretty(..)), bypassing the atomic writer. A crash there could truncate metadata.json — the exact failure the atomic RMW was introduced to prevent. Route it through crate::vectordb::merge_metadata_atomic, overlaying the preserved keys + a fresh indexed_at onto any existing content. Minor: atomic_write_json now fsyncs the temp file (File::create + write_all + sync_all) BEFORE the rename, so a power-loss cannot leave a zero-length/garbage file in place of the old metadata. Doc comment updated to match the actual guarantee instead of overstating it. vectordb store tests pass (incl. test_persistence). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…M3) + csharp_error (M4)
M3: the remote TUI's `i` and `d` keys called GET /repos/{alias}/info and
POST /repos/{alias}/doctor, but neither route existed server-side, so they
always failed with "endpoint not available" / HTTP error. Add:
- info_handler (GET /repos/:alias/info): mirrors tui::build_info_overlay,
returns the exact InfoResponse shape (chunks/files/max_chunk_id/
db_size_human/model/dims/lock/index_age). 404 on unknown alias.
- doctor_handler (POST /repos/:alias/doctor): runs diagnose_with_store when
the repo's stores are open (reuses the LMDB handle — avoids double-open),
else diagnose(); returns {"results": [...]} from DoctorReport::render_tui.
Both registered in the same auth-layered Router as reindex; like /status
they're open on localhost and key-protected on network binds.
format_age/dir_size_human in tui.rs made pub(crate) for reuse.
M4: status JSON now includes "csharp_error" so the remote detail panel shows
the real C# error instead of the literal "Unknown error".
cargo check + clippy clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rt classify + tests Jupyter (src/chunker/jupyter.rs): - Normalize RawCell.line_count to >=1 in extract_cell so the two passes in merge_adjacent_cells (line numbering vs merge accumulation) agree; empty cells previously stored 0 which the passes counted differently. - Add a loud module-header CAVEAT that chunk start/end lines are synthetic cell-relative positions, NOT real .ipynb JSON offsets — future jump-to-line / re-extraction features must not trust them. Note kernel language is not read (generic code labels). Dart (src/chunker/extractor.rs): - Remove the dead/misleading `mixin_application` branch from classify(). Empirically verified: Dart mixin members parent to `class_body` (already Method); `mixin_application` is the `with A, B` clause and never parents a function_declaration, so the branch never fired. Tests (src/chunker/semantic.rs): - test_dart_semantic_chunking: top-level fn → Function, class & mixin members → Method (regression guard for the misclassification). - test_dart_unparseable_still_chunks: malformed .dart still yields fallback chunks (grammar-failure resilience). 64 chunker tests pass; clippy clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
DRY (src/serve/tui_common.rs, tui.rs, tui_remote.rs): - Extract shared format_uptime_secs(u64) — format_uptime delegates; remote's format_uptime_from_secs removed. - Move byte-identical restore_terminal into tui_common (pub); both TUIs call it. - render_centered_modal now delegates to the _with_border_color variant (Color::Cyan), removing ~45 duplicated lines. Hygiene: - Fix stale comments: tui.rs "'s' pressed" -> "'l' pressed"; tui_remote module doc "Actions (i/d/f/s)" -> accurate i=info/d=doctor/n=reindex/r=remove/l=reload. - Remove dead `let _ = tx;` suppression in tui_remote. - Align C# sentinel: tui.rs map_repo_rows emits "none" (was "") to match status_handler; shared consumer handles it via default branch (no display change). - Reuse a single reqwest::Client across the remote TUI poll/actions instead of constructing one per request. - search/mod.rs: warn! on query-side model parse-failure/override fallback (previously a silent unwrap_or_default) — keeps the default, just makes a model mismatch visible in logs. Test (src/serve/mod.rs): info_doctor_routes_registered asserts GET /info and POST /doctor on an unknown alias return 404 (mirrors reindex route test). cargo check + clippy clean; 29 serve tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…regression) The Stage 1 fail-fast resolved the embedding model at the top of perform_incremental_refresh_with_stores, which made an up-to-date (no-op) refresh error out on any index whose metadata model can't be parsed — breaking test_incremental_refresh_up_to_date_is_noop and, more importantly, failing no-op refreshes that never embed anything. Move the strict resolve_embed_model() call inside the `if !changed_files` block so it runs only when there are files to embed; keep a lenient model_name/dimensions read at the top for the FileMetaStore. Embedding still fails fast with the same guarantee; no-op refreshes no longer require a resolvable model. Full lib+bins suite green (482 passed; the unrelated Windows relocation test is pre-existing flakiness — passes in isolation). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…auth notes Holistic-review findings on the combined branch: IMPORTANT — partial-deletion window: in perform_incremental_refresh_with_stores the lazy resolve_embed_model ran AFTER stale-chunk deletions were committed, so on a corrupt index (unknown model / dims mismatch) it could delete chunks then error before re-embedding, leaving the index with data removed. Move the fail-fast resolve to right after the no-op early-return — before any destructive store mutation — so a corrupt index errors with the index still intact. embed_model is still consumed in the changed-files block. MINOR — doctor_handler ran synchronous diagnose() (tree walk + LMDB scan) while holding a tokio read guard on an async worker. Wrap it in spawn_blocking using blocking_read on the cloned Arc<RwLock> (mirrors the reindex path); still reuses the open LMDB handle to avoid double-open. MINOR — document at the route declaration why POST /doctor is intentionally outside require_admin_auth's management set (read-only diagnostics). cargo check + clippy clean; target tests pass (noop refresh, info_doctor routes). The unrelated Windows relocation test remains pre-existing flakiness (passes in isolation). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The db_discovery::repos relocation tests flaked under full-suite parallel load (intermittent "Access is denied" on rename, and occasional missing git remote). Root cause: these tests `git init` a directory then rename it; on Windows the indexer/antivirus scans each fresh .git tree and holds handles on it, blocking the rename. When many such tests run concurrently the scanner is overwhelmed and handles linger for >7s — long enough to exhaust the old 10-attempt rename retry. A transient msys fork failure (EAGAIN) spawning git could also silently drop a repo's remote. Fixes: - Serialize the 9 git-spawning / renaming relocation tests behind a shared, poison-tolerant Mutex so only one .git tree is created/renamed at a time — keeping each Defender scan window short. This is the decisive fix. - Harden production git_remote_url(): retry on transient spawn failure (fork exhaustion) instead of treating it as "no remote", which would wrongly strip a repo's git identity and break relocation / cause prune. NotFound (git absent) still returns immediately; an Ok-but-nonzero status is a real answer and is not retried. - init_git_remote test helper: same spawn-retry instead of .expect() panic. - rename_retry: raise budget to 40 attempts with a 250ms-capped backoff. Validation: full lib suite run 6x consecutively under parallel load — 482 passed, 0 failed every time (incl. a 15s heavy-contention run that previously failed). clippy -D warnings clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ions" error
The reported 500 ("an environment is already opened with different options"
opening BAYR.Aprimo) is heed 0.20.5's Error::BadOpenOptions from heed's own
process-global OPENED_ENV registry — NOT codesearch's TrackedEnv guard (which
would say "double-open prevented"). It fires when a path resolves to a
still-live heed env whose recorded map_size differs from the reopen's resolved
size (e.g. after an MDB_MAP_FULL resize).
Root cause: TrackedEnv::drop called unregister() in the drop body, but the
`inner: heed::Env` field is dropped only AFTER the body returns (Rust drop
order). That leaves a window where codesearch's LMDB_REGISTRY slot is free but
heed's env is still alive. A concurrent TrackedEnv::open (idle reaper dropping a
repo while a reindex/query reopens it) passes register() and falls through to
opts.open(), hitting heed's raw error.
Note: the previously applied stop_fsw fix (drop Readonly/Conflicted) does not
cover this incident — the repo was idle-evicted, so stop_fsw returns None before
reaching that branch.
Fix: wrap inner in ManuallyDrop and drop the heed::Env BEFORE unregister(),
enforcing "our slot free => heed's slot free". A concurrent open now either sees
our slot occupied (clear double-open message + retry) or both free (clean
reopen) — never the inconsistent state. Adds a multi-threaded regression guard
that asserts the forbidden heed string never surfaces (non-flaky: only fails on
real regression).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ons" 500 Strengthening the TrackedEnv concurrency test to 8 threads x 4000 iters revealed the drop-order reorder alone does NOT close the window: with DIFFERENT map sizes on the same path, heed still raised "an environment is already opened with different options". Probing with a CONSISTENT size proved the error is specific to size mismatch — heed defers env close, so a reopen can briefly observe the prior live env, and only disagreeing options trigger the error. Root fix: a process-global per-path map_size pin (MAP_SIZE_PINS). The first resolve_map_size() for a path fixes its size for the process lifetime (monotonically non-decreasing, capped at MAX_LMDB_MAP_SIZE_MB); resize_environment raises the pin so post-resize reopens (e.g. after idle eviction) match the still-live env. This makes every open of a path use a consistent size regardless of metadata-persistence state — the user's BAYR metadata.json had no lmdb_map_size_mb, which is exactly how a resized env could be reopened with a mismatched size. The Drop reorder remains as complementary hardening. Tests: - lmdb_registry: concurrent open/drop/reopen with consistent size (8x4000, barrier-synced) asserts the heed string never surfaces — non-flaky guard. - store: pin is stable+monotonic per path (a later smaller persisted size does not shrink the live pin) and capped at MAX. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pressing 'n' (reindex) previously gave no instant confirmation — the status column only flipped to "Indexing…" on the next 500ms redraw, which was easy to miss on a fast reindex. Now a transient footer flash confirms the action immediately and honestly reflects the launch outcome: - Started → "⟳ Reindex started for '<alias>' …" - AlreadyRunning → "⟳ Reindex already running for '<alias>'" - Failed → "✗ Cannot reindex '<alias>' — see logs" spawn_force_reindex now returns a ReindexLaunch enum so the flash maps to the real synchronous launch result (guard hit / unresolved alias / read-only / open error). The flash auto-clears after 4s. The existing pulsing "⟳ idx…" status label is unchanged. Scope: local in-process TUI only (per request); render_footer gains an Option<&str> flash arg, remote TUI passes None (behavior unchanged). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Review-driven correctness and hardening pass across the serve hub, indexing, LMDB layer, and TUI. The headline fix resolves the production 500
an environment is already opened with different optionswhen reopening a repo DB (e.g. after idle eviction + force-reindex).Key Changes
LMDB "different options" 500 (root-cause fix)
map_sizepin (vectordb/store.rs): the first resolved map size for a canonical DB path is pinned for the process lifetime (monotonic, capped atMAX_LMDB_MAP_SIZE_MB);resize_environmentraises the pin so post-eviction reopens match the still-live heed env. heed rejects reopens whosemap_sizediffers from a still-live env, and its env close is deferred — so consistent sizing is what actually prevents the error (proven by test: same-size concurrent churn is clean, different-size is not).TrackedEnv::dropreorder (lmdb_registry.rs): drop theheed::Envbefore freeing our own registry slot, enforcing "our slot free ⟹ heed's slot free" as complementary hardening.Serve / indexing
serve/mod.rs: idle-reaper + reopen path hardening;stop_fswdrops Readonly/Conflicted repos so the LMDB env is released before reopen;/info+/doctorroute wiring; partial-deletion window fix.index/manager.rs,search/mod.rs: lazy embed-model resolution (fixes no-op refresh regression).db_discovery/repos.rs: stale-path resilience +git_remote_urlhardening; reconcile on load.Chunker / embed
chunker/jupyter.rs,semantic.rs,extractor.rs: Jupyter line-count + caveat, Dart classification.embed/embedder.rs,cli/mod.rs:--modelindex-corruption fix + model-list dedup.TUI
serve/tui*.rs: DRY cleanup + hygiene; remote/info+/doctorendpoints.Testing
cargo fmt --all -- --checkcargo clippy --lib --all-features -- -D warningscargo test --lib(lmdb_registry race guard 8×4000 + store pin tests + vectordb suite green)Checklist