🚀 release: v1.0.207 #122

Merged: flupkede merged 105 commits into master from develop on Jun 12, 2026

@flupkede (Owner)

Promote develop to master for the v1.0.207 release. Includes: Dart support, Jupyter notebooks, global .codesearchignore, --host/#114, embedding-model fix (#118), and the LMDB reopen-500 fix.

flupkede and others added 30 commits May 20, 2026 22:56
fix: TUI indexing status + SCIP LMDB MDB_MAP_FULL fix (v1.0.128)
* Fix formatting of codesearch index command

* Create codeql.yml

* feat: add tree-sitter grammars for Bash, Ruby, PHP, YAML, JSON

* fix: CI test resilience + protect-master workflow (#58) (#59)

---------

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
* [worker] cleanup: AGENTS.md - 73% reduction, removed stale test report and duplicate bug details

* docs: update CHANGELOG - v1.0.132 consolidated release notes (v1.0.97...v1.0.132)

---------

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
)

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
* fix(mcp): ignore project/group params in local/stdio mode instead of erroring

When running MCP in local mode (no serve_state), project/group routing is meaningless because only one DB is available.

Log a warning and fall back to local DB instead of returning an error.

* fix(qc): define YELLOW color in bash QC script
…al duplicate (v1.0.137) (#76)

* fix: create DB directory before acquiring writer lock (serve auto-register)

When `serve` is running and `codesearch index` is run for a repo not yet known
to it, auto-register (POST /repos) failed with a misleading "Database is locked
by another process" 500: SharedStores::new() acquired the writer lock before
the .codesearch.db directory existed, so opening .writer.lock failed with
"path not found". This rolled back the repos.json registration and made the CLI
fall back to a local duplicate index instead of delegating to serve.

- acquire_writer_lock / SharedStores::new now create the DB directory first;
  genuine I/O errors surface distinctly instead of as a lock conflict.
- Serve config writes route through ServeState::persist_config() (honors the
  config path override); production behavior unchanged, register/remove path
  now hermetically testable.
- Regression guards exercise the brand-new-repo create/register path with the
  DB directory genuinely absent (verified to fail against the pre-fix code).
- CHANGELOG: 1.0.136.
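The ordering fix described above (create the DB directory, then open the lock file) can be sketched as follows. This is a minimal illustration, not the project's code: the body of `acquire_writer_lock` is an assumption, and real OS-level file locking is left out.

```rust
use std::fs::{self, File, OpenOptions};
use std::io;
use std::path::Path;

/// Hypothetical stand-in for the real lock acquisition: make sure the
/// database directory exists, then open the lock file inside it.
/// Actual OS-level locking of the file is omitted from this sketch.
fn acquire_writer_lock(db_dir: &Path) -> io::Result<File> {
    // Creating the directory first means a brand-new repo no longer fails
    // with "path not found" when the lock file is opened.
    fs::create_dir_all(db_dir)?;
    OpenOptions::new()
        .create(true)
        .write(true)
        .truncate(false)
        .open(db_dir.join(".writer.lock"))
}
```

Because directory creation and lock-file opening both return `io::Result`, a genuine I/O error reaches the caller as such rather than being reported as a lock conflict.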

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix: never silently create a local duplicate index when serve is busy

The CLI probes serve's /health before delegating `index`/`index add`. Any
health failure, including a *timeout* while serve is warming up its repos at
startup, was classified as "serve not running", so the CLI silently created a
local index. That local index is a duplicate that serve does not manage and can
cause LMDB file-lock conflicts (and the repo never gets registered with serve).

New behavior via probe_serve_health():
- Responsive  -> delegate as before.
- Connection refused / cannot connect -> serve not running; index locally.
  Detected immediately (no timeout elapses, no retries), so the common
  "no serve -> local" path is NOT slowed down.
- Listening but unresponsive (timeout, retried briefly) -> serve is up but
  busy. The CLI now REFUSES to create a local duplicate and tells the user to
  retry shortly or stop serve first. The fallback is never silent anymore.

Delegation errors are now typed (DelegateError: ServeDown / ServeUnresponsive /
Failed) instead of string-matched. Applies to `index` and `index add` (the
index-creating paths); `index rm` is unchanged.

Tests: probe classification guards (responsive -> Up; listening-but-slow ->
Unresponsive). Rolls into the 1.0.137 release together with the writer-lock fix.
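The three-way classification above can be sketched with the standard library alone. Everything here is an assumption made for illustration: the `ServeHealth` enum, the `classify` and `probe` helpers, and the raw HTTP/1.0 request are not taken from the project, which only names `probe_serve_health()`.

```rust
use std::io::{self, Read, Write};
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Hypothetical result type for the health probe.
#[derive(Debug, PartialEq, Eq)]
enum ServeHealth {
    Up,
    Down,
    Unresponsive,
}

/// Map the outcome of one probe attempt to a health state.
/// A timeout means something is listening but not answering; any other
/// error (for example connection refused) is treated as "not running".
fn classify(result: &io::Result<()>) -> ServeHealth {
    match result {
        Ok(()) => ServeHealth::Up,
        Err(e) => match e.kind() {
            // Read timeouts surface as WouldBlock on Unix and TimedOut on Windows.
            io::ErrorKind::TimedOut | io::ErrorKind::WouldBlock => ServeHealth::Unresponsive,
            _ => ServeHealth::Down,
        },
    }
}

/// Connect, send a minimal request, and wait for the first bytes of a reply.
/// `timeout` must be non-zero.
fn probe(addr: SocketAddr, timeout: Duration) -> ServeHealth {
    let attempt = || -> io::Result<()> {
        let mut stream = TcpStream::connect_timeout(&addr, timeout)?;
        stream.set_read_timeout(Some(timeout))?;
        stream.write_all(b"GET /health HTTP/1.0\r\n\r\n")?;
        let mut buf = [0u8; 16];
        match stream.read(&mut buf)? {
            0 => Err(io::Error::new(io::ErrorKind::UnexpectedEof, "closed without reply")),
            _ => Ok(()),
        }
    };
    classify(&attempt())
}
```

A refused connection fails immediately inside `connect_timeout`, which is why the "no serve, index locally" path does not wait for the timeout.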

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Brings the two files that drifted on master during the v1.0.137 release back
to develop: the updated protect-master.yml (allows release/* branches) and the
CHANGELOG [1.0.135] entry. After this, develop and master trees are identical.

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…8) (#79)

Two Windows path-handling bugs that caused spurious "Database not found"
errors and local duplicate indexes:

1. register()/register_with_alias() stored the raw canonicalize() result in
   repos.json. On Windows, canonicalize() returns \\?\C:\... (extended-length
   UNC prefix). Downstream .join(".codesearch.db") and Path::exists() calls
   then fail inconsistently (\\?\C:\foo\.codesearch.db not found even when
   C:\foo\.codesearch.db exists). 7 repos were affected. Fix: strip_unc()
   removes the prefix before storage. Existing repos.json patched in-place.
   Regression test: register_strips_unc_prefix_from_stored_path.

2. 500 "Database not found" from reindex (alias registered but DB gone) was
   treated as a generic failure -> local fallback -> duplicate index. Fix:
   triggers the same auto-register POST /repos path as 404 (DB recreated by
   serve, no local fallback).

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: flupkede <flupkede@users.noreply.github.com>
… across codebase (#82)

ROOT CAUSE OF RECURRING BUG CLASS
Path::canonicalize() on Windows returns \\?\C:\... (extended-length UNC
prefix). Any downstream .join(), .exists(), or HashMap key built from that
path behaves inconsistently: the sub-path \\?\C:\foo\.codesearch.db may
return false from exists() even when C:\foo\.codesearch.db is present.
This class of bug has silently broken registrations multiple times.

FIX
Introduce safe_canonicalize(path: &Path) -> io::Result<PathBuf> and
strip_unc_prefix(path: PathBuf) -> PathBuf in src/cache/file_meta.rs.
These are the ONLY approved way to canonicalize paths in this codebase.
Exported via crate::cache.
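A minimal sketch of what the two helpers could look like, using the signatures quoted above. The bodies are assumptions, not the contents of src/cache/file_meta.rs; in particular, the handling of the `\\?\UNC\server\share` form is a guess at reasonable behavior.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Remove a leading `\\?\` from a path, if present.
/// Assumed behavior: `\\?\UNC\server\share` becomes `\\server\share`.
fn strip_unc_prefix(path: PathBuf) -> PathBuf {
    let stripped = {
        let text = path.to_string_lossy();
        let result = text
            .strip_prefix(r"\\?\")
            .map(|rest| match rest.strip_prefix(r"UNC\") {
                Some(share) => PathBuf::from(format!(r"\\{share}")),
                None => PathBuf::from(rest),
            });
        result
    };
    // Paths without the prefix are returned unchanged, so the call is idempotent.
    stripped.unwrap_or(path)
}

/// Canonicalize, then strip the prefix, so callers never see `\\?\` paths.
fn safe_canonicalize(path: &Path) -> io::Result<PathBuf> {
    fs::canonicalize(path).map(strip_unc_prefix)
}
```

Routing every caller through one function keeps stored paths, `.join()` results, and map keys in the same plain form on all platforms.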

CALL SITES UPDATED (all raw .canonicalize() removed)
- src/cache/file_meta.rs:     central definition + 5 new regression tests
- src/db_discovery/repos.rs:  register, register_with_alias, unregister_path,
                              alias_for_path; local strip_unc() removed
- src/db_discovery/mod.rs:    find_best_database, get_db_path_for_cwd
- src/index/mod.rs:           find_git_root, get_global_db_path,
                              add_to_index, remove_from_index,
                              try_delegate_reindex_to_serve (x2),
                              try_delegate_rm_to_serve
- src/lmdb_registry.rs:       TrackedEnv registry key (eliminates
                              double-open risk when same dir accessed
                              with and without \\?\ prefix)
- src/serve/mod.rs:           add_repo_handler, run_serve --register path

POLICY DOCUMENTED
AGENTS.md: "⚠️ Canonical Path Policy - MANDATORY" section with rule,
code example, and pointer to regression tests.

REGRESSION TESTS (6 new in cache/file_meta.rs + 1 existing in repos.rs)
- strip_unc_prefix_removes_windows_unc
- strip_unc_prefix_is_idempotent_on_{plain_path,unix_path}
- safe_canonicalize_on_existing_dir_returns_plain_path
- safe_canonicalize_on_nonexistent_path_returns_error
- register_strips_unc_prefix_from_stored_path (repos.rs β€” verifies
  fallback path also strips UNC when canonicalize() fails)

407 lib tests pass. clippy -D warnings clean.

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…b_path_smart (#84)

The old normalize_path(&p.canonicalize()...) pattern in get_db_path_smart
was missed in the central safe_canonicalize refactor (v1.0.139). It worked
correctly (normalize_path also strips UNC) but was inconsistent with the
policy. Now all .canonicalize() calls outside safe_canonicalize's own
definition are eliminated.

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
#86)

PROBLEM 1: ServeUnresponsive aborted with error instead of waiting
When serve is warming up (opening LMDB for 15+ repos blocks the tokio
runtime, causing /health to time out), the CLI refused with an error.
The user had to retry manually.

FIX: serve_delegate_with_warmup_wait() wraps both try_delegate_reindex_to_serve
and try_delegate_add_to_serve. On ServeUnresponsive it prints
"⏳ serve is starting up, waiting..." and retries every 8s up to 6 times
(~2 min budget). On success it prints "✅ serve is ready, delegating...".
Only exhausting the full budget returns an error.
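The retry wrapper described above might look roughly like this. It is a sketch under assumptions: the generic helper `delegate_with_warmup_wait` and the `String` payload on `Failed` are illustrative, while the variant names follow the `DelegateError` description earlier in this PR.

```rust
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq, Eq)]
enum DelegateError {
    ServeDown,
    ServeUnresponsive,
    Failed(String), // payload type is an assumption
}

/// Call `delegate` and, while serve reports as unresponsive, wait and retry
/// up to `max_retries` times. Any other outcome is returned immediately.
fn delegate_with_warmup_wait<T>(
    max_retries: u32,
    delay: Duration,
    mut delegate: impl FnMut() -> Result<T, DelegateError>,
) -> Result<T, DelegateError> {
    let mut retries = 0;
    loop {
        match delegate() {
            Err(DelegateError::ServeUnresponsive) if retries < max_retries => {
                retries += 1;
                eprintln!("serve is starting up, waiting... (retry {retries}/{max_retries})");
                thread::sleep(delay);
            }
            // Success, a hard failure, or an exhausted budget all end the loop.
            other => return other,
        }
    }
}
```

With the values from the commit message this would be called as `delegate_with_warmup_wait(6, Duration::from_secs(8), ...)`; only `ServeUnresponsive` is retried, so a serve that is down still falls through to local indexing without delay.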

PROBLEM 2: 409 Conflict from POST /repos on "Database not found" path
When a registered repo's DB was missing, the CLI tried POST /repos to
recreate it. Serve correctly returned 409 (alias already registered).
The CLI treated 409 as a failure and fell back to local indexing.

FIX: when auto-add returns 409, retry as POST /repos/{alias}/reindex?force=true.
Force reindex uses allow_create=true and creates the DB via serve without
local fallback.
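The status handling described in this and the earlier auto-register fix can be summarized as a small decision table. The sketch below is illustrative only: `NextStep`, `after_reindex`, and `after_auto_register` are hypothetical names, and treating every 2xx status as success is an assumption.

```rust
/// Hypothetical follow-up actions for the CLI after a serve response.
#[derive(Debug, PartialEq, Eq)]
enum NextStep {
    Done,
    AutoRegister, // POST /repos
    ForceReindex, // POST /repos/{alias}/reindex?force=true
    Fail,
}

/// Decide what to do after a reindex request.
fn after_reindex(status: u16, body: &str) -> NextStep {
    match status {
        200..=299 => NextStep::Done,
        404 => NextStep::AutoRegister,
        // Alias is registered but its DB is gone: let serve recreate it.
        500 if body.contains("Database not found") => NextStep::AutoRegister,
        _ => NextStep::Fail,
    }
}

/// Decide what to do after the auto-register request.
fn after_auto_register(status: u16) -> NextStep {
    match status {
        200..=299 => NextStep::Done,
        // Alias already registered: retry as a forced reindex instead of
        // falling back to a local index.
        409 => NextStep::ForceReindex,
        _ => NextStep::Fail,
    }
}
```

None of the branches leads to a local fallback, which is the property both fixes are after.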

AGENTS.md: document the root cause (tokio blocking during warmup) as a
remaining work item with diagnosis and fix guidance.

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…n_blocking (#88)

PROBLEM
codesearch serve became unresponsive during startup warmup: FileWalker::walk,
VectorStore::build_index (HNSW), and fastembed/ONNX embedding (saturates all
cores) ran synchronously on tokio worker threads. This starved the async
runtime, /health timed out (>3s), and `codesearch index` reported "serve did
not respond in time". The server already returns 202 + spawns background
indexing (accept-and-defer); it just couldn't respond while warming.

FIX
Offload the heavy synchronous warmup work to tokio::task::spawn_blocking, so
the async executor stays responsive (answers /health and accepts POST /repos
immediately, runs the job in the background).
- serve/mod.rs warmup_repo: read stats under .read(); build_index via
  spawn_blocking + Arc clone + blocking_write. Build failure only warns.
- manager.rs perform_incremental_refresh_with_stores: walk, read+chunk+embed,
  and build_index all offloaded.
- manager.rs refresh_index_with_stores: walk + both build_index calls offloaded.

LOCK SAFETY (verified by review)
Every async RwLock guard scope CLOSES before the spawn_blocking that calls
.blocking_write() on the same store, so there is no lock-over-await deadlock. blocking_write
is only ever called inside spawn_blocking (never on an async worker).
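The scoping rule above can be illustrated with the standard library. Note the substitution: this sketch uses `std::thread::spawn` and `std::sync::RwLock` as stand-ins for `tokio::task::spawn_blocking` and the async `RwLock`, so it shows the guard-scope pattern only, not the Tokio API. `Store` and `warm_up` are hypothetical names.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical store; `index_built` stands in for the result of build_index.
struct Store {
    docs: Vec<String>,
    index_built: bool,
}

fn warm_up(store: Arc<RwLock<Store>>) -> thread::JoinHandle<usize> {
    // Read what is needed inside a short scope. The read guard is dropped
    // at the closing brace, before any writer is started.
    let doc_count = {
        let guard = store.read().unwrap();
        guard.docs.len()
    };

    // The heavy work runs on another thread and takes the write lock there,
    // so the caller never holds a guard while waiting on the worker.
    thread::spawn(move || {
        let mut guard = store.write().unwrap();
        guard.index_built = true;
        doc_count
    })
}
```

If the read guard were still alive when the worker asked for the write lock, and the caller then waited on the worker, neither side could make progress; closing the scope first rules that out.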

Test: test_incremental_refresh_up_to_date_is_noop exercises the refactored walk
path. 408 lib tests pass, clippy -D warnings clean.

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
)

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
… commands (#91)

* chore: add /merge and /release Claude Code slash commands

Codify the project release workflow as two committed slash commands under
.claude/commands/ (force-added past .gitignore, like .claude/CLAUDE.md):

- /merge: README/CHANGELOG freshness checks -> commit -> validate -> push ->
  PR to develop -> auto-merge after CI. No tag.
- /release: /merge, then promote develop -> master via a "Release vX.Y.Z" PR
  (protect-master allows develop), then push the vX.Y.Z tag that triggers
  release.yml. Includes optional post-release develop sync.

Commands document the repo's real conventions: feature->develop->master flow,
master branch protection, and the pre-commit version-bump-on-feature-branches
rule that fixes the release version at the feature commit.

Tooling-only change on a chore/ branch: no version bump, no CHANGELOG entry
(CHANGELOG tracks the shipped binary's behavior).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore: address review remarks on /merge and /release commands

- /merge: abort unless on feature/*|features/*|fix/* (the only branches the
  pre-commit hook version-bumps). This closes the gap where running from a
  non-bumping branch silently broke the version/CHANGELOG premise.
- Clarify CHANGELOG heading version math for multi-commit landings (hook bumps
  +1 per commit; verify heading matches Cargo.toml after the final commit).
- Capture PR numbers explicitly (gh pr view --json number) before merge/poll.
- /release: fetch --tags and guard against a double release (stop if the tag
  already exists locally or on origin).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: document /merge and /release workflow in AGENTS.md

Add a Release workflow section describing the two slash commands, the
branch-protection rule, the tag-triggers-release.yml pipeline, and the
feature-branch-only version-bump rule that fixes the release version.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(chunker): semantic Markdown chunking via tree-sitter-md

Markdown and .txt files were indexed as a single whole-file block (the
fallback chunker has no char budget), so a search hit returned an entire
page; real Aprimo docs reached 80 KB in one chunk.

Add the tree-sitter-md *block* grammar and chunk Markdown by heading
section instead: each chunk is one heading plus its own prose/code,
excluding nested subsections (which become their own chunks). The
heading path is carried in the breadcrumb context (File > Title >
Subsection) so embeddings capture each section's place in the document.

Also add split_oversized, a char- and line-aware splitter for the
unstructured paths (Markdown + the generic fallback): a single physical
line longer than the char budget is hard-split on UTF-8 boundaries, so
scraped one-line HTML/markdown can no longer produce an enormous chunk.
The structured code path keeps using split_if_needed unchanged, so code
chunking is unaffected.

- Cargo.toml: add tree-sitter-md 0.5.3
- grammar.rs/language.rs: register Markdown as tree-sitter-supported
- semantic.rs: chunk_markdown + emit_md_section + split_oversized
- tests: section split, nested breadcrumbs, oversized + long-line splits
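A char- and line-aware splitter in the spirit of `split_oversized` could be sketched as below. The signature, the budget being counted in `char`s, and the exact packing rules are assumptions; the real function in semantic.rs may differ.

```rust
/// Split `text` into chunks of at most `max_chars` characters.
/// Whole lines are kept together where possible; a single line longer than
/// the budget is hard-split on character (UTF-8) boundaries.
fn split_oversized(text: &str, max_chars: usize) -> Vec<String> {
    assert!(max_chars > 0, "budget must be positive");
    let mut chunks = Vec::new();
    let mut current = String::new();
    let mut current_len = 0usize; // length of `current` in chars

    for line in text.split_inclusive('\n') {
        let line_len = line.chars().count();

        // Flush the running chunk if this line would push it over budget.
        if current_len + line_len > max_chars && !current.is_empty() {
            chunks.push(std::mem::take(&mut current));
            current_len = 0;
        }

        if line_len <= max_chars {
            current.push_str(line);
            current_len += line_len;
        } else {
            // `current` is empty here. Splitting a Vec<char> never cuts
            // through a multi-byte character.
            let chars: Vec<char> = line.chars().collect();
            for piece in chars.chunks(max_chars) {
                chunks.push(piece.iter().collect());
            }
        }
    }

    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

For example, with a budget of 4 the input `"ab\ncd\nefghij\n"` yields `"ab\n"`, `"cd\n"`, `"efgh"`, and `"ij\n"`: the first two lines cannot share a chunk, and the long third line is cut mid-line.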

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* [worker] final review: fix chunk_markdown doc comment

Reference the actual splitter used by the markdown path
(split_oversized, char-aware) instead of split_if_needed
(the code path's line-based splitter).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: document semantic Markdown chunking + correct language table

- CHANGELOG: add [1.0.145] entry for tree-sitter-md block-grammar Markdown
  chunking (sections/headings/code fences).
- README: expand the Supported Languages table to all 15 tree-sitter
  languages and bump the "9 languages" count to 15, correcting pre-existing
  drift that omitted Shell, Ruby, PHP, YAML, JSON, and (new) Markdown.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(test): sanitize customer ref in markdown chunking fixtures

The pre-push customer-ref guard flagged "aprimo" in two semantic.rs test
fixtures (a frontmatter URL and a comment). Replaced with generic
example.com / "real-world scraped docs"; the test assertions never
reference either, so behavior is unchanged. Realign CHANGELOG heading to
the post-bump version (1.0.146).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* [worker] stage 1/5: capture git remote identity per repo

Add RepoMeta.git_remote (serde default, backward compatible) and a
best-effort git_remote_url() helper. Populate it in register() and
register_with_alias() so every registered repo records its
remote.origin.url for later relocation of moved/renamed folders.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* [worker] stage 2/5: relocate moved repos + reconcile pass + index prune

- Best-effort git relocation: try_relocate() walks to nearest existing
  ancestor and bounded-depth scans for a git root with matching
  remote.origin.url; unambiguous single match rewrites repos.json.
- ServeState::reconcile_all_paths() runs at startup before phase 1/2/3;
  relocates or warns+skips missing paths (never crashes).
- Existence guards added to phase-2 SCIP and phase-3 prewarm consumers.
- New `codesearch index prune` command: relocate-first, else unregister
  stale aliases, with summary output.
- CODESEARCH_RELOCATE_MAX_DEPTH env (default 3).
- Unit tests for capture-on-register and try_relocate (renamed leaf,
  path-exists, no-remote, ambiguous).
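The relocation search described above (nearest existing ancestor, bounded-depth scan, single unambiguous match) can be sketched like this. It is an illustration under assumptions: the real `try_relocate` matches on `remote.origin.url`, whereas this version takes a caller-supplied `is_match` predicate, and the helper names other than `try_relocate` are hypothetical.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Walk upward until a path that still exists is found.
fn nearest_existing_ancestor(path: &Path) -> Option<&Path> {
    path.ancestors().find(|p| p.exists())
}

/// Collect matching directories at most `depth` levels below `dir`.
fn scan(dir: &Path, depth: usize, is_match: &dyn Fn(&Path) -> bool, found: &mut Vec<PathBuf>) {
    if is_match(dir) {
        found.push(dir.to_path_buf());
        return; // do not descend into a matched repo
    }
    if depth == 0 {
        return;
    }
    let Ok(entries) = fs::read_dir(dir) else { return };
    for entry in entries.flatten() {
        let child = entry.path();
        if child.is_dir() {
            scan(&child, depth - 1, is_match, found);
        }
    }
}

/// Return the new location only when exactly one candidate matches.
fn try_relocate(
    missing: &Path,
    max_depth: usize,
    is_match: &dyn Fn(&Path) -> bool,
) -> Option<PathBuf> {
    let start = nearest_existing_ancestor(missing)?;
    let mut found = Vec::new();
    scan(start, max_depth, is_match, &mut found);
    if found.len() == 1 { found.pop() } else { None }
}
```

Refusing to pick among several candidates keeps a rewrite of repos.json from pointing an alias at the wrong clone.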

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* [worker] stage 3/5: remove user-settable --alias, always derive

- Drop `--alias`/`-a` from `index add` subcommand and the legacy
  `index --add` flag path. Alias is always derived from the directory
  name via ReposConfig::register().
- add_to_index() loses its `alias` parameter; legacy current-dir local
  DBs are now auto-registered with a derived alias.
- Serve delegation always sends None so serve derives the alias too.
- Replace test_cli_index_add_accepts_alias_flag with
  test_cli_index_add_rejects_alias_flag + parses_without_alias.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* [worker] stage 4/5: tolerate hand-edited repos.json via reconcile()

- ReposConfig::reconcile() runs from load_from() on both new and legacy
  parse paths (in-memory only, no disk write):
  1. drop entries with empty/blank alias keys
  2. drop orphan repos_meta entries with no matching repo
  3. prune group members referencing unknown aliases; drop empty groups
- Never renames existing alias keys (would break group refs); a
  non-standard hand-edited alias is tolerated as-is. Never crashes.
- Unit tests for empty-key, group-pruning/empty-group, and orphan-meta.
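The three in-memory cleanup steps listed above can be sketched on a simplified config. The field names follow the ones mentioned in this PR (`repos`, `repos_meta`, groups); the field types and the `String` metadata are assumptions made to keep the example self-contained.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for the real config; value types are assumptions.
#[derive(Debug, Default, PartialEq)]
struct ReposConfig {
    repos: BTreeMap<String, String>,       // alias -> path
    repos_meta: BTreeMap<String, String>,  // alias -> metadata
    groups: BTreeMap<String, Vec<String>>, // group -> member aliases
}

impl ReposConfig {
    /// In-memory cleanup only: nothing is written to disk and no alias is renamed.
    fn reconcile(&mut self) {
        // 1. Drop entries with empty or blank alias keys.
        self.repos.retain(|alias, _| !alias.trim().is_empty());

        // 2. Drop metadata that no longer has a matching repo.
        let repos = &self.repos;
        self.repos_meta.retain(|alias, _| repos.contains_key(alias));

        // 3. Prune unknown group members, then drop groups left empty.
        for members in self.groups.values_mut() {
            members.retain(|alias| repos.contains_key(alias));
        }
        self.groups.retain(|_, members| !members.is_empty());
    }
}
```

Running the steps in this order matters: members and metadata are checked against the already-cleaned `repos` map, so an entry dropped in step 1 cannot keep a group or meta record alive.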

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* [worker] stage 5/5: docs + tighten reconcile() visibility

- Document stale-path relocation, `index prune`, derived-alias policy,
  and repos.json reconcile() in AGENTS.md and .claude/CLAUDE.md.
- reconcile() is now pub(crate) (only used internally + same-module tests).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* [worker] final review: use DB_DIR_NAME constant in relocation scan skip-list

Replace hardcoded ".codesearch.db" literal with crate::constants::DB_DIR_NAME
in is_skippable_scan_dir (no-hardcoded-config-strings rule).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* [worker] tests: extract testable prune_stale/relocate_missing + expand coverage

Refactor for testability (no behavior change):
- Add pure ReposConfig::relocate_missing() -> (relocated, unresolved) and
  prune_stale() -> (relocated, removed); no disk I/O, no logging.
- prune_index() and ServeState::reconcile_all_paths() now delegate to these,
  removing duplicated relocate-loop logic.

New unit tests (8):
- register_derives_alias_from_directory_name
- try_relocate_finds_renamed_parent (parent-level rename within depth)
- try_relocate_none_beyond_max_depth (depth bound enforced)
- relocate_missing_rewrites_only_moved_repos
- prune_stale_removes_unrelocatable_entries (+ group cleanup)
- prune_stale_relocates_then_keeps_relocatable_entries
- load_from_applies_reconcile_to_hand_edited_file (load-path reconcile)

24 repos lib tests pass; clippy clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: README + CHANGELOG for relocation, index prune, derived alias

- README: document `codesearch index prune`, automatic relocation of
  moved/renamed repos (CODESEARCH_RELOCATE_MAX_DEPTH), the alias-always-
  derived policy (no --alias flag), and hand-edited repos.json tolerance.
- CHANGELOG: consolidated 1.0.149 entry (Added/Changed/Fixed).
- README language table + alias example updates (pre-existing).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* [worker] address review remarks: align CHANGELOG version + restore log path

- CHANGELOG entry retitled to 1.0.151 to match the shipped Cargo.toml
  version (pre-commit bumps patch by 1 on this commit).
- reconcile warn for unresolved repos again includes the missing path for
  diagnostics (lost during the relocate_missing extraction).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* feat: auto-prune stale repos during Phase 1 warmup

When a repo's database or path no longer exists (e.g. folder moved),
Phase 1 now automatically unregisters the alias from repos.json instead
of logging a warning and leaving the stale entry forever.

Prune conditions (safe: only missing-db / path-gone, not transient errors):
- .codesearch.db directory does not exist at registered path
- Registered path itself no longer exists
- Alias resolves to nothing in config

Side effects per pruned alias:
- stop_fsw + evict from DashMap + remove last_access timer
- unregister_alias (removes from repos, repos_meta, groups)
- persist via config.save()
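The first two prune conditions can be expressed as a filesystem check. This is a sketch under assumptions: `PruneReason` and `prune_reason` are hypothetical names, the third condition (alias missing from config) is not modeled, and using `try_exists` to leave a repo alone when the check itself fails is one possible reading of "not transient errors".

```rust
use std::path::Path;

/// Hypothetical reasons for unregistering a repo during warmup.
#[derive(Debug, PartialEq, Eq)]
enum PruneReason {
    PathGone,
    DbMissing,
}

/// Return a reason only when the path or its DB directory is definitely absent.
/// If the existence check itself errors, the repo is kept.
fn prune_reason(repo_path: &Path) -> Option<PruneReason> {
    match repo_path.try_exists() {
        Ok(false) => return Some(PruneReason::PathGone),
        Err(_) => return None, // cannot tell: do not prune
        Ok(true) => {}
    }
    match repo_path.join(".codesearch.db").try_exists() {
        Ok(false) => Some(PruneReason::DbMissing),
        _ => None,
    }
}
```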

Closes: stale repos.json entries after folder reorganization

* fix: add missing YELLOW color variable in qc.sh

* bump version to 1.0.153 - align with CHANGELOG

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Follow-up on PR #42 + #43 audit. Two gaps identified:
- No automated tests for new Warm/Write state semantics, zombie-proof reaper, or /status endpoint
- No HTTP timeouts in standalone TUI reqwest calls

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ELOG (#98)

Squash merge fix/windows-8dot3-path-relocation → develop
* Fix formatting of codesearch index command

* Create codeql.yml

* feat: add tree-sitter grammars for Bash, Ruby, PHP, YAML, JSON

* fix: CI test resilience + protect-master workflow (#58) (#59)

* release: v1.0.132 - tree-sitter expansion, LMDB stability, CI hardening (#64)

* fix: CI test resilience + protect-master workflow (#58)

* Sync master → develop (tree-sitter) (#60)

* Fix formatting of codesearch index command

* Create codeql.yml

* feat: add tree-sitter grammars for Bash, Ruby, PHP, YAML, JSON

* fix: CI test resilience + protect-master workflow (#58) (#59)

---------

Co-authored-by: flupkede <flupkede@users.noreply.github.com>

* docs: update CHANGELOG - v1.0.132 consolidated release notes (#61)

* [worker] cleanup: AGENTS.md - 73% reduction, removed stale test report and duplicate bug details

* docs: update CHANGELOG - v1.0.132 consolidated release notes (v1.0.97...v1.0.132)

---------

Co-authored-by: flupkede <flupkede@users.noreply.github.com>

* sync: align develop with master - AGENTS.md, Cargo.toml, Cargo.lock (#63)

Co-authored-by: flupkede <flupkede@users.noreply.github.com>

---------

Co-authored-by: flupkede <flupkede@users.noreply.github.com>

* Release v1.0.135: MCP local mode fix + QC fix + release branch support (#72)

* Release v1.0.134: MCP local mode fix + QC script fix (closes #65)

* ci: allow release/* branches to target master PRs

* docs: add CHANGELOG entry for v1.0.135 (MCP local mode fix) (#73)

* Release v1.0.137 - serve-aware indexing fixes (#77)

* fix: MCP local mode project/group fallback + QC script fix (#66)

* fix(mcp): ignore project/group params in local/stdio mode instead of erroring

When running MCP in local mode (no serve_state), project/group routing is meaningless because only one DB is available.

Log a warning and fall back to local DB instead of returning an error.

* fix(qc): define YELLOW color in bash QC script

* chore: simplify release workflow - feature-only version bump (#74)

* fix: serve-aware indexing - create DB dir before lock + no silent local duplicate (v1.0.137) (#76)

* fix: create DB directory before acquiring writer lock (serve auto-register)

When `serve` is running and `codesearch index` is run for a repo not yet known
to it, auto-register (POST /repos) failed with a misleading "Database is locked
by another process" 500: SharedStores::new() acquired the writer lock before
the .codesearch.db directory existed, so opening .writer.lock failed with
"path not found". This rolled back the repos.json registration and made the CLI
fall back to a local duplicate index instead of delegating to serve.

- acquire_writer_lock / SharedStores::new now create the DB directory first;
  genuine I/O errors surface distinctly instead of as a lock conflict.
- Serve config writes route through ServeState::persist_config() (honors the
  config path override); production behavior unchanged, register/remove path
  now hermetically testable.
- Regression guards exercise the brand-new-repo create/register path with the
  DB directory genuinely absent (verified to fail against the pre-fix code).
- CHANGELOG: 1.0.136.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix: never silently create a local duplicate index when serve is busy

The CLI probes serve's /health before delegating `index`/`index add`. Any
health failure, including a *timeout* while serve is warming up its repos at
startup, was classified as "serve not running", so the CLI silently created a
local index. That local index is a duplicate that serve does not manage and can
cause LMDB file-lock conflicts (and the repo never gets registered with serve).

New behavior via probe_serve_health():
- Responsive  -> delegate as before.
- Connection refused / cannot connect -> serve not running; index locally.
  Detected immediately (no timeout elapses, no retries), so the common
  "no serve -> local" path is NOT slowed down.
- Listening but unresponsive (timeout, retried briefly) -> serve is up but
  busy. The CLI now REFUSES to create a local duplicate and tells the user to
  retry shortly or stop serve first. The fallback is never silent anymore.

Delegation errors are now typed (DelegateError: ServeDown / ServeUnresponsive /
Failed) instead of string-matched. Applies to `index` and `index add` (the
index-creating paths); `index rm` is unchanged.

Tests: probe classification guards (responsive -> Up; listening-but-slow ->
Unresponsive). Rolls into the 1.0.137 release together with the writer-lock fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Release v1.0.138 - strip UNC paths + auto-add on missing DB (#80)

* fix: CI test resilience + protect-master workflow (#58)

* Sync master β†’ develop (tree-sitter) (#60)

* Fix formatting of codesearch index command

* Create codeql.yml

* feat: add tree-sitter grammars for Bash, Ruby, PHP, YAML, JSON

* fix: CI test resilience + protect-master workflow (#58) (#59)

---------

Co-authored-by: flupkede <flupkede@users.noreply.github.com>

* docs: update CHANGELOG β€” v1.0.132 consolidated release notes (#61)

* [worker] cleanup: AGENTS.md β€” 73% reduction, removed stale test report and duplicate bug details

* docs: update CHANGELOG β€” v1.0.132 consolidated release notes (v1.0.97...v1.0.132)

---------

Co-authored-by: flupkede <flupkede@users.noreply.github.com>

* sync: align develop with master β€” AGENTS.md, Cargo.toml, Cargo.lock (#63)

Co-authored-by: flupkede <flupkede@users.noreply.github.com>

* fix: MCP local mode project/group fallback + QC script fix (#66)

* fix(mcp): ignore project/group params in local/stdio mode instead of erroring

When running MCP in local mode (no serve_state), project/group routing is meaningless because only one DB is available.

Log a warning and fall back to local DB instead of returning an error.

* fix(qc): define YELLOW color in bash QC script

* chore: simplify release workflow β€” feature-only version bump (#74)

* fix: serve-aware indexing β€” create DB dir before lock + no silent local duplicate (v1.0.137) (#76)

* fix: create DB directory before acquiring writer lock (serve auto-register)

When `serve` is running and `codesearch index` is run for a repo not yet known
to it, auto-register (POST /repos) failed with a misleading "Database is locked
by another process" 500: SharedStores::new() acquired the writer lock before
the .codesearch.db directory existed, so opening .writer.lock failed with
"path not found". This rolled back the repos.json registration and made the CLI
fall back to a local duplicate index instead of delegating to serve.

- acquire_writer_lock / SharedStores::new now create the DB directory first;
  genuine I/O errors surface distinctly instead of as a lock conflict.
- Serve config writes route through ServeState::persist_config() (honors the
  config path override) β€” production behavior unchanged, register/remove path
  now hermetically testable.
- Regression guards exercise the brand-new-repo create/register path with the
  DB directory genuinely absent (verified to fail against the pre-fix code).
- CHANGELOG: 1.0.136.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix: never silently create a local duplicate index when serve is busy

The CLI probes serve's /health before delegating `index`/`index add`. Any
health failure β€” including a *timeout* while serve is warming up its repos at
startup β€” was classified as "serve not running", so the CLI silently created a
local index. That local index is a duplicate serve does not manage and can
cause LMDB file-lock conflicts (and the repo never gets registered with serve).

New behavior via probe_serve_health():
- Responsive  -> delegate as before.
- Connection refused / cannot connect -> serve not running; index locally.
  Detected immediately (no timeout elapses, no retries), so the common
  "no serve -> local" path is NOT slowed down.
- Listening but unresponsive (timeout, retried briefly) -> serve is up but
  busy. The CLI now REFUSES to create a local duplicate and tells the user to
  retry shortly or stop serve first. The fallback is never silent anymore.

Delegation errors are now typed (DelegateError: ServeDown / ServeUnresponsive /
Failed) instead of string-matched. Applies to `index` and `index add` (the
index-creating paths); `index rm` is unchanged.
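
The three-way classification above can be sketched with the standard library alone. This is an illustration under stated assumptions: the `ServeHealth` enum, the plain-TCP HTTP request and the single attempt (no brief retry) are choices made for the sketch, and treating any non-200 reply as unresponsive is a simplification. Only the `probe_serve_health` name, the `/health` endpoint and the three outcomes come from the commit message.

```rust
use std::io::{ErrorKind, Read, Write};
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Outcome of one health probe (hypothetical names for this sketch).
#[derive(Debug, PartialEq)]
enum ServeHealth {
    /// /health answered with 200: delegate to serve.
    Up,
    /// Nothing is listening: safe to index locally.
    Down,
    /// Something is listening but did not answer in time: do not index locally.
    Unresponsive,
}

fn probe_serve_health(addr: SocketAddr, timeout: Duration) -> ServeHealth {
    // A refused connection fails immediately, so the "no serve" path stays fast.
    let mut stream = match TcpStream::connect_timeout(&addr, timeout) {
        Ok(stream) => stream,
        Err(e) if e.kind() == ErrorKind::TimedOut => return ServeHealth::Unresponsive,
        Err(_) => return ServeHealth::Down,
    };
    if stream.set_read_timeout(Some(timeout)).is_err()
        || stream.set_write_timeout(Some(timeout)).is_err()
    {
        return ServeHealth::Unresponsive;
    }
    let request = b"GET /health HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n";
    if stream.write_all(request).is_err() {
        return ServeHealth::Unresponsive;
    }
    // Only the status line matters; a read timeout means serve is busy.
    let mut buf = [0u8; 32];
    match stream.read(&mut buf) {
        Ok(n) if buf[..n].starts_with(b"HTTP/1.1 200") => ServeHealth::Up,
        _ => ServeHealth::Unresponsive,
    }
}
```

The point of the design is that `Down` and `Unresponsive` are distinguished by where the failure happens: at connect time versus after the connection was accepted.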

Tests: probe classification guards (responsive -> Up; listening-but-slow ->
Unresponsive). Rolls into the 1.0.137 release together with the writer-lock fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* sync: align develop with master (post-v1.0.137 release) (#78)

Brings the two files that drifted on master during the v1.0.137 release back
to develop: the updated protect-master.yml (allows release/* branches) and the
CHANGELOG [1.0.135] entry. After this, develop and master trees are identical.

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix: strip UNC prefix in repos.json + auto-add on missing DB (v1.0.138) (#79)

Two Windows path-handling bugs that caused spurious "Database not found"
errors and local duplicate indexes:

1. register()/register_with_alias() stored the raw canonicalize() result in
   repos.json. On Windows, canonicalize() returns \\?\C:\... (extended-length
   UNC prefix). Downstream .join(".codesearch.db") and Path::exists() calls
   then fail inconsistently (\\?\C:\foo\.codesearch.db not found even when
   C:\foo\.codesearch.db exists). 7 repos were affected. Fix: strip_unc()
   removes the prefix before storage. Existing repos.json patched in-place.
   Regression test: register_strips_unc_prefix_from_stored_path.

2. 500 "Database not found" from reindex (alias registered but DB gone) was
   treated as a generic failure -> local fallback -> duplicate index. Fix:
   triggers the same auto-register POST /repos path as 404 (DB recreated by
   serve, no local fallback).

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Release v1.0.139 - central safe_canonicalize() for all path ops (#83)

* sync: align develop with master (post-v1.0.138) (#81)

Co-authored-by: flupkede <flupkede@users.noreply.github.com>

* refactor: central safe_canonicalize() - eliminate raw .canonicalize() across codebase (#82)

ROOT CAUSE OF RECURRING BUG CLASS
Path::canonicalize() on Windows returns \\?\C:\... (extended-length UNC
prefix). Any downstream .join(), .exists(), or HashMap key built from that
path behaves inconsistently: the sub-path \\?\C:\foo\.codesearch.db may
return false from exists() even when C:\foo\.codesearch.db is present.
This class of bug has silently broken registrations multiple times.

FIX
Introduce safe_canonicalize(path: &Path) -> io::Result<PathBuf> and
strip_unc_prefix(path: PathBuf) -> PathBuf in src/cache/file_meta.rs.
These are the ONLY approved way to canonicalize paths in this codebase.
Exported via crate::cache.
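
A minimal sketch of what the two helpers could look like, using the signatures given above. The bodies are assumptions rather than the project's implementation: the string-based prefix check and the handling of the `\\?\UNC\server\share` form are illustrative choices.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Remove a leading `\\?\` from a path; leave every other path untouched.
fn strip_unc_prefix(path: PathBuf) -> PathBuf {
    let stripped = path.to_str().and_then(|s| {
        if let Some(rest) = s.strip_prefix(r"\\?\UNC\") {
            // `\\?\UNC\server\share` maps back to `\\server\share`.
            Some(PathBuf::from(format!(r"\\{rest}")))
        } else {
            // `\\?\C:\foo` maps back to `C:\foo`.
            s.strip_prefix(r"\\?\").map(PathBuf::from)
        }
    });
    // Non-UTF-8 paths and paths without the prefix are returned as-is.
    stripped.unwrap_or(path)
}

/// Canonicalize, then strip the prefix so joins, existence checks and map
/// keys all see the same spelling of the path.
fn safe_canonicalize(path: &Path) -> io::Result<PathBuf> {
    path.canonicalize().map(strip_unc_prefix)
}
```

One trade-off worth noting: on Windows the `\\?\` form is what allows paths longer than the legacy length limit, so a helper like this assumes the repositories it handles live at ordinary path lengths.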

CALL SITES UPDATED (all raw .canonicalize() removed)
- src/cache/file_meta.rs      - central definition + 5 new regression tests
- src/db_discovery/repos.rs   - register, register_with_alias, unregister_path,
                                alias_for_path; local strip_unc() removed
- src/db_discovery/mod.rs     - find_best_database, get_db_path_for_cwd
- src/index/mod.rs            - find_git_root, get_global_db_path,
                                add_to_index, remove_from_index,
                                try_delegate_reindex_to_serve (x2),
                                try_delegate_rm_to_serve
- src/lmdb_registry.rs        - TrackedEnv registry key (eliminates
                                double-open risk when same dir accessed
                                with and without \\?\ prefix)
- src/serve/mod.rs            - add_repo_handler, run_serve --register path

POLICY DOCUMENTED
AGENTS.md: "⚠️ Canonical Path Policy - MANDATORY" section with rule,
code example, and pointer to regression tests.

REGRESSION TESTS (5 new in cache/file_meta.rs + 1 existing in repos.rs)
- strip_unc_prefix_removes_windows_unc
- strip_unc_prefix_is_idempotent_on_{plain_path,unix_path}
- safe_canonicalize_on_existing_dir_returns_plain_path
- safe_canonicalize_on_nonexistent_path_returns_error
- register_strips_unc_prefix_from_stored_path (repos.rs: verifies
  fallback path also strips UNC when canonicalize() fails)

407 lib tests pass. clippy -D warnings clean.

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix: remove stale strip_unc() from repos.rs (merged from master, superseded by central safe_canonicalize)

---------

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Release v1.0.140 - last raw .canonicalize() eliminated (#85)

* fix: replace last raw .canonicalize() with safe_canonicalize in get_db_path_smart (#84)

The old normalize_path(&p.canonicalize()...) pattern in get_db_path_smart
was missed in the central safe_canonicalize refactor (v1.0.139). It worked
correctly (normalize_path also strips UNC) but was inconsistent with the
policy. Now all .canonicalize() calls outside safe_canonicalize's own
definition are eliminated.

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Release v1.0.141 - serve warmup wait + 409 DB-recreate fix (#87)

* fix: wait for serve warmup instead of refusing; fix 409 on DB-recreate (#86)

PROBLEM 1 - ServeUnresponsive aborted with error instead of waiting
When serve is warming up (opening LMDB for 15+ repos blocks the tokio
runtime, causing /health to time out), the CLI refused with an error.
The user had to retry manually.

FIX: serve_delegate_with_warmup_wait() wraps both try_delegate_reindex_to_serve
and try_delegate_add_to_serve. On ServeUnresponsive it prints
"⏳ serve is starting up, waiting..." and retries every 8s up to 6 times
(~2 min budget). On success it prints "✅ serve is ready, delegating...".
Only exhausting the full budget returns an error.
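
The retry wrapper can be sketched as a generic loop. This is a simplified illustration: the function name `delegate_with_warmup_wait`, the closure parameter and the configurable interval are assumptions made so the sketch is testable, and the `Failed` payload type is a guess. The real `serve_delegate_with_warmup_wait()` may be structured differently.

```rust
use std::thread;
use std::time::Duration;

/// Typed delegation errors; variant names follow the commit message,
/// the `String` payload is an assumption.
#[derive(Debug, PartialEq)]
enum DelegateError {
    ServeDown,
    ServeUnresponsive,
    Failed(String),
}

/// Retry `attempt` while serve is warming up; any other outcome is returned
/// immediately. Only an exhausted retry budget yields `ServeUnresponsive`.
fn delegate_with_warmup_wait<T>(
    mut attempt: impl FnMut() -> Result<T, DelegateError>,
    max_retries: u32,
    interval: Duration,
) -> Result<T, DelegateError> {
    let mut retries = 0;
    loop {
        match attempt() {
            Err(DelegateError::ServeUnresponsive) if retries < max_retries => {
                retries += 1;
                eprintln!("serve is starting up, waiting... (retry {retries}/{max_retries})");
                thread::sleep(interval);
            }
            // Success, ServeDown, Failed, or the final ServeUnresponsive.
            other => return other,
        }
    }
}
```

With the values from the commit message this would be called with `max_retries = 6` and `interval = Duration::from_secs(8)`.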

PROBLEM 2 - 409 Conflict from POST /repos on "Database not found" path
When a registered repo's DB was missing, the CLI tried POST /repos to
recreate it. Serve correctly returned 409 (alias already registered).
The CLI treated 409 as a failure and fell back to local indexing.

FIX: when auto-add returns 409, retry as POST /repos/{alias}/reindex?force=true.
Force reindex uses allow_create=true and creates the DB via serve without
local fallback.
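
The resulting request chain (reindex, then auto-add, then forced reindex) can be written as two small decision functions. Everything here is a hypothetical model for illustration: the `FollowUp` enum and both function names are invented, and matching on the response body text is an assumption about how the "Database not found" case is recognized. Only the status codes and the `/repos/{alias}/reindex?force=true` path come from the commit messages.

```rust
/// What the CLI does after a serve response (hypothetical type).
#[derive(Debug, PartialEq)]
enum FollowUp {
    /// The request succeeded; nothing more to do.
    Done,
    /// Register the repo with serve via POST /repos.
    AutoRegister,
    /// POST to the contained path to recreate the DB through serve.
    ForceReindex(String),
    /// Unhandled status; reported to the user.
    Fail(u16),
}

/// Decide the next step after a delegated reindex request.
fn after_reindex(status: u16, body: &str) -> FollowUp {
    match status {
        200..=299 => FollowUp::Done,
        // Unknown alias: let serve register the repo.
        404 => FollowUp::AutoRegister,
        // Alias registered but DB gone: same auto-register path as 404.
        500 if body.contains("Database not found") => FollowUp::AutoRegister,
        other => FollowUp::Fail(other),
    }
}

/// Decide the next step after the auto-register POST /repos request.
fn after_auto_register(status: u16, alias: &str) -> FollowUp {
    match status {
        200..=299 => FollowUp::Done,
        // Alias already registered: recreate the DB with a forced reindex.
        409 => FollowUp::ForceReindex(format!("/repos/{alias}/reindex?force=true")),
        other => FollowUp::Fail(other),
    }
}
```

In this model neither function ever returns a "fall back to local indexing" step, which mirrors the stated goal of never creating a local duplicate while serve is running.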

AGENTS.md: document the root cause (tokio blocking during warmup) as a
remaining work item with diagnosis and fix guidance.

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Release v1.0.142 - serve responsive during warmup (spawn_blocking) (#89)

* fix: CI test resilience + protect-master workflow (#58)

* Sync master β†’ develop (tree-sitter) (#60)

* Fix formatting of codesearch index command

* Create codeql.yml

* feat: add tree-sitter grammars for Bash, Ruby, PHP, YAML, JSON

* fix: CI test resilience + protect-master workflow (#58) (#59)

---------

Co-authored-by: flupkede <flupkede@users.noreply.github.com>

* docs: update CHANGELOG β€” v1.0.132 consolidated release notes (#61)

* [worker] cleanup: AGENTS.md β€” 73% reduction, removed stale test report and duplicate bug details

* docs: update CHANGELOG β€” v1.0.132 consolidated release notes (v1.0.97...v1.0.132)

---------

Co-authored-by: flupkede <flupkede@users.noreply.github.com>

* sync: align develop with master β€” AGENTS.md, Cargo.toml, Cargo.lock (#63)

Co-authored-by: flupkede <flupkede@users.noreply.github.com>

* fix: MCP local mode project/group fallback + QC script fix (#66)

* fix(mcp): ignore project/group params in local/stdio mode instead of erroring

When running MCP in local mode (no serve_state), project/group routing is meaningless because only one DB is available.

Log a warning and fall back to local DB instead of returning an error.

* fix(qc): define YELLOW color in bash QC script

* chore: simplify release workflow β€” feature-only version bump (#74)

* fix: serve-aware indexing β€” create DB dir before lock + no silent local duplicate (v1.0.137) (#76)

* fix: create DB directory before acquiring writer lock (serve auto-register)

When `serve` is running and `codesearch index` is run for a repo not yet known
to it, auto-register (POST /repos) failed with a misleading "Database is locked
by another process" 500: SharedStores::new() acquired the writer lock before
the .codesearch.db directory existed, so opening .writer.lock failed with
"path not found". This rolled back the repos.json registration and made the CLI
fall back to a local duplicate index instead of delegating to serve.

- acquire_writer_lock / SharedStores::new now create the DB directory first;
  genuine I/O errors surface distinctly instead of as a lock conflict.
- Serve config writes route through ServeState::persist_config() (honors the
  config path override) β€” production behavior unchanged, register/remove path
  now hermetically testable.
- Regression guards exercise the brand-new-repo create/register path with the
  DB directory genuinely absent (verified to fail against the pre-fix code).
- CHANGELOG: 1.0.136.
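
The ordering fix described above can be sketched with the standard library alone. This is a minimal illustration, not the project's `acquire_writer_lock`: the helper name `open_writer_lock_file` is hypothetical, and the actual lock acquisition (for example an OS file lock on the returned handle) is left out.

```rust
use std::fs::{self, File, OpenOptions};
use std::io;
use std::path::Path;

/// Hypothetical helper: open (creating if needed) the lock file inside `db_dir`.
/// The directory is created first, so a missing directory surfaces as a plain
/// I/O error (if creation fails) instead of being mistaken for a lock conflict.
fn open_writer_lock_file(db_dir: &Path) -> io::Result<File> {
    // Idempotent: succeeds when the directory already exists.
    fs::create_dir_all(db_dir)?;
    OpenOptions::new()
        .create(true)
        .write(true)
        .truncate(false)
        .open(db_dir.join(".writer.lock"))
}
```

Creating the directory inside the same function that opens the lock file keeps every caller on the safe ordering, which is the property the regression guards above exercise.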

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix: never silently create a local duplicate index when serve is busy

The CLI probes serve's /health before delegating `index`/`index add`. Any
health failure - including a *timeout* while serve is warming up its repos at
startup - was classified as "serve not running", so the CLI silently created a
local index. That local index is a duplicate serve does not manage and can
cause LMDB file-lock conflicts (and the repo never gets registered with serve).

New behavior via probe_serve_health():
- Responsive  -> delegate as before.
- Connection refused / cannot connect -> serve not running; index locally.
  Detected immediately (no timeout elapses, no retries), so the common
  "no serve -> local" path is NOT slowed down.
- Listening but unresponsive (timeout, retried briefly) -> serve is up but
  busy. The CLI now REFUSES to create a local duplicate and tells the user to
  retry shortly or stop serve first. The fallback is never silent anymore.

Delegation errors are now typed (DelegateError: ServeDown / ServeUnresponsive /
Failed) instead of string-matched. Applies to `index` and `index add` (the
index-creating paths); `index rm` is unchanged.

Tests: probe classification guards (responsive -> Up; listening-but-slow ->
Unresponsive). Rolls into the 1.0.137 release together with the writer-lock fix.
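
The three-way classification can be sketched as follows. This is an assumption-laden illustration, not the project's `probe_serve_health()`: the function name `probe_health_sketch`, the `ServeHealth` enum, and the raw HTTP/1.0 request are all hypothetical, and the brief retry on timeout mentioned above is omitted.

```rust
use std::io::{ErrorKind, Read, Write};
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

#[derive(Debug, PartialEq, Eq)]
enum ServeHealth {
    Up,           // answered the probe
    Down,         // nothing is listening: safe to index locally
    Unresponsive, // listening but busy: do NOT create a local duplicate
}

/// Hypothetical sketch of the probe classification.
fn probe_health_sketch(addr: SocketAddr, timeout: Duration) -> ServeHealth {
    let mut stream = match TcpStream::connect_timeout(&addr, timeout) {
        Ok(stream) => stream,
        // A connect timeout means something may be there but is not answering.
        Err(e) if e.kind() == ErrorKind::TimedOut => return ServeHealth::Unresponsive,
        // Connection refused (or any other connect failure) is detected
        // immediately, so the "no serve -> local" path is not slowed down.
        Err(_) => return ServeHealth::Down,
    };
    if stream.set_read_timeout(Some(timeout)).is_err()
        || stream.write_all(b"GET /health HTTP/1.0\r\n\r\n").is_err()
    {
        return ServeHealth::Unresponsive;
    }
    // Any response byte within the timeout counts as "responsive" here.
    let mut first_byte = [0u8; 1];
    match stream.read(&mut first_byte) {
        Ok(n) if n > 0 => ServeHealth::Up,
        _ => ServeHealth::Unresponsive,
    }
}
```

The key design point is that only a refused connection maps to `Down`; a socket that accepts but stays silent is treated as a busy server.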

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* sync: align develop with master (post-v1.0.137 release) (#78)

Brings the two files that drifted on master during the v1.0.137 release back
to develop: the updated protect-master.yml (allows release/* branches) and the
CHANGELOG [1.0.135] entry. After this, develop and master trees are identical.

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix: strip UNC prefix in repos.json + auto-add on missing DB (v1.0.138) (#79)

Two Windows path-handling bugs that caused spurious "Database not found"
errors and local duplicate indexes:

1. register()/register_with_alias() stored the raw canonicalize() result in
   repos.json. On Windows, canonicalize() returns \\?\C:\... (extended-length
   UNC prefix). Downstream .join(".codesearch.db") and Path::exists() calls
   then fail inconsistently (\\?\C:\foo\.codesearch.db not found even when
   C:\foo\.codesearch.db exists). 7 repos were affected. Fix: strip_unc()
   removes the prefix before storage. Existing repos.json patched in-place.
   Regression test: register_strips_unc_prefix_from_stored_path.

2. 500 "Database not found" from reindex (alias registered but DB gone) was
   treated as a generic failure -> local fallback -> duplicate index. Fix:
   triggers the same auto-register POST /repos path as 404 (DB recreated by
   serve, no local fallback).

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* sync: align develop with master (post-v1.0.138) (#81)

Co-authored-by: flupkede <flupkede@users.noreply.github.com>

* refactor: central safe_canonicalize() - eliminate raw .canonicalize() across codebase (#82)

ROOT CAUSE OF RECURRING BUG CLASS
Path::canonicalize() on Windows returns \\?\C:\... (extended-length UNC
prefix). Any downstream .join(), .exists(), or HashMap key built from that
path behaves inconsistently - the sub-path \\?\C:\foo\.codesearch.db may
return false from exists() even when C:\foo\.codesearch.db is present.
This class of bug has silently broken registrations multiple times.

FIX
Introduce safe_canonicalize(path: &Path) -> io::Result<PathBuf> and
strip_unc_prefix(path: PathBuf) -> PathBuf in src/cache/file_meta.rs.
These are the ONLY approved way to canonicalize paths in this codebase.
Exported via crate::cache.

CALL SITES UPDATED (all raw .canonicalize() removed)
- src/cache/file_meta.rs      - central definition + 5 new regression tests
- src/db_discovery/repos.rs   - register, register_with_alias, unregister_path,
                                alias_for_path; local strip_unc() removed
- src/db_discovery/mod.rs     - find_best_database, get_db_path_for_cwd
- src/index/mod.rs            - find_git_root, get_global_db_path,
                                add_to_index, remove_from_index,
                                try_delegate_reindex_to_serve (x2),
                                try_delegate_rm_to_serve
- src/lmdb_registry.rs        - TrackedEnv registry key (eliminates
                                double-open risk when same dir accessed
                                with and without \\?\ prefix)
- src/serve/mod.rs            - add_repo_handler, run_serve --register path

POLICY DOCUMENTED
AGENTS.md: "⚠️ Canonical Path Policy - MANDATORY" section with rule,
code example, and pointer to regression tests.

REGRESSION TESTS (6 new in cache/file_meta.rs + 1 existing in repos.rs)
- strip_unc_prefix_removes_windows_unc
- strip_unc_prefix_is_idempotent_on_{plain_path,unix_path}
- safe_canonicalize_on_existing_dir_returns_plain_path
- safe_canonicalize_on_nonexistent_path_returns_error
- register_strips_unc_prefix_from_stored_path (repos.rs - verifies
  fallback path also strips UNC when canonicalize() fails)

407 lib tests pass. clippy -D warnings clean.
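
A minimal sketch of the two helpers, using the signatures quoted above. The bodies are assumptions, not the code in src/cache/file_meta.rs. In particular, leaving `\\?\UNC\server\share` paths untouched is a choice made for this sketch; the commit does not say how the real helper treats them.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Remove a leading `\\?\` (extended-length) prefix, if present.
/// Assumption of this sketch: `\\?\UNC\...` network paths are left as-is,
/// because stripping only `\\?\` from them would not yield a valid path.
fn strip_unc_prefix(path: PathBuf) -> PathBuf {
    let stripped = path
        .to_str()
        .filter(|s| !s.starts_with(r"\\?\UNC\"))
        .and_then(|s| s.strip_prefix(r"\\?\"))
        .map(PathBuf::from);
    // Non-UTF-8 paths and paths without the prefix are returned unchanged.
    stripped.unwrap_or(path)
}

/// Canonicalize, then strip the prefix so later `.join()` / `.exists()`
/// calls and map keys all see one consistent spelling of the path.
fn safe_canonicalize(path: &Path) -> io::Result<PathBuf> {
    path.canonicalize().map(strip_unc_prefix)
}
```

Because the stripping works on the string form, it is a no-op on Unix paths and on already-plain Windows paths, which matches the idempotence tests listed above.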

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix: replace last raw .canonicalize() with safe_canonicalize in get_db_path_smart (#84)

The old normalize_path(&p.canonicalize()...) pattern in get_db_path_smart
was missed in the central safe_canonicalize refactor (v1.0.139). It worked
correctly (normalize_path also strips UNC) but was inconsistent with the
policy. Now all .canonicalize() calls outside safe_canonicalize's own
definition are eliminated.

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix: wait for serve warmup instead of refusing; fix 409 on DB-recreate (#86)

PROBLEM 1 - ServeUnresponsive aborted with error instead of waiting
When serve is warming up (opening LMDB for 15+ repos blocks the tokio
runtime, causing /health to time out), the CLI refused with an error.
The user had to retry manually.

FIX: serve_delegate_with_warmup_wait() wraps both try_delegate_reindex_to_serve
and try_delegate_add_to_serve. On ServeUnresponsive it prints
"⏳ serve is starting up, waiting..." and retries every 8s up to 6 times
(~2 min budget). On success it prints "✅ serve is ready, delegating...".
Only exhausting the full budget returns an error.

PROBLEM 2 - 409 Conflict from POST /repos on "Database not found" path
When a registered repo's DB was missing, the CLI tried POST /repos to
recreate it. Serve correctly returned 409 (alias already registered).
The CLI treated 409 as a failure and fell back to local indexing.

FIX: when auto-add returns 409, retry as POST /repos/{alias}/reindex?force=true.
Force reindex uses allow_create=true and creates the DB via serve without
local fallback.

AGENTS.md: document the root cause (tokio blocking during warmup) as a
remaining work item with diagnosis and fix guidance.
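
The wait-and-retry wrapper from PROBLEM 1 can be sketched generically. The variant names come from the DelegateError type mentioned earlier in this log; the `Failed(String)` payload, the function name `delegate_with_warmup_wait`, and its parameters are assumptions made for illustration. Status messages are omitted.

```rust
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq, Eq)]
enum DelegateError {
    ServeDown,
    ServeUnresponsive,
    Failed(String), // payload type is an assumption of this sketch
}

/// Retry `attempt` only while serve reports itself as unresponsive.
/// Any other outcome (success, ServeDown, Failed) is returned immediately.
fn delegate_with_warmup_wait<T>(
    mut attempt: impl FnMut() -> Result<T, DelegateError>,
    max_retries: u32,
    delay: Duration,
) -> Result<T, DelegateError> {
    let mut retries = 0;
    loop {
        match attempt() {
            Err(DelegateError::ServeUnresponsive) if retries < max_retries => {
                retries += 1;
                thread::sleep(delay);
            }
            // Budget exhausted or a non-retryable outcome.
            other => return other,
        }
    }
}
```

With the values quoted above, a caller would pass `max_retries = 6` and `delay = Duration::from_secs(8)`; only exhausting that budget surfaces `ServeUnresponsive` to the user.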

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix: keep serve responsive during warmup - offload heavy work to spawn_blocking (#88)

PROBLEM
codesearch serve became unresponsive during startup warmup: FileWalker::walk,
VectorStore::build_index (HNSW), and fastembed/ONNX embedding (saturates all
cores) ran synchronously on tokio worker threads. This starved the async
runtime, /health timed out (>3s), and `codesearch index` reported "serve did
not respond in time". The server already returns 202 + spawns background
indexing (accept-and-defer); it just couldn't respond while warming.

FIX
Offload the heavy synchronous warmup work to tokio::task::spawn_blocking, so
the async executor stays responsive (answers /health and accepts POST /repos
immediately, runs the job in the background).
- serve/mod.rs warmup_repo: read stats under .read(); build_index via
  spawn_blocking + Arc clone + blocking_write. Build failure only warns.
- manager.rs perform_incremental_refresh_with_stores: walk, read+chunk+embed,
  and build_index all offloaded.
- manager.rs refresh_index_with_stores: walk + both build_index calls offloaded.

LOCK SAFETY (verified by review)
Every async RwLock guard scope CLOSES before the spawn_blocking that calls
.blocking_write() on the same store - no lock-over-await deadlock. blocking_write
is only ever called inside spawn_blocking (never on an async worker).

Test: test_incremental_refresh_up_to_date_is_noop exercises the refactored walk
path. 408 lib tests pass, clippy -D warnings clean.

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Release v1.0.154 - markdown chunking, stale-path relocation, auto-prune, 15 languages (#99)

* fix: CI test resilience + protect-master workflow (#58)

* Sync master → develop (tree-sitter) (#60)

* Fix formatting of codesearch index command

* Create codeql.yml

* feat: add tree-sitter grammars for Bash, Ruby, PHP, YAML, JSON

* fix: CI test resilience + protect-master workflow (#58) (#59)

---------

Co-authored-by: flupkede <flupkede@users.noreply.github.com>

* sync: backfill CHANGELOG 1.0.139-1.0.142 from master (post-release) (#90)

Co-authored-by: flupkede <flupkede@users.noreply.github.com>

* feat: semantic Markdown chunking (tree-sitter-md) + /merge & /release commands (#91)

* chore: add /merge and /release Claude Code slash commands

Codify the project release workflow as two committed slash commands under
.claude/commands/ (force-added past .gitignore, like .claude/CLAUDE.md):

- /merge: README/CHANGELOG freshness checks -> commit -> validate -> push ->
  PR to develop -> auto-merge after CI. No tag.
- /release: /merge, then promote develop -> master via a "Release vX.Y.Z" PR
  (protect-master allows develop), then push the vX.Y.Z tag that triggers
  release.yml. Includes optional post-release develop sync.

Commands document the repo's real conventions: feature->develop->master flow,
master branch protection, and the pre-commit version-bump-on-feature-branches
rule that fixes the release version at the feature commit.

Tooling-only change on a chore/ branch: no version bump, no CHANGELOG entry
(CHANGELOG tracks the shipped binary's behavior).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore: address review remarks on /merge and /release commands

- /merge: abort unless on feature/*|features/*|fix/* (the only branches the
  pre-commit hook version-bumps) β€” closes the gap where running from a
  non-bumping branch silently broke the version/CHANGELOG premise.
- Clarify CHANGELOG heading version math for multi-commit landings (hook bumps
  +1 per commit; verify heading matches Cargo.toml after the final commit).
- Capture PR numbers explicitly (gh pr view --json number) before merge/poll.
- /release: fetch --tags and guard against a double release (stop if the tag
  already exists locally or on origin).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: document /merge and /release workflow in AGENTS.md

Add a Release workflow section describing the two slash commands, the
branch-protection rule, the tag-triggers-release.yml pipeline, and the
feature-branch-only version-bump rule that fixes the release version.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(chunker): semantic Markdown chunking via tree-sitter-md

Markdown and .txt files were indexed as a single whole-file block (the
fallback chunker has no char budget), so a search hit returned an entire
page β€” real Aprimo docs reached 80 KB in one chunk.

Add the tree-sitter-md *block* grammar and chunk Markdown by heading
section instead: each chunk is one heading plus its own prose/code,
excluding nested subsections (which become their own chunks). The
heading path is carried in the breadcrumb context (File > Title >
Subsection) so embeddings capture each section's place in the document.

Also add split_oversized, a char- and li…
#101)

* fix: reconcile_all_paths in spawn_blocking + use persist_config in prune

Two correctness fixes flagged in post-release review:

1. reconcile_all_paths() was called synchronously inside tokio::spawn, blocking
   a Tokio worker thread while spawning git subprocesses and holding the config
   RwLock write-guard. Moved to spawn_blocking so the async runtime stays
   responsive during startup reconciliation.

2. Phase 1 auto-prune wrote repos.json via config.save() instead of
   self.persist_config(&config). All other ServeState save sites use
   persist_config to honour config_path_override (e.g. in tests). Now
   consistent.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: add CHANGELOG entry for v1.0.156 (reconcile spawn_blocking + persist_config prune fix)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…doc-comments (#102)

* fix: doc-comments accuracy + safe_canonicalize in reload_if_changed

- repos.rs: correct "Pure (no disk I/O)" doc-comments on relocate_missing and
  prune_stale β€” both transitively call scan_for_remote which does read_dir and
  spawns a git subprocess; callers must use spawn_blocking in async contexts.
- serve/mod.rs: replace raw std::fs::canonicalize with safe_canonicalize in
  reload_if_changed so Windows UNC prefix (\\?\) is stripped before comparison,
  consistent with the project-wide safe_canonicalize rule.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix: index/mod.rs quality - extract ensure_hnsw_index_if_needed + tests + metadata consistency + cancel best-effort

- Extract the safety-net HNSW rebuild into ensure_hnsw_index_if_needed() so
  the logic is unit-testable; add 3 tests (unindexed with chunks rebuilds,
  already-indexed is idempotent, empty DB skips rebuild).
- metadata.json schema consistency: add "partial": false to the normal
  (non-cancelled) path so readers always see the field regardless of how
  indexing ended.
- Cancellation finalisation path: change non-critical ? propagations to
  log-and-continue (metadata.json write, FileMetaStore update/save, stats
  read) β€” keeps partial chunks searchable even if any recovery step fails.
  store.build_index() still propagates errors as before.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix: concurrency - evaluate_csharp_rebuild outside write lock + build_index in spawn_blocking

Three related concurrency fixes in serve/mod.rs, consistent with the
reconcile_all_paths fix from PR #101:

1. evaluate_csharp_rebuild: bootstrap_last_changed (git subprocess + ≤10k
   file fs-walk) was running while holding config.write(), blocking every
   concurrent config.read() for the scan duration. Fixed by checking whether
   bootstrap is needed under a read lock, running the slow I/O with no lock
   held, then taking the write lock only for the brief config update.

2. evaluate_csharp_rebuild call site in run_phase_2_csharp_scip: even with the
   above fix the function still ran synchronously on a Tokio worker thread.
   Wrapped in spawn_blocking so the async runtime stays responsive while
   scanning all C# candidates at startup.

3. warmup_repo and add_repo_handler background task: two build_index() calls
   were running directly on async threads while holding a tokio RwLock write
   guard. build_index() is CPU-heavy (HNSW construction). Both are now
   offloaded via spawn_blocking + blocking_write(), matching the established
   pattern at serve/mod.rs:1249.
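
The locking pattern from item 1 (check under a read lock, do the slow work with no lock held, then take the write lock only for the update) can be sketched with `std::sync::RwLock`. The project uses a tokio RwLock; the `Config` struct, the `last_changed` field type, and the function name `ensure_bootstrapped` are hypothetical stand-ins.

```rust
use std::sync::RwLock;

struct Config {
    last_changed: Option<u64>, // hypothetical field standing in for the bootstrap result
}

/// Return the bootstrapped value, running `slow_scan` only when needed and
/// never while any lock on `config` is held.
fn ensure_bootstrapped(config: &RwLock<Config>, slow_scan: impl FnOnce() -> u64) -> u64 {
    // 1. Cheap check under a read lock; the guard is dropped at the end of the block.
    let existing = {
        let guard = config.read().unwrap();
        guard.last_changed
    };
    if let Some(value) = existing {
        return value;
    }
    // 2. Slow I/O (git subprocess, file walk) runs with no lock held,
    //    so concurrent readers are not blocked for the scan duration.
    let scanned = slow_scan();
    // 3. Brief write lock; keep an earlier value if another thread won the race.
    let mut guard = config.write().unwrap();
    *guard.last_changed.get_or_insert(scanned)
}
```

The re-check in step 3 matters because the lock is released between the check and the update, so two threads may both decide a scan is needed.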

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: CHANGELOG entry for v1.0.160 (full review fixes)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix: rename_retry helper for Windows flaky tests in repos.rs

On Windows, git subprocesses spawned by init_git_remote keep file handles
open briefly after the process exits, causing std::fs::rename and
std::fs::remove_dir_all to fail with "Access is denied" under parallel test
load. Fixes:
- Add rename_retry() test helper that retries with exponential back-off
  (up to 10 attempts, 20-200ms delays)
- Replace all 7 std::fs::rename(...).unwrap() calls in the repos test
  module with rename_retry()
- Change remove_dir_all(...).unwrap() in try_relocate_none_when_ambiguous
  to let _ = ... (the assertion holds either way)
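
A retry helper with the parameters quoted above (up to 10 attempts, delays growing from 20 ms and capped at 200 ms) might look like the sketch below. The signature and the doubling schedule are assumptions; the actual test helper in repos.rs may differ.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::thread;
use std::time::Duration;

/// Retry `fs::rename` with exponential back-off, returning the last error
/// if every attempt fails.
fn rename_retry(from: &Path, to: &Path) -> io::Result<()> {
    const MAX_ATTEMPTS: u32 = 10;
    let mut delay = Duration::from_millis(20);
    let mut attempt = 1;
    loop {
        match fs::rename(from, to) {
            Ok(()) => return Ok(()),
            Err(e) if attempt == MAX_ATTEMPTS => return Err(e),
            Err(_) => {
                // Give lingering file handles time to close, then back off.
                thread::sleep(delay);
                delay = (delay * 2).min(Duration::from_millis(200));
                attempt += 1;
            }
        }
    }
}
```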

Verified stable: 3 consecutive full-suite runs with 432 passed, 0 failed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: CHANGELOG entry for v1.0.162 (Windows flaky test fix)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… (#108)

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Test User and others added 28 commits June 11, 2026 22:54
…FileWatcher missing .codesearchignore bug
…uage::Jupyter, and FileWatcher precedence
…ist (M6)

C1 (CRITICAL): perform_incremental_refresh_with_stores hardcoded
ModelType::default() for embedding while force_reindex wrote the --model
override only into metadata.json. Choosing a non-384d model (bge-base/
bge-large/mxbai-large) recorded the new dimension in metadata but still
produced 384d vectors -> dimension mismatch / corrupt index.

Fix: resolve the embedding model from the short name recorded in
metadata.json, fail fast on an unknown model or a dims/model mismatch,
and thread the resolved ModelType into the embedding closure instead of
the hardcoded default.

M6: replace the two hand-maintained "valid models" lists (CLI add --model
and serve POST /repos), which both omitted bge-large, with a single
ModelType::valid_short_names() derived from all().

Tests: short_name round-trips through parse() for every model;
valid_short_names() lists every model incl. bge-large.
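
The "single list derived from all()" idea and the fail-fast resolution can be sketched as below. This is an illustration, not the project's ModelType: the `Small384` variant and its short name `small-384` are hypothetical placeholders for the unnamed 384d default, the `resolve` function is an assumed shape, and the non-default dimensions are the published sizes of those embedding models rather than values taken from this PR.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ModelType {
    Small384, // hypothetical stand-in for the 384d default model
    BgeBase,
    BgeLarge,
    MxbaiLarge,
}

impl ModelType {
    /// The one place where every model is enumerated.
    fn all() -> &'static [ModelType] {
        &[ModelType::Small384, ModelType::BgeBase, ModelType::BgeLarge, ModelType::MxbaiLarge]
    }

    fn short_name(self) -> &'static str {
        match self {
            ModelType::Small384 => "small-384", // hypothetical name
            ModelType::BgeBase => "bge-base",
            ModelType::BgeLarge => "bge-large",
            ModelType::MxbaiLarge => "mxbai-large",
        }
    }

    fn dimensions(self) -> usize {
        match self {
            ModelType::Small384 => 384,
            ModelType::BgeBase => 768,
            ModelType::BgeLarge | ModelType::MxbaiLarge => 1024,
        }
    }

    fn parse(name: &str) -> Option<ModelType> {
        Self::all().iter().copied().find(|m| m.short_name() == name)
    }

    /// Derived from all(), so no hand-maintained list can omit a model.
    fn valid_short_names() -> Vec<&'static str> {
        Self::all().iter().map(|m| m.short_name()).collect()
    }

    /// Fail fast on an unknown model or a model/dimension mismatch.
    fn resolve(name: &str, recorded_dims: usize) -> Result<ModelType, String> {
        let model = Self::parse(name).ok_or_else(|| {
            format!("unknown model '{name}', expected one of {:?}", Self::valid_short_names())
        })?;
        if model.dimensions() != recorded_dims {
            return Err(format!(
                "model '{name}' produces {}d vectors but metadata records {recorded_dims}d",
                model.dimensions()
            ));
        }
        Ok(model)
    }
}
```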

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…er path

index_single_file (reached from the file-watcher loop and git branch-change
refresh) still hardcoded ModelType::default() for embedding, re-introducing
the exact dimension-mismatch corruption C1 set out to fix: a repo indexed
with a non-default model would re-embed changed files at 384d on the next
edit.

Extract resolve_embed_model(db_path) -> (ModelType, usize) as the single
source for model resolution (reads metadata, fail-fast on unknown model /
dims mismatch) and use it in BOTH perform_incremental_refresh_with_stores
and index_single_file. Removes the duplicated inline resolution block and
the redundant closure rebind.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tant-time auth (M2)

M1: the generated post-checkout hook embedded $(pwd) directly into the JSON
request body. A repo path containing a double quote or backslash broke out of
the JSON string literal (malformed body / injection). Now JSON-escape the path
in pure bash (backslashes then quotes) before embedding - no jq dependency,
handles Windows paths.

M2: API key was compared with raw string `==`, which short-circuits on the
first mismatched byte β€” a timing side-channel on the network-exposed
require_auth_for_network path (whose "localhost-only, timing impractical"
justification was actually copied from the admin middleware). Add
api_key_matches() using SHA-256 digest + non-short-circuiting byte compare,
and a shared request_has_valid_api_key() helper so both middlewares use the
constant-time path from one place. Fix the misleading doc comment.

Test: api_key_matches covers match/mismatch/length/empty/case.
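
The non-short-circuiting comparison can be sketched without external crates. Note what is swapped out: the commit hashes both keys with SHA-256 first and compares the fixed-length digests, while this sketch compares the raw bytes directly because the standard library has no SHA-256. The name `constant_time_eq` is hypothetical, and the compiler gives no formal timing guarantee for code like this.

```rust
/// Compare two byte strings without returning early on the first mismatch.
/// Every byte position up to the longer length is always visited.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    // A length difference is folded into the accumulator instead of
    // triggering an early return.
    let mut diff = a.len() ^ b.len();
    for i in 0..a.len().max(b.len()) {
        let x = a.get(i).copied().unwrap_or(0);
        let y = b.get(i).copied().unwrap_or(0);
        diff |= (x ^ y) as usize;
    }
    diff == 0
}
```

Hashing first, as the commit does, has the added benefit that both inputs to the comparison always have the same length, so the loop bound itself reveals nothing about the key.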

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… msys bash

The previous M1 fix used bare-backslash parameter expansion
(${REPO_PATH//\/\\}); empirically this does NOT double backslashes in
Git Bash / msys 5.2 (the surrounding double-quotes halve them and a bare
\ pattern fails to match a literal backslash), so a path with a backslash
still produced invalid JSON.

Use the variable-based idiom instead β€” quoted variables as the search and
replace operands force LITERAL matching:
    BS='\'; DQ='"'
    REPO_PATH=${REPO_PATH//"$BS"/"$BS$BS"}
    REPO_PATH=${REPO_PATH//"$DQ"/"$BS$DQ"}
Verified with node JSON.parse round-trip for paths containing both " and \
(e.g. /c/Users/a"b\c -> {"path":"/c/Users/a\"b\\c"} -> parses back exactly).
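
The same two-step order (backslashes first, then quotes) can be sketched in Rust for comparison. The function names are hypothetical and this is not the hook's code, which is bash. Like the hook, it covers only the two characters the commit names; control characters would need further escaping.

```rust
/// Escape a path for embedding inside a JSON string literal.
/// Order matters: doubling backslashes after escaping quotes would also
/// double the backslash that was just added in front of each quote.
fn json_escape_path(path: &str) -> String {
    path.replace('\\', "\\\\").replace('"', "\\\"")
}

/// Build the request body with the escaped path (hypothetical shape,
/// following the {"path": ...} example in the commit message).
fn request_body(path: &str) -> String {
    format!("{{\"path\":\"{}\"}}", json_escape_path(path))
}
```

Reversing the two `replace` calls would turn `"` into `\\"`, which closes the JSON string early; that is the failure the ordering avoids.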

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…5) + fsync (minor)

M5: force_reindex_with_stores step 4 still wrote metadata.json via a plain
std::fs::write(to_string_pretty(..)), bypassing the atomic writer. A crash
there could truncate metadata.json β€” the exact failure the atomic RMW was
introduced to prevent. Route it through crate::vectordb::merge_metadata_atomic,
overlaying the preserved keys + a fresh indexed_at onto any existing content.

Minor: atomic_write_json now fsyncs the temp file (File::create + write_all +
sync_all) BEFORE the rename, so a power-loss cannot leave a zero-length/garbage
file in place of the old metadata. Doc comment updated to match the actual
guarantee instead of overstating it.
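
The write-fsync-rename sequence described above can be sketched as follows. This is an illustration under assumptions, not the project's atomic_write_json: the function name and the `.tmp` suffix are hypothetical.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Replace `path` with `bytes` so readers see either the old or the new
/// content, never a truncated file. Hypothetical helper.
fn atomic_write(path: &Path, bytes: &[u8]) -> io::Result<()> {
    // Temp file next to the target so the rename stays on one filesystem.
    let mut tmp_name = path.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp = PathBuf::from(tmp_name);
    {
        let mut file = File::create(&tmp)?;
        file.write_all(bytes)?;
        // Flush data and metadata to disk BEFORE the rename, so a power
        // loss cannot leave an empty file under the final name.
        file.sync_all()?;
    } // file handle closed here
    fs::rename(&tmp, path)
}
```

This sketch does not fsync the parent directory, so the durability of the rename itself is left to the filesystem.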

vectordb store tests pass (incl. test_persistence).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…M3) + csharp_error (M4)

M3: the remote TUI's `i` and `d` keys called GET /repos/{alias}/info and
POST /repos/{alias}/doctor, but neither route existed server-side, so they
always failed with "endpoint not available" / HTTP error. Add:
- info_handler (GET /repos/:alias/info): mirrors tui::build_info_overlay,
  returns the exact InfoResponse shape (chunks/files/max_chunk_id/
  db_size_human/model/dims/lock/index_age). 404 on unknown alias.
- doctor_handler (POST /repos/:alias/doctor): runs diagnose_with_store when
  the repo's stores are open (reuses the LMDB handle, avoiding a double-open),
  else diagnose(); returns {"results": [...]} from DoctorReport::render_tui.
Both registered in the same auth-layered Router as reindex; like /status
they're open on localhost and key-protected on network binds.
format_age/dir_size_human in tui.rs made pub(crate) for reuse.

M4: status JSON now includes "csharp_error" so the remote detail panel shows
the real C# error instead of the literal "Unknown error".

cargo check + clippy clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rt classify + tests

Jupyter (src/chunker/jupyter.rs):
- Normalize RawCell.line_count to >=1 in extract_cell so the two passes in
  merge_adjacent_cells (line numbering vs merge accumulation) agree; empty
  cells previously stored 0 which the passes counted differently.
- Add a loud module-header CAVEAT that chunk start/end lines are synthetic
  cell-relative positions, NOT real .ipynb JSON offsets; future jump-to-line
  / re-extraction features must not trust them. Note kernel language is not
  read (generic code labels).

Dart (src/chunker/extractor.rs):
- Remove the dead/misleading `mixin_application` branch from classify().
  Empirically verified: Dart mixin members parent to `class_body` (already
  Method); `mixin_application` is the `with A, B` clause and never parents a
  function_declaration, so the branch never fired.

Tests (src/chunker/semantic.rs):
- test_dart_semantic_chunking: top-level fn → Function, class & mixin members
  → Method (regression guard for the misclassification).
- test_dart_unparseable_still_chunks: malformed .dart still yields fallback
  chunks (grammar-failure resilience).

64 chunker tests pass; clippy clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
DRY (src/serve/tui_common.rs, tui.rs, tui_remote.rs):
- Extract shared format_uptime_secs(u64) β€” format_uptime delegates; remote's
  format_uptime_from_secs removed.
- Move byte-identical restore_terminal into tui_common (pub); both TUIs call it.
- render_centered_modal now delegates to the _with_border_color variant
  (Color::Cyan), removing ~45 duplicated lines.

Hygiene:
- Fix stale comments: tui.rs "'s' pressed" -> "'l' pressed"; tui_remote module
  doc "Actions (i/d/f/s)" -> accurate i=info/d=doctor/n=reindex/r=remove/l=reload.
- Remove dead `let _ = tx;` suppression in tui_remote.
- Align C# sentinel: tui.rs map_repo_rows emits "none" (was "") to match
  status_handler; shared consumer handles it via default branch (no display change).
- Reuse a single reqwest::Client across the remote TUI poll/actions instead of
  constructing one per request.
- search/mod.rs: warn! on query-side model parse-failure/override fallback
  (previously a silent unwrap_or_default); this keeps the default and just makes a
  model mismatch visible in logs.

Test (src/serve/mod.rs): info_doctor_routes_registered asserts GET /info and
POST /doctor on an unknown alias return 404 (mirrors reindex route test).

cargo check + clippy clean; 29 serve tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…regression)

The Stage 1 fail-fast resolved the embedding model at the top of
perform_incremental_refresh_with_stores, which made an up-to-date (no-op)
refresh error out on any index whose metadata model can't be parsed,
breaking test_incremental_refresh_up_to_date_is_noop and, more importantly,
failing no-op refreshes that never embed anything.

Move the strict resolve_embed_model() call inside the `if !changed_files`
block so it runs only when there are files to embed; keep a lenient
model_name/dimensions read at the top for the FileMetaStore. Embedding still
fails fast with the same guarantee; no-op refreshes no longer require a
resolvable model.

Full lib+bins suite green (482 passed; the unrelated Windows relocation test
is pre-existing flakiness; it passes in isolation).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…auth notes

Holistic-review findings on the combined branch:

IMPORTANT - partial-deletion window: in perform_incremental_refresh_with_stores
the lazy resolve_embed_model ran AFTER stale-chunk deletions were committed, so
on a corrupt index (unknown model / dims mismatch) it could delete chunks then
error before re-embedding, leaving the index with data removed. Move the
fail-fast resolve to right after the no-op early-return, before any destructive
store mutation, so a corrupt index errors with the index still intact. embed_model
is still consumed in the changed-files block.
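
The ordering this fix establishes (no-op early return, then fail-fast resolve, then destructive mutation) can be sketched with a toy in-memory index. Everything here is hypothetical: the `Index` struct, the model table, and the simplified signatures only stand in for the real stores and resolve_embed_model.

```rust
/// Toy stand-in for the on-disk index (hypothetical).
struct Index {
    model: String,                // model name recorded in metadata
    dims: usize,                  // dimensions recorded in metadata
    chunks: Vec<(String, usize)>, // (file, vector dimensions)
}

/// Fail fast on an unknown model or a dimension mismatch.
fn resolve_embed_model(index: &Index) -> Result<(String, usize), String> {
    // Hypothetical model table, for illustration only.
    let known = [("small", 384usize), ("bge-large", 1024)];
    match known.iter().find(|(name, _)| *name == index.model) {
        None => Err(format!("unknown model '{}'", index.model)),
        Some((_, d)) if *d != index.dims => {
            Err(format!("dims mismatch: {} vs {}", d, index.dims))
        }
        Some((name, d)) => Ok((name.to_string(), *d)),
    }
}

fn refresh(index: &mut Index, changed: &[&str]) -> Result<usize, String> {
    // 1. No-op refreshes never embed, so they need no resolvable model.
    if changed.is_empty() {
        return Ok(0);
    }
    // 2. Resolve BEFORE any destructive mutation: on error the index is intact.
    let (_model, dims) = resolve_embed_model(index)?;
    // 3. Only now delete stale chunks and re-embed the changed files.
    index.chunks.retain(|(file, _)| !changed.contains(&file.as_str()));
    for file in changed {
        index.chunks.push((file.to_string(), dims));
    }
    Ok(changed.len())
}
```

Swapping steps 2 and 3 reproduces the partial-deletion window: the `retain` call would run before the error is returned.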

MINOR - doctor_handler ran synchronous diagnose() (tree walk + LMDB scan) while
holding a tokio read guard on an async worker. Wrap it in spawn_blocking using
blocking_read on the cloned Arc<RwLock> (mirrors the reindex path); still reuses
the open LMDB handle to avoid double-open.

MINOR - document at the route declaration why POST /doctor is intentionally
outside require_admin_auth's management set (read-only diagnostics).

cargo check + clippy clean; target tests pass (noop refresh, info_doctor routes).
The unrelated Windows relocation test remains pre-existing flakiness (passes in
isolation).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The db_discovery::repos relocation tests flaked under full-suite parallel load
(intermittent "Access is denied" on rename, and occasional missing git remote).
Root cause: these tests `git init` a directory then rename it; on Windows the
indexer/antivirus scans each fresh .git tree and holds handles on it, blocking
the rename. When many such tests run concurrently the scanner is overwhelmed
and handles linger for >7s, long enough to exhaust the old 10-attempt rename
retry. A transient msys fork failure (EAGAIN) spawning git could also silently
drop a repo's remote.

Fixes:
- Serialize the 9 git-spawning / renaming relocation tests behind a shared,
  poison-tolerant Mutex so only one .git tree is created/renamed at a time,
  keeping each Defender scan window short. This is the decisive fix.
- Harden production git_remote_url(): retry on transient spawn failure (fork
  exhaustion) instead of treating it as "no remote", which would wrongly strip
  a repo's git identity and break relocation / cause prune. NotFound (git
  absent) still returns immediately; an Ok-but-nonzero status is a real answer
  and is not retried.
- init_git_remote test helper: same spawn-retry instead of .expect() panic.
- rename_retry: raise budget to 40 attempts with a 250ms-capped backoff.
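
A sketch of a retry loop with a capped exponential backoff, in the spirit of the rename_retry and spawn-retry changes listed above. The helper name, its parameters, and the doubling schedule are assumptions. The "NotFound returns immediately" rule is borrowed from the git_remote_url description and applied to the whole helper for brevity.

```rust
use std::io;
use std::thread::sleep;
use std::time::Duration;

/// Run `op` up to `max_attempts` times, sleeping between failures.
/// The delay doubles each round but never exceeds `cap`. Hypothetical helper.
fn retry_with_backoff<T>(
    max_attempts: u32,
    base: Duration,
    cap: Duration,
    mut op: impl FnMut() -> io::Result<T>,
) -> io::Result<T> {
    let mut delay = base;
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // NotFound is a real answer, not a transient failure; an
            // exhausted budget also ends the loop with the last error.
            Err(e) if e.kind() == io::ErrorKind::NotFound || attempt >= max_attempts => {
                return Err(e)
            }
            Err(_) => {
                sleep(delay);
                delay = (delay * 2).min(cap);
                attempt += 1;
            }
        }
    }
}
```

With the numbers from the commit, a call would look like `retry_with_backoff(40, Duration::from_millis(10), Duration::from_millis(250), || std::fs::rename(from, to))`; the 10 ms base is a guess, as the commit states only the attempt budget and the cap.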

Validation: full lib suite run 6x consecutively under parallel load; 482
passed, 0 failed every time (incl. a 15s heavy-contention run that previously
failed). clippy -D warnings clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ions" error

The reported 500 ("an environment is already opened with different options"
opening BAYR.Aprimo) is heed 0.20.5's Error::BadOpenOptions from heed's own
process-global OPENED_ENV registry, NOT codesearch's TrackedEnv guard (which
would say "double-open prevented"). It fires when a path resolves to a
still-live heed env whose recorded map_size differs from the reopen's resolved
size (e.g. after an MDB_MAP_FULL resize).

Root cause: TrackedEnv::drop called unregister() in the drop body, but the
`inner: heed::Env` field is dropped only AFTER the body returns (Rust drop
order). That leaves a window where codesearch's LMDB_REGISTRY slot is free but
heed's env is still alive. A concurrent TrackedEnv::open (idle reaper dropping a
repo while a reindex/query reopens it) passes register() and falls through to
opts.open(), hitting heed's raw error.

Note: the previously applied stop_fsw fix (drop Readonly/Conflicted) does not
cover this incident: the repo was idle-evicted, so stop_fsw returns None before
reaching that branch.

Fix: wrap inner in ManuallyDrop and drop the heed::Env BEFORE unregister(),
enforcing "our slot free => heed's slot free". A concurrent open now either sees
our slot occupied (clear double-open message + retry) or both free (clean
reopen), never the inconsistent state. Adds a multi-threaded regression guard
that asserts the forbidden heed string never surfaces (non-flaky: only fails on
real regression).
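
The drop-order point can be shown with the standard library alone. In this sketch `FakeEnv` stands in for heed::Env and a shared log stands in for both registries; all names are hypothetical. It records the order in which the inner value is closed and the slot is released.

```rust
use std::mem::ManuallyDrop;
use std::sync::{Arc, Mutex};

type Log = Arc<Mutex<Vec<&'static str>>>;

/// Stand-in for heed::Env: records when it is actually closed.
struct FakeEnv {
    log: Log,
}

impl Drop for FakeEnv {
    fn drop(&mut self) {
        self.log.lock().unwrap().push("env closed");
    }
}

/// Stand-in for TrackedEnv: owns the env and a registry slot.
struct Tracked {
    inner: ManuallyDrop<FakeEnv>,
    log: Log,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        // SAFETY: `inner` is dropped exactly once, here, and is never
        // touched again afterwards.
        unsafe { ManuallyDrop::drop(&mut self.inner) };
        // Release the slot only after the env is gone, so
        // "our slot free" implies "the env is closed".
        self.log.lock().unwrap().push("slot unregistered");
    }
}

fn drop_order() -> Vec<&'static str> {
    let log: Log = Arc::new(Mutex::new(Vec::new()));
    let tracked = Tracked {
        inner: ManuallyDrop::new(FakeEnv { log: log.clone() }),
        log: log.clone(),
    };
    drop(tracked);
    let events = log.lock().unwrap().clone();
    events
}
```

Without ManuallyDrop, fields are dropped only after the `Drop::drop` body returns, so the two events would be recorded in the opposite order.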

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ons" 500

Strengthening the TrackedEnv concurrency test to 8 threads x 4000 iters
revealed the drop-order reorder alone does NOT close the window: with
DIFFERENT map sizes on the same path, heed still raised "an environment is
already opened with different options". Probing with a CONSISTENT size proved
the error is specific to size mismatch: heed defers env close, so a reopen can
briefly observe the prior live env, and only disagreeing options trigger the
error.

Root fix: a process-global per-path map_size pin (MAP_SIZE_PINS). The first
resolve_map_size() for a path fixes its size for the process lifetime
(monotonically non-decreasing, capped at MAX_LMDB_MAP_SIZE_MB); resize_environment
raises the pin so post-resize reopens (e.g. after idle eviction) match the
still-live env. This makes every open of a path use a consistent size
regardless of metadata-persistence state: the user's BAYR metadata.json had no
lmdb_map_size_mb, which is exactly how a resized env could be reopened with a
mismatched size. The Drop reorder remains as complementary hardening.
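
A minimal sketch of a process-global per-path pin with the stated properties (first resolve fixes the size, it never shrinks, it is capped). The function names, the cap value, and the rule that only an explicit raise can increase the pin are assumptions; the real MAP_SIZE_PINS logic may differ.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::{Mutex, OnceLock};

/// Hypothetical cap; the real MAX_LMDB_MAP_SIZE_MB value is not shown here.
const MAX_MAP_SIZE_MB: usize = 4096;

fn pins() -> &'static Mutex<HashMap<PathBuf, usize>> {
    static PINS: OnceLock<Mutex<HashMap<PathBuf, usize>>> = OnceLock::new();
    PINS.get_or_init(|| Mutex::new(HashMap::new()))
}

/// The first call for a path fixes its size; later calls return the pin,
/// whatever size they ask for.
fn resolve_pinned(path: &Path, requested_mb: usize) -> usize {
    let mut map = pins().lock().unwrap();
    let pinned = *map
        .entry(path.to_path_buf())
        .or_insert(requested_mb.min(MAX_MAP_SIZE_MB));
    pinned
}

/// Called after a resize: raises the pin, never lowers it.
fn raise_pin(path: &Path, new_mb: usize) -> usize {
    let mut map = pins().lock().unwrap();
    let capped = new_mb.min(MAX_MAP_SIZE_MB);
    let pin = map.entry(path.to_path_buf()).or_insert(capped);
    *pin = (*pin).max(capped);
    *pin
}
```

Because every open goes through the same map, a reopen after idle eviction asks for the size the still-live environment was created or resized with, independent of what metadata.json happens to contain.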

Tests:
- lmdb_registry: concurrent open/drop/reopen with consistent size (8x4000,
  barrier-synced) asserts the heed string never surfaces (a non-flaky guard).
- store: pin is stable+monotonic per path (a later smaller persisted size does
  not shrink the live pin) and capped at MAX.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pressing 'n' (reindex) previously gave no instant confirmation: the status
column only flipped to "Indexing…" on the next 500ms redraw, which was easy to
miss on a fast reindex. Now a transient footer flash confirms the action
immediately and honestly reflects the launch outcome:

- Started        → "⟳ Reindex started for '<alias>' …"
- AlreadyRunning → "⟳ Reindex already running for '<alias>'"
- Failed         → "✗ Cannot reindex '<alias>' - see logs"

spawn_force_reindex now returns a ReindexLaunch enum so the flash maps to the
real synchronous launch result (guard hit / unresolved alias / read-only / open
error). The flash auto-clears after 4s. The existing pulsing "⟳ idx…" status
label is unchanged. Scope: local in-process TUI only (per request); render_footer
gains an Option<&str> flash arg, remote TUI passes None (behavior unchanged).
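
The mapping from launch outcome to footer flash could look roughly like this. The variant names and messages come from the description above; the function names, the payload-free variants, and the expiry check are assumptions.

```rust
use std::time::{Duration, Instant};

/// Outcome of trying to launch a reindex (variants as named in the commit;
/// whether the real enum carries payloads is not known).
enum ReindexLaunch {
    Started,
    AlreadyRunning,
    Failed,
}

/// Pick the footer text for a launch outcome (hypothetical helper).
fn flash_message(launch: &ReindexLaunch, alias: &str) -> String {
    match launch {
        ReindexLaunch::Started => format!("⟳ Reindex started for '{alias}' …"),
        ReindexLaunch::AlreadyRunning => format!("⟳ Reindex already running for '{alias}'"),
        ReindexLaunch::Failed => format!("✗ Cannot reindex '{alias}' - see logs"),
    }
}

/// The flash is shown for 4 seconds after it was set.
fn flash_visible(shown_at: Instant, now: Instant) -> bool {
    now.duration_since(shown_at) < Duration::from_secs(4)
}
```

A render loop would keep an `Option<(String, Instant)>`, pass `Some(text)` to the footer while `flash_visible` holds, and `None` afterwards.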

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
🐛 fix: review corrections - LMDB reopen 500, indexing & TUI hardening
…& TUI reindex feedback

- README: add Dart row to Supported Languages table (the 16th tree-sitter
  grammar was already counted at line 18; only the table row was missing).
- CHANGELOG [Unreleased]: document the heed "different options" reopen-500 fix
  (TrackedEnv drop-order + per-path map_size pin) under Fixed, and the immediate
  TUI reindex feedback under Added.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- CHANGELOG: move [Unreleased] items under [1.0.207] - 2026-06-12.
- Add the `serve --host` / non-localhost bind feature (#114) to the release
  notes (env CODESEARCH_SERVE_HOST, 0.0.0.0 for containers, pair with
  CODESEARCH_SERVE_API_KEY).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ion directly

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
📝 docs: add Dart to README + changelog for reopen-500 fix & TUI reindex feedback
@flupkede flupkede merged commit 42cd183 into master Jun 12, 2026
5 checks passed