
fix(daemon): fall back to readonly when writer pidfile is held #311

Merged
EtanHey merged 1 commit into main from fix/daemon-fallback-readonly on May 22, 2026

Conversation

Owner

@EtanHey EtanHey commented May 22, 2026

Why

PR #309 (single-writer pidfile mutex) inadvertently broke brainlayer-daemon (FastAPI). At startup, daemon.py:83 opens VectorStore(DEFAULT_DB_PATH) in RW mode, but the new pidfile mutex raises WriterInUseError if any other writer (enrich supervisor, drain, etc.) holds the lock — which is normal production state.
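
For readers unfamiliar with the pattern, here is a minimal sketch of how a single-writer pidfile mutex can raise on contention. It is not the PR #309 implementation: the `acquire_writer_lock` helper, the pidfile layout, and the liveness probe are assumptions made for illustration, and only the `WriterInUseError` name comes from this PR.

```python
import os
from pathlib import Path


class WriterInUseError(RuntimeError):
    """Stand-in for the exception named in this PR."""


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this pid exists (POSIX-only signal-0 probe)."""
    try:
        os.kill(pid, 0)  # signal 0 checks existence without delivering a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def acquire_writer_lock(pidfile: Path) -> None:
    """Hypothetical helper: claim the pidfile, or raise if a live writer holds it.

    The check-then-write below is not atomic; a production lock would need
    exclusive file creation or an OS-level file lock.
    """
    if pidfile.exists():
        text = pidfile.read_text().strip()
        holder = int(text) if text.isdigit() else None
        if holder is not None and holder != os.getpid() and pid_is_alive(holder):
            raise WriterInUseError(f"another writer is using {pidfile} (pid {holder})")
    # No holder, a stale holder, or ourselves: record our own pid.
    pidfile.write_text(str(os.getpid()))
```

Under a scheme like this, any second process that asks for the lock while the enrich supervisor is alive gets `WriterInUseError`, which is the state the daemon hit at startup.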

Symptom: brainlayer search CLI hangs ~17-36 min, brainlayer stats hangs, /tmp/brainlayer.sock never appears.

Discovered during Phase 4a eval baseline.json generation tonight — the runner needed brainlayer search to work.

Fix

Catch WriterInUseError in the lifespan handler and fall back to readonly mode. Log a clear warning explaining what works and what doesn't.

Smoke-test evidence (empirical)

Direct lifespan invocation while enrich supervisor PID 14909 holds the pidfile:

$ PYTHONPATH=src python -c "import asyncio; from brainlayer.daemon import lifespan, app; asyncio.run((lambda: (yield from lifespan(app).__aenter__()))())"
Another writer holds the pidfile mutex (another writer is using /Users/etanheyman/.local/share/brainlayer/brainlayer.db (pid 14909)); falling back to READONLY mode. Search/stats/context endpoints work; write endpoints (/digest, /store, /entity PATCH, /backlog/items POST/PATCH/DELETE) will return 500 until daemon is restarted with no live writer.
LIFESPAN OK
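
The one-liner above is hard to read, so here is a self-contained sketch of the same kind of check: enter a lifespan context while a writer is simulated as active, and confirm that startup falls back instead of raising. `VectorStore`, `WriterInUseError`, and `lifespan` below are stand-ins written for this example, not the brainlayer objects; the only detail taken from the PR is that the store accepts `readonly=True`.

```python
import asyncio
import logging
from contextlib import asynccontextmanager

logger = logging.getLogger("daemon-sketch")


class WriterInUseError(RuntimeError):
    """Stand-in for brainlayer's exception of the same name."""


class VectorStore:
    """Stand-in store: read-write opens fail while a writer is marked live."""

    writer_alive = True  # simulates the enrich supervisor holding the pidfile

    def __init__(self, path: str, readonly: bool = False) -> None:
        if not readonly and VectorStore.writer_alive:
            raise WriterInUseError(f"another writer is using {path}")
        self.path = path
        self.readonly = readonly


@asynccontextmanager
async def lifespan(state: dict):
    """Try read-write first; on WriterInUseError, reopen readonly."""
    try:
        state["store"] = VectorStore("example.db")
    except WriterInUseError as exc:
        logger.warning("writer active (%s); falling back to READONLY", exc)
        state["store"] = VectorStore("example.db", readonly=True)
    yield
    state.pop("store", None)  # shutdown: drop the store reference


async def smoke_test() -> bool:
    """Enter the lifespan and report whether the store opened readonly."""
    state: dict = {}
    async with lifespan(state):
        return state["store"].readonly


if __name__ == "__main__":
    print("LIFESPAN OK, readonly =", asyncio.run(smoke_test()))
```

Using `async with` rather than calling `__aenter__` by hand also runs the shutdown half of the lifespan, so the check covers both entry and exit.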

Test plan

  • Smoke-tested locally against production state (enrich supervisor alive)
  • Brainlayer search CLI works after this PR (would need merge + restart to verify)
  • Dashboard search endpoints work in readonly mode
  • Future PR: move write endpoints (/digest, /store, /entity PATCH, /backlog/items) to arbitrated queue path so daemon can serve them regardless of writer state

Sequencing

Independent of Phase 4a/4b. Ships standalone as a small bugfix.

🤖 Generated with Claude Code by orcClaude-successor s:42

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>


Note

Medium Risk
Changes daemon startup behavior to continue running in a degraded (readonly) mode when a writer lock is held; this can surface as 500s on write endpoints until restart. Risk is limited to initialization/control flow and logging, with no schema or query logic changes.

Overview
Fixes brainlayer-daemon startup failures introduced by the single-writer pidfile mutex by catching WriterInUseError during VectorStore initialization.

On lock contention, the daemon now falls back to VectorStore(..., readonly=True) and logs a clear warning that read endpoints (search/stats/context/dashboard) remain available while write endpoints (e.g. POST /digest, POST /store, PATCH /entity, backlog mutations) will fail until restarted without another active writer.

Reviewed by Cursor Bugbot for commit 6383adf.

Note

Fall back to readonly VectorStore when writer pidfile is held on daemon startup

In daemon.py, the lifespan startup now wraps VectorStore initialization in a try/except. If WriterInUseError is raised, the daemon logs a warning and re-opens the store with readonly=True instead of failing. Behavioral Change: the daemon can now start successfully when another process holds the writer lock, but will serve read-only traffic in that state.

Macroscope summarized 6383adf.

Summary by CodeRabbit

  • New Features
    • Daemon startup is now more resilient. When another writer instance is active, the daemon gracefully falls back to read-only mode, allowing read operations to continue while write operations remain unavailable until the conflict is resolved.


PR #309 added a pidfile mutex on VectorStore RW init to gate multiple
enrich processes. Unintended consequence: the FastAPI daemon's
lifespan also opens VectorStore RW at startup (daemon.py:83) — if
the enrich supervisor is alive (which it normally is), the daemon
fails to start entirely with WriterInUseError. That breaks:

  - brainlayer search CLI (DaemonClient hangs trying to start daemon)
  - brainlayer stats CLI (same)
  - any dashboard frontend on localhost:3000 etc.

Empirically observed: enrich supervisor PID 14909 holds the pidfile,
brainlayer-daemon refuses to bootstrap, /tmp/brainlayer.sock never
appears, all CLI operations time out at the DAEMON_STARTUP_TIMEOUT.

Fix: catch WriterInUseError on the lifespan VectorStore() call and
fall back to readonly mode. In readonly mode:
  - Search endpoints work (the common case)
  - Stats/context/dashboard endpoints work
  - Write endpoints (/digest, /store, /entity PATCH, /backlog/items
    POST/PATCH/DELETE) will return 500 on the SQL layer

Operator can restart the daemon (with no live writer) to regain RW
mode for write endpoints. Or eventually move write endpoints to the
arbitrated queue path (post-Phase-4 work).

Empirically verified by smoke-test:
  - Direct lifespan invocation against current production state
    (enrich supervisor PID 14909 holds pidfile)
  - Pre-fix: WriterInUseError raised, lifespan fails to enter, daemon exits
  - Post-fix: warning logged, lifespan enters readonly, "LIFESPAN OK"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented May 22, 2026

Walkthrough

The daemon startup logic now attempts to initialize the vector store in read-write mode. If another live writer holds the mutex and WriterInUseError is raised, the daemon logs a warning and falls back to readonly mode, so read endpoints keep working while write endpoints fail at the SQL layer until the daemon is restarted.

Changes

Vector Store Readonly Fallback

Layer / File(s) | Summary
Vector store readonly fallback on writer conflict (src/brainlayer/daemon.py) | WriterInUseError is imported and FastAPI lifespan startup now uses try/except to attempt read-write initialization; on error, falls back to readonly mode with warning log.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

A rabbit hops through write-locked gates,
Takes a breath and gently waits,
Pivots softly to read-only mode,
Lets the journey still unfold,
When conflicts come, just read the load! 🐰📖

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title clearly and specifically summarizes the main change: the daemon now falls back to readonly mode when a writer pidfile is held, directly matching the core fix described in the PR objectives.
Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/brainlayer/daemon.py`:
- Around line 92-97: The warning message emitted in the pidfile mutex fallback
(the logger.warning call that describes entering READONLY mode) incorrectly
lists /backlog/items POST/PATCH/DELETE as blocked; remove those backlog
endpoints from the degraded-mode message so it only advertises write endpoints
that actually touch vector_store (for example keep /digest, /store, /entity
PATCH, etc.), and if you intend to gate backlog mutations separately, add a
distinct check/gate for them (see the code around the pidfile mutex/READONLY
logic in daemon.py where logger.warning is called).
- Around line 89-99: The startup path currently constructs VectorStore using
DEFAULT_DB_PATH directly; instead, call get_db_path() once from paths.py and
store the result (e.g., db_path = get_db_path()) and use that db_path for both
VectorStore(...) calls (the initial read-write attempt and the readonly fallback
in the WriterInUseError handler) so the daemon honors repository/CLI overrides;
update references to DEFAULT_DB_PATH to use the db_path variable and keep
existing exception handling around WriterInUseError and logging unchanged.
- Around line 88-99: When VectorStore is opened in readonly fallback (catching
WriterInUseError in the VectorStore(DEFAULT_DB_PATH) block), set a
module-level/daemon-level flag (e.g., readonly_degraded = True) alongside
assigning vector_store so the daemon state explicitly records the degraded mode;
then update the write-route handlers api_digest, api_store, and
api_update_entity to check that flag at the top of each handler and immediately
return an HTTP 503 response (with a clear message) before any write-heavy
processing or DB calls if readonly_degraded is set. Ensure the flag name and
checks are consistent with the VectorStore/WriterInUseError code paths so the
handlers short-circuit reliably whenever the store was opened with
readonly=True.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 83faec47-4603-46a0-a1be-1d4334207333

📥 Commits

Reviewing files that changed from the base of the PR and between cce646a and 6383adf.

📒 Files selected for processing (1)
  • src/brainlayer/daemon.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: Macroscope - Correctness Check
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.11)
  • GitHub Check: test (3.12)
  • GitHub Check: Macroscope - Correctness Check
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior
Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work
Run pytest before claiming behavior changed safely; current test suite has 929 tests

**/*.py: Use paths.py:get_db_path() for all database path resolution; all scripts and CLI must use this function rather than hardcoding paths
When performing bulk database operations: stop enrichment workers first, checkpoint WAL before and after, drop FTS triggers before bulk deletes, batch deletes in 5-10K chunks, and checkpoint every 3 batches

Files:

  • src/brainlayer/daemon.py
src/brainlayer/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/brainlayer/**/*.py: Use retry logic on SQLITE_BUSY errors; each worker must use its own database connection to handle concurrency safely
Classification must preserve ai_code, stack_trace, and user_message verbatim; skip noise entries entirely and summarize build_log and dir_listing entries (structure only)
Use AST-aware chunking via tree-sitter; never split stack traces; mask large tool output
For enrichment backend selection: use Groq as primary backend (cloud, configured in launchd plist), Gemini as fallback via enrichment_controller.py, and Ollama as offline last-resort; allow override via BRAINLAYER_ENRICH_BACKEND env var
Configure enrichment rate via BRAINLAYER_ENRICH_RATE environment variable (default 0.2 = 12 RPM)
Implement chunk lifecycle columns: superseded_by, aggregated_into, archived_at on chunks table; exclude lifecycle-managed chunks from default search; allow include_archived=True to show history
Implement brain_supersede with safety gate for personal data (journals, notes, health/finance); use soft-delete for brain_archive with timestamp
Add supersedes parameter to brain_store for atomic store-and-replace operations
Run linting and formatting with: ruff check src/ && ruff format src/
Run tests with pytest
Use PRAGMA wal_checkpoint(FULL) before and after bulk database operations to prevent WAL bloat

Files:

  • src/brainlayer/daemon.py

Comment thread src/brainlayer/daemon.py
Comment on lines +88 to +99
    try:
        vector_store = VectorStore(DEFAULT_DB_PATH)
        logger.info(f"Loaded vector store (READ-WRITE): {vector_store.count()} chunks")
    except WriterInUseError as exc:
        logger.warning(
            "Another writer holds the pidfile mutex (%s); falling back to READONLY mode. "
            "Search/stats/context endpoints work; write endpoints (/digest, /store, /entity PATCH, "
            "/backlog/items POST/PATCH/DELETE) will return 500 until daemon is restarted with no live writer.",
            exc,
        )
        vector_store = VectorStore(DEFAULT_DB_PATH, readonly=True)
        logger.info(f"Loaded vector store (READONLY fallback): {vector_store.count()} chunks")

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject local write routes explicitly after readonly fallback.

This only changes how VectorStore is opened. api_digest, api_store, and api_update_entity still enter their normal write-heavy code paths and will fail only after they reach SQLite, so the daemon is still doing concurrent write work while another writer owns the mutex. Persist a readonly/degraded flag here and short-circuit those handlers with a 503 before any write-side processing starts. As per coding guidelines, "Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior" and "Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work".
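
One way to implement this suggestion, sketched without FastAPI so that it runs standalone: record the degraded state at startup and wrap each write handler in a guard that answers 503 before any work begins. The `DaemonState` class, the `reject_when_readonly` decorator, and the `(status, body)` return shape are hypothetical; in the real daemon the handlers are presumably async and the guard would return the framework's own 503 response.

```python
from dataclasses import dataclass
from functools import wraps
from typing import Callable, Tuple

Response = Tuple[int, dict]


@dataclass
class DaemonState:
    """Hypothetical holder for the flag set in the WriterInUseError branch."""

    readonly_degraded: bool = False


state = DaemonState()


def reject_when_readonly(handler: Callable[..., Response]) -> Callable[..., Response]:
    """Short-circuit a write handler with 503 while the store is readonly."""

    @wraps(handler)
    def guarded(*args, **kwargs) -> Response:
        if state.readonly_degraded:
            # Refuse before any write-side processing or DB call happens.
            return 503, {"detail": "daemon is in readonly mode; another writer holds the lock"}
        return handler(*args, **kwargs)

    return guarded


@reject_when_readonly
def api_store(content: str) -> Response:
    # Stand-in for the real write path; never reached while degraded.
    return 200, {"stored": content}
```

With the flag set in the fallback branch, a call such as `api_store("note")` returns the 503 tuple immediately, and clients see a retryable "service unavailable" rather than a 500 from the SQL layer.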

Comment thread src/brainlayer/daemon.py
Comment on lines +89 to +99
        vector_store = VectorStore(DEFAULT_DB_PATH)
        logger.info(f"Loaded vector store (READ-WRITE): {vector_store.count()} chunks")
    except WriterInUseError as exc:
        logger.warning(
            "Another writer holds the pidfile mutex (%s); falling back to READONLY mode. "
            "Search/stats/context endpoints work; write endpoints (/digest, /store, /entity PATCH, "
            "/backlog/items POST/PATCH/DELETE) will return 500 until daemon is restarted with no live writer.",
            exc,
        )
        vector_store = VectorStore(DEFAULT_DB_PATH, readonly=True)
        logger.info(f"Loaded vector store (READONLY fallback): {vector_store.count()} chunks")

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Resolve the DB path with get_db_path() in this startup path.

The new open path still uses DEFAULT_DB_PATH directly for both the RW and readonly attempts, which bypasses the repo's required resolver and can desync the daemon from callers that honor DB path overrides. Please fetch the path once via paths.py:get_db_path() and use that for both initializations. As per coding guidelines, "Use paths.py:get_db_path() for all database path resolution; all scripts and CLI must use this function rather than hardcoding paths".
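
A sketch of the shape being asked for: resolve the path once, then reuse it for both the read-write attempt and the readonly fallback. The `get_db_path` body here is a placeholder (the real resolver lives in `paths.py`, and its override rules are not shown in this PR), and `open_store` and its `store_factory` parameter are hypothetical names used to keep the example standalone.

```python
from pathlib import Path
from typing import Any, Callable


class WriterInUseError(RuntimeError):
    """Stand-in for brainlayer's exception of the same name."""


def get_db_path() -> Path:
    # Placeholder only: the real paths.py:get_db_path() decides this value.
    return Path("example.db")


def open_store(store_factory: Callable[..., Any]) -> Any:
    """Open read-write, falling back to readonly, using one resolved path."""
    db_path = get_db_path()  # resolved once, shared by both attempts
    try:
        return store_factory(db_path)
    except WriterInUseError:
        # Same path as the failed attempt, so both opens target one database.
        return store_factory(db_path, readonly=True)
```

Resolving once matters because a resolver that consults overrides could otherwise return different paths for the two attempts if the environment changed in between.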

Comment thread src/brainlayer/daemon.py
Comment on lines +92 to +97
        logger.warning(
            "Another writer holds the pidfile mutex (%s); falling back to READONLY mode. "
            "Search/stats/context endpoints work; write endpoints (/digest, /store, /entity PATCH, "
            "/backlog/items POST/PATCH/DELETE) will return 500 until daemon is restarted with no live writer.",
            exc,
        )

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Don't advertise backlog mutations as blocked by readonly mode.

/backlog/items POST/PATCH/DELETE do not touch vector_store; they go through _supabase_mutate, so this warning tells operators those endpoints are unavailable even though they should still work. Please remove them from the degraded-mode message unless you also add a separate gate for them.
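
One possible way to keep the message from drifting again is to build it from a single list of the routes that actually go through `vector_store`. This is only a sketch: the `VECTOR_STORE_WRITE_ROUTES` constant and the `readonly_warning` helper are hypothetical, and the route list is taken from the endpoints named in this PR rather than from the daemon's router.

```python
# Hypothetical constant: write routes that go through vector_store.
# Backlog mutations are deliberately absent because they use _supabase_mutate.
VECTOR_STORE_WRITE_ROUTES = ("POST /digest", "POST /store", "PATCH /entity")


def readonly_warning(reason: str) -> str:
    """Build the degraded-mode warning from the route list above."""
    routes = ", ".join(VECTOR_STORE_WRITE_ROUTES)
    return (
        f"Another writer holds the pidfile mutex ({reason}); falling back to READONLY mode. "
        f"Search/stats/context endpoints work; write endpoints ({routes}) "
        "will fail until the daemon is restarted with no live writer."
    )
```

The same tuple could also drive whichever guard rejects writes in readonly mode, so the log message and the enforced behavior stay in step.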

@EtanHey EtanHey merged commit e396dea into main May 22, 2026
7 checks passed