feat: R81b meta-noise filter + enrichment prompt additions #225

Merged
EtanHey merged 3 commits into main from feat/r81b-meta-noise-filter on Apr 8, 2026
Conversation

@EtanHey
Owner

@EtanHey EtanHey commented Apr 8, 2026

Summary

  • search-time filter using META_NOISE_PATTERNS to exclude literal MCP/meta-noise chunk content from hybrid_search by default, with explicit opt-out support
  • enrichment meta-research detection for literal brain_search(...) / brain_entity(...) tool invocations, setting low importance and adding the meta-research tag
  • conversational chunk guidance so short/informal chunks extract actionable items and commitments instead of discussion flow

Tests

  • 8 hybrid_search + 38 enrichment = 46 pass

Note

Add meta-noise filter to hybrid_search and extend enrichment prompts with meta-research detection

  • Adds filter_meta_noise: bool = True to SearchMixin.hybrid_search in search_repo.py, excluding chunks matching tool-transcript and QA-table patterns via SQL predicates and a post-filter; callers can opt out with filter_meta_noise=False.
  • Extends ENRICHMENT_PROMPT in enrichment.py with rubrics for meta-research detection, short/conversational chunks, epistemic_level, debt_impact, and sentiment_label.
  • Entity persistence in enrichment_controller.py now upserts kg_entities (refreshing updated_at on conflict) and inserts entity-to-chunk links into kg_entity_chunks, ignoring duplicates; failures are logged and do not interrupt enrichment.
  • Adds one-time per-process stderr signature emission in both build_prompt/build_external_prompt and _emit_enrichment_start (realtime mode); write failures are swallowed and logged at debug.
  • Behavioral Change: hybrid_search excludes meta-noise chunks by default, which may reduce result counts for callers that previously received tool-transcript content.
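
As a rough illustration of the default-on filter with an explicit opt-out described above, here is a minimal sketch. The pattern tuple, the `search` function, and the in-memory chunk list are hypothetical stand-ins; the real META_NOISE_PATTERNS and SearchMixin.hybrid_search live in search_repo.py and are not reproduced in this PR text.

```python
# Hypothetical patterns for illustration only; the real list is defined in search_repo.py.
META_NOISE_PATTERNS = ("brain_search(", "brain_entity(")
_CASEFOLDED_PATTERNS = tuple(p.casefold() for p in META_NOISE_PATTERNS)


def contains_meta_noise(content: str) -> bool:
    """Return True if the chunk text contains a literal tool-invocation pattern."""
    folded = content.casefold()
    return any(pattern in folded for pattern in _CASEFOLDED_PATTERNS)


def search(chunks: list[str], query: str, filter_meta_noise: bool = True) -> list[str]:
    """Toy substring search standing in for hybrid_search (assumed shape, not the real API)."""
    needle = query.casefold()
    hits = [chunk for chunk in chunks if needle in chunk.casefold()]
    if filter_meta_noise:
        # Post-filter: drop candidates that look like tool transcripts.
        hits = [chunk for chunk in hits if not contains_meta_noise(chunk)]
    return hits


CHUNKS = [
    "Decided to ship the search cache fix",
    "Brain_Search(query='search cache')",
]
```

Callers that still want tool-transcript content would pass `filter_meta_noise=False`, which in this sketch returns both chunks.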

Macroscope summarized bcd32e0.

Summary by CodeRabbit

  • New Features

    • Search now filters out "meta-noise" by default to improve relevance, with an option to disable this behavior.
    • Enrichment prompts updated with clearer meta-research detection, guidance for short conversational chunks, and expanded evaluation rubrics (epistemic, debt, sentiment).
  • Bug Fixes

    • Enrichment startup marker emission made robust so telemetry writes won't interrupt realtime processing.
  • Tests

    • New and expanded tests cover meta-noise filtering, prompt content, concurrency, and error-handling.

@coderabbitai

coderabbitai Bot commented Apr 8, 2026

Walkthrough

Adds process-level stderr diagnostic markers for enrichment runtime and prompt loading, strengthens prompt instructions for meta-research and chunking, implements optional meta-noise filtering in hybrid search, and adds tests covering prompt content, concurrency for prompt signature emission, and meta-noise filtering behavior.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Enrichment Diagnostics (src/brainlayer/enrichment_controller.py) | Emit a realtime-only ENRICHMENT_RUNTIME_LOADED marker to stderr at enrichment start (swallows OSError and logs debug); KG entity upsert conflict target changed to ON CONFLICT(id) while preserving updated_at update. |
| Enrichment Prompt & Emission (src/brainlayer/pipeline/enrichment.py) | Add process-wide, thread-safe _emit_prompt_signature_once() that writes ENRICHMENT_PROMPT_LOADED to stderr once per process (swallows OSError and logs debug). Expand ENRICHMENT_PROMPT with meta-research detection, tool-invocation handling, short conversational chunk guidance, and expanded epistemic/debt/sentiment rubrics; call emitter from prompt builders. |
| Search Meta-Noise Filtering (src/brainlayer/search_repo.py) | Add filter_meta_noise: bool = True parameter to hybrid_search() and incorporate it into cache key. Define META_NOISE_PATTERNS and casefolded variants; add FTS5 NOT LIKE constraints when enabled and a post-retrieval _contains_meta_noise() guard to skip matching candidates. |
| Tests: Enrichment Prompt & Concurrency (tests/test_enrichment_v2.py, tests/test_enrichment_controller.py, tests/test_enrichment_entity_schema.py) | Add concurrency tests asserting _emit_prompt_signature_once() emits exactly once across threads and swallows OSError while logging; add test ensuring _emit_enrichment_start() swallows os.write errors and still emits the Axiom start event; expand prompt content assertions for meta-research, chunking, and rubric retention. |
| Tests: Hybrid Search (tests/test_hybrid_search.py) | Add integration tests that seed meta-noise and real chunks, asserting default hybrid_search() filters meta-noise case-insensitively and that disabling filter_meta_noise returns meta-noise hits. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐇 I hop and nudge the stderr sky,
A tiny marker scrawled nearby,
Prompts grow wiser, noise takes flight,
Threads whisper once — then sleep at night.
Hooray — the search is tidy and spry!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main changes: addition of a meta-noise filter and enrichment prompt enhancements, both of which are central to this PR. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/brainlayer/enrichment_controller.py`:
- Around line 463-467: The direct os.write() of the "ENRICHMENT_RUNTIME_LOADED"
marker in enrich_realtime() must be made best-effort: wrap the os.write(...)
call in a try/except OSError block and catch/log the exception at debug level
(include the exception message and that the stderr write failed) so an OSError
on FD 2 won’t abort enrich_realtime(); apply the same guard pattern around the
analogous os.write(...) call in the pipeline enrichment module (the call
referenced at enrichment.py:409) so both emitters are fault-tolerant.

In `@src/brainlayer/pipeline/enrichment.py`:
- Around line 403-412: Protect the unsynchronized check/set in
_emit_prompt_signature_once by introducing a module-level threading.Lock
(similar to existing _groq_rate_lock), acquire it before checking
_prompt_signature_emitted, perform a double-check inside the lock, set
_prompt_signature_emitted=True and call os.write() while still holding the lock,
and add try/except around os.write() to catch and log/ignore OSError so the call
won’t crash; update the same pattern at the other emission sites referenced by
the comment (the calls around lines where build_prompt/build_external_prompt are
invoked) so all prompt-signature emissions use the new locked, exception-safe
logic.

In `@src/brainlayer/search_repo.py`:
- Around line 107-110: The post-filter _contains_meta_noise currently does a
case-sensitive substring check against META_NOISE_PATTERNS so variants like
"Brain_Search" slip through; update _contains_meta_noise to normalize comparison
(e.g., use content.casefold() and compare against a pre-normalized, casefolded
META_NOISE_PATTERNS or use re.search with re.IGNORECASE) and also make the
SQL-level filter explicitly case-insensitive. SQLite has no ILIKE operator, and
its LIKE is case-insensitive only for ASCII and only while PRAGMA
case_sensitive_like is off, so wrap fields with LOWER(...) and compare to
lowercased patterns; ensure any other places that reference META_NOISE_PATTERNS
for SQL construction or filtering (the other post-filter/code paths noted around
the same sections) are updated to use the same case-insensitive approach so both
SQL and Python post-filtering behave consistently.
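
The SQL side of that suggestion could look roughly like the sketch below, which runs against an in-memory SQLite database. The `chunks` table layout, the pattern tuple, and the helper names are assumptions for illustration, not the code in search_repo.py. One detail worth noting: `_` is a LIKE wildcard, so a pattern such as `brain_search(` needs escaping to match literally.

```python
import sqlite3

# Hypothetical patterns; the real META_NOISE_PATTERNS are not listed in this PR text.
PATTERNS = ("brain_search(", "brain_entity(")


def like_param(pattern: str) -> str:
    """Lowercase a literal pattern and escape LIKE wildcards (%, _) so it matches verbatim."""
    escaped = pattern.lower().replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return f"%{escaped}%"


def fetch_clean(conn: sqlite3.Connection) -> list[str]:
    """Return chunk contents containing none of the patterns, ignoring ASCII case."""
    predicate = " AND ".join("LOWER(content) NOT LIKE ? ESCAPE '\\'" for _ in PATTERNS)
    sql = f"SELECT content FROM chunks WHERE {predicate} ORDER BY id"
    return [row[0] for row in conn.execute(sql, [like_param(p) for p in PATTERNS])]


def demo() -> list[str]:
    conn = sqlite3.connect(":memory:")
    # Assumed minimal schema for the demo only.
    conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, content TEXT)")
    conn.executemany(
        "INSERT INTO chunks (content) VALUES (?)",
        [
            ("Brain_Search(query='cache')",),
            ("Decided to cache embeddings",),
            ("brainXsearch( is not a tool call",),
        ],
    )
    return fetch_clean(conn)
```

SQLite's built-in LOWER() folds ASCII letters only, so a Python-side casefold() post-filter is still useful as a second guard for non-ASCII variants.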
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a840f78c-ac92-4913-9ad4-c59c860ea1a3

📥 Commits

Reviewing files that changed from the base of the PR and between 01f9cb3 and 80f1488.

📒 Files selected for processing (5)
  • src/brainlayer/enrichment_controller.py
  • src/brainlayer/pipeline/enrichment.py
  • src/brainlayer/search_repo.py
  • tests/test_enrichment_entity_schema.py
  • tests/test_hybrid_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: test (3.11)
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.12)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior
Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work
Run pytest before claiming behavior changed safely; current test suite has 929 tests

**/*.py: Use paths.py:get_db_path() for all database path resolution; all scripts and CLI must use this function rather than hardcoding paths
When performing bulk database operations: stop enrichment workers first, checkpoint WAL before and after, drop FTS triggers before bulk deletes, batch deletes in 5-10K chunks, and checkpoint every 3 batches

Files:

  • src/brainlayer/enrichment_controller.py
  • tests/test_hybrid_search.py
  • src/brainlayer/search_repo.py
  • tests/test_enrichment_entity_schema.py
  • src/brainlayer/pipeline/enrichment.py
src/brainlayer/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/brainlayer/**/*.py: Use retry logic on SQLITE_BUSY errors; each worker must use its own database connection to handle concurrency safely
Classification must preserve ai_code, stack_trace, and user_message verbatim; skip noise entries entirely and summarize build_log and dir_listing entries (structure only)
Use AST-aware chunking via tree-sitter; never split stack traces; mask large tool output
For enrichment backend selection: use Groq as primary backend (cloud, configured in launchd plist), Gemini as fallback via enrichment_controller.py, and Ollama as offline last-resort; allow override via BRAINLAYER_ENRICH_BACKEND env var
Configure enrichment rate via BRAINLAYER_ENRICH_RATE environment variable (default 0.2 = 12 RPM)
Implement chunk lifecycle columns: superseded_by, aggregated_into, archived_at on chunks table; exclude lifecycle-managed chunks from default search; allow include_archived=True to show history
Implement brain_supersede with safety gate for personal data (journals, notes, health/finance); use soft-delete for brain_archive with timestamp
Add supersedes parameter to brain_store for atomic store-and-replace operations
Run linting and formatting with: ruff check src/ && ruff format src/
Run tests with pytest
Use PRAGMA wal_checkpoint(FULL) before and after bulk database operations to prevent WAL bloat

Files:

  • src/brainlayer/enrichment_controller.py
  • src/brainlayer/search_repo.py
  • src/brainlayer/pipeline/enrichment.py
🧠 Learnings (8)
📚 Learning: 2026-04-02T23:32:14.543Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T23:32:14.543Z
Learning: Applies to src/brainlayer/*enrichment*.py : Enrichment rate configurable via `BRAINLAYER_ENRICH_RATE` environment variable (default 0.2 = 12 RPM)

Applied to files:

  • src/brainlayer/enrichment_controller.py
  • src/brainlayer/pipeline/enrichment.py
📚 Learning: 2026-04-06T08:40:13.531Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Applies to src/brainlayer/**/*.py : Configure enrichment rate via `BRAINLAYER_ENRICH_RATE` environment variable (default 0.2 = 12 RPM)

Applied to files:

  • src/brainlayer/enrichment_controller.py
📚 Learning: 2026-03-22T15:55:22.017Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 100
File: src/brainlayer/enrichment_controller.py:175-199
Timestamp: 2026-03-22T15:55:22.017Z
Learning: In `src/brainlayer/enrichment_controller.py`, the `parallel` parameter in `enrich_local()` is intentionally kept in the function signature (currently unused, suppressed with `# noqa: ARG001`) for API stability. Parallel local enrichment via a thread pool or process pool is planned for a future iteration. Do not flag this as dead code requiring removal.

Applied to files:

  • src/brainlayer/enrichment_controller.py
  • src/brainlayer/pipeline/enrichment.py
📚 Learning: 2026-04-06T08:40:13.531Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Applies to src/brainlayer/**/*.py : Implement chunk lifecycle columns: `superseded_by`, `aggregated_into`, `archived_at` on chunks table; exclude lifecycle-managed chunks from default search; allow `include_archived=True` to show history

Applied to files:

  • src/brainlayer/search_repo.py
📚 Learning: 2026-04-04T23:24:03.159Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T23:24:03.159Z
Learning: Applies to src/brainlayer/{vector_store,search}*.py : Chunk lifecycle: implement columns `superseded_by`, `aggregated_into`, `archived_at` on chunks table; exclude lifecycle-managed chunks from default search

Applied to files:

  • src/brainlayer/search_repo.py
📚 Learning: 2026-04-04T15:22:02.740Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 198
File: hooks/brainlayer-prompt-search.py:241-259
Timestamp: 2026-04-04T15:22:02.740Z
Learning: In `hooks/brainlayer-prompt-search.py` (Python), `record_injection_event()` is explicitly best-effort telemetry: silent `except sqlite3.Error: pass` is intentional — table non-existence or lock failures are acceptable silent failures. `sqlite3.connect(timeout=2)` is the file-open timeout; `PRAGMA busy_timeout` governs per-statement lock-wait. The `DEADLINE_MS` (450ms) guard applies only to the FTS search phase, not to this side-channel write.

Applied to files:

  • src/brainlayer/search_repo.py
📚 Learning: 2026-04-04T15:21:39.570Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 198
File: hooks/brainlayer-prompt-search.py:169-169
Timestamp: 2026-04-04T15:21:39.570Z
Learning: In EtanHey/brainlayer, `hooks/brainlayer-prompt-search.py` reads `entity_type` directly from existing rows in `kg_entities` (read-only). `contracts/entity-types.yaml` defines the write-side schema only and is not authoritative for what `entity_type` values exist in the DB. The DB already stores `technology` (72 entities), `project` (24), and `tool` (1) as valid `entity_type` values, so `INJECT_TYPES` in the hook should match these DB values, not the contract file.

Applied to files:

  • tests/test_enrichment_entity_schema.py
📚 Learning: 2026-04-06T08:40:13.531Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Pipeline architecture: Extract → Classify → Chunk → Embed → Index, with post-processing for enrichment, brain graph, and Obsidian export

Applied to files:

  • src/brainlayer/pipeline/enrichment.py
🔇 Additional comments (4)
src/brainlayer/search_repo.py (1)

683-684: Nice cache-key isolation for filter_meta_noise.

Including filter_meta_noise in the cache key avoids filtered/unfiltered cross-contamination from _hybrid_cache.

Also applies to: 717-717

tests/test_hybrid_search.py (1)

230-263: Strong coverage for default-vs-opt-out meta-noise behavior.

This test captures both expected defaults and explicit override behavior in one place.

src/brainlayer/pipeline/enrichment.py (1)

304-308: Prompt additions are aligned and actionable.

The meta-research detection and short conversational chunk guidance are clear, and rubric wording remains explicit.

Also applies to: 315-316, 396-399

tests/test_enrichment_entity_schema.py (1)

100-119: Good prompt-contract regression checks.

These assertions lock in the newly required meta-research and rubric directives with clear intent.

Comment thread src/brainlayer/enrichment_controller.py Outdated
Comment on lines +463 to +467
    if mode == "realtime":
        os.write(
            2,
            b"ENRICHMENT_RUNTIME_LOADED mode=realtime prompt=r81 truncation=8000 split=4800/3200 rubrics=epistemic_level,debt_impact,sentiment_label\n",
        )

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify unguarded stderr marker writes in enrichment paths.
rg -n -C3 'def _emit_enrichment_start|os\.write\(' src/brainlayer/enrichment_controller.py src/brainlayer/pipeline/enrichment.py

Repository: EtanHey/brainlayer

Length of output: 1299


🏁 Script executed:

#!/bin/bash
# Get full context of _emit_enrichment_start and its callers
# First, check the function definition and return behavior
sed -n '462,480p' src/brainlayer/enrichment_controller.py

# Then find where _emit_enrichment_start is called
echo "=== Callers of _emit_enrichment_start ==="
rg -n '_emit_enrichment_start' src/brainlayer/enrichment_controller.py

# Check if there are other unguarded os.write calls in the codebase
echo "=== All os.write calls in brainlayer ==="
rg -n 'os\.write\(' src/brainlayer/ --type py

Repository: EtanHey/brainlayer

Length of output: 1095


🏁 Script executed:

#!/bin/bash
# Check enrich_realtime to see what happens if _emit_enrichment_start fails
# Search for the function definition and how it handles _emit_enrichment_start
rg -B5 -A20 'def enrich_realtime' src/brainlayer/enrichment_controller.py

Repository: EtanHey/brainlayer

Length of output: 894


🏁 Script executed:

#!/bin/bash
# Check if os.write can actually raise OSError and under what conditions
# Also verify if any existing guard patterns are used in the codebase
rg -B2 -A2 'try:' src/brainlayer/enrichment_controller.py | head -50

Repository: EtanHey/brainlayer

Length of output: 1582


🏁 Script executed:

#!/bin/bash
# Check exact current state of lines 462-470
sed -n '462,470p' src/brainlayer/enrichment_controller.py

Repository: EtanHey/brainlayer

Length of output: 418


🏁 Script executed:

#!/bin/bash
# Check if logger is already available in enrichment_controller.py
head -30 src/brainlayer/enrichment_controller.py | rg -n 'import|logger'

Repository: EtanHey/brainlayer

Length of output: 516


🏁 Script executed:

#!/bin/bash
# Check if there's any error handling pattern around _emit_enrichment_start calls
sed -n '515,530p' src/brainlayer/enrichment_controller.py

Repository: EtanHey/brainlayer

Length of output: 705


Guard realtime stderr marker emission to prevent enrichment failure.

Line 464 writes directly to FD 2 without error handling. An OSError from a broken/unavailable stderr—common in containerized or CI environments—will interrupt the entire enrich_realtime() call before any chunks are processed. This diagnostic marker should be best-effort.

Wrap the os.write() call in a try-except block that logs the failure at debug level:

Proposed fix
 def _emit_enrichment_start(mode: str, limit: int) -> bool:
     if mode == "realtime":
-        os.write(
-            2,
-            b"ENRICHMENT_RUNTIME_LOADED mode=realtime prompt=r81 truncation=8000 split=4800/3200 rubrics=epistemic_level,debt_impact,sentiment_label\n",
-        )
+        try:
+            os.write(
+                2,
+                b"ENRICHMENT_RUNTIME_LOADED mode=realtime prompt=r81 truncation=8000 split=4800/3200 rubrics=epistemic_level,debt_impact,sentiment_label\n",
+            )
+        except OSError:
+            logger.debug("Failed to emit ENRICHMENT_RUNTIME_LOADED marker", exc_info=True)
     return _emit_enrichment_event(

Note: A similar unguarded os.write() call exists at src/brainlayer/pipeline/enrichment.py:409. Apply the same guard pattern there.
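
The guard pattern can be exercised in isolation with a sketch like the following. `emit_marker` is a hypothetical helper, not a function from the PR; it takes the file descriptor as a parameter so the failure path can be triggered with a closed pipe instead of a broken stderr.

```python
import logging
import os

logger = logging.getLogger(__name__)


def emit_marker(fd: int, payload: bytes) -> bool:
    """Best-effort write: return True on success, False if the descriptor is unusable."""
    try:
        os.write(fd, payload)
    except OSError:
        # Diagnostics must never interrupt the caller; record the failure quietly.
        logger.debug("Failed to emit marker on fd %d", fd, exc_info=True)
        return False
    return True
```

Writing to a descriptor that has already been closed raises OSError, which the helper swallows and reports through its return value.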

Comment thread src/brainlayer/pipeline/enrichment.py Outdated
Comment on lines +403 to +412
def _emit_prompt_signature_once() -> None:
    """Write a single prompt signature line per process for daemon verification."""
    global _prompt_signature_emitted
    if _prompt_signature_emitted:
        return
    _prompt_signature_emitted = True
    os.write(
        2,
        b"ENRICHMENT_PROMPT_LOADED truncation=8000 split=4800/3200 rubrics=epistemic_level,debt_impact,sentiment_label\n",
    )

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify call sites and parallel paths for prompt building and signature emission.
rg -n -C3 '_emit_prompt_signature_once|build_external_prompt\(|build_prompt\(|ThreadPoolExecutor|executor\.submit\(' src/brainlayer/pipeline/enrichment.py src/brainlayer/enrichment_controller.py

Repository: EtanHey/brainlayer

Length of output: 10848


🏁 Script executed:

#!/bin/bash
# Check for any existing synchronization around _prompt_signature_emitted
# and verify the actual execution context

# 1. Check the full context of the global variable and function
sed -n '40,50p' src/brainlayer/pipeline/enrichment.py
sed -n '403,420p' src/brainlayer/pipeline/enrichment.py

# 2. Check if there are any threading locks defined in the module
rg -n "threading\.(Lock|RLock|Semaphore|Condition)" src/brainlayer/pipeline/enrichment.py

# 3. Verify the execution flow in _enrich_one to see if it calls build_prompt
rg -n -A20 "^def _enrich_one" src/brainlayer/pipeline/enrichment.py | head -40

Repository: EtanHey/brainlayer

Length of output: 2005


🏁 Script executed:

#!/bin/bash
# Check enrichment_controller to confirm ThreadPoolExecutor calls paths with _emit_prompt_signature_once
sed -n '540,560p' src/brainlayer/enrichment_controller.py
sed -n '340,360p' src/brainlayer/enrichment_controller.py

Repository: EtanHey/brainlayer

Length of output: 1528


Add lock to protect prompt-signature emission from race conditions.

_prompt_signature_emitted uses an unsynchronized check-then-set at lines 405–407, and os.write() at line 409 has no error handling. Multiple threads call build_prompt() and build_external_prompt() via ThreadPoolExecutor in both enrichment.py and enrichment_controller.py, so two threads can both pass the check and emit duplicate markers, and an OSError from os.write() would propagate into the prompt builder.

The module already uses threading.Lock() for _groq_rate_lock (line 99); apply the same pattern here with a double-check inside the lock and exception handling for os.write().

Proposed fix
 logger = logging.getLogger(__name__)
 _prompt_signature_emitted = False
+_prompt_signature_lock = threading.Lock()
@@
 def _emit_prompt_signature_once() -> None:
     """Write a single prompt signature line per process for daemon verification."""
     global _prompt_signature_emitted
     if _prompt_signature_emitted:
         return
-    _prompt_signature_emitted = True
-    os.write(
-        2,
-        b"ENRICHMENT_PROMPT_LOADED truncation=8000 split=4800/3200 rubrics=epistemic_level,debt_impact,sentiment_label\n",
-    )
+    with _prompt_signature_lock:
+        if _prompt_signature_emitted:
+            return
+        try:
+            os.write(
+                2,
+                b"ENRICHMENT_PROMPT_LOADED truncation=8000 split=4800/3200 rubrics=epistemic_level,debt_impact,sentiment_label\n",
+            )
+        except OSError:
+            logger.debug("Failed to emit ENRICHMENT_PROMPT_LOADED marker", exc_info=True)
+        finally:
+            _prompt_signature_emitted = True

Also applies to call sites: enrichment.py:444, 515
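
A self-contained way to check the double-checked locking pattern is sketched below. The names are hypothetical and the marker is appended to a list instead of being written to stderr, so the number of emissions can be counted after many concurrent calls.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_emitted = False
_emit_lock = threading.Lock()
emissions: list[str] = []  # stand-in sink so the demo can count writes


def emit_once(marker: str = "PROMPT_LOADED") -> None:
    """Record the marker at most once per process, even under concurrent callers."""
    global _emitted
    if _emitted:  # fast path, no lock needed once the flag is set
        return
    with _emit_lock:
        if _emitted:  # re-check: another thread may have won the race
            return
        try:
            emissions.append(marker)
        finally:
            _emitted = True  # set even on failure so a broken sink is not retried


def run_concurrently(workers: int = 16) -> int:
    """Call emit_once from many threads and return how many markers were recorded."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(workers * 4):
            pool.submit(emit_once)
    return len(emissions)
```

Setting the flag in `finally` mirrors the proposed fix: a failed write is treated as final, which trades a possibly missing marker for never retrying on every prompt build.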

Comment thread src/brainlayer/search_repo.py Outdated
- ON CONFLICT(id): The id is uuid5(etype, name.lower()) — deterministic. Conflict should be on id, not (entity_type, name), so retries idempotently UPDATE the same row instead of failing with PRIMARY KEY violation. Fixes UNIQUE constraint errors during enrichment runs.
- ruff format: test_enrichment_v2.py line wrapping per --check requirements.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
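
The idempotent upsert described in that commit message can be sketched against an in-memory SQLite database as follows. The table names come from the PR summary, but the column layout, the uuid5 namespace, and the key format are assumptions made for this example.

```python
import sqlite3
import uuid


def entity_id(entity_type: str, name: str) -> str:
    # Assumed namespace and key layout; the commit only states the id is a
    # deterministic uuid5 over the entity type and the lowercased name.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{entity_type}:{name.lower()}"))


def upsert_entity(conn: sqlite3.Connection, entity_type: str, name: str, now: str) -> str:
    eid = entity_id(entity_type, name)
    conn.execute(
        "INSERT INTO kg_entities (id, entity_type, name, updated_at) VALUES (?, ?, ?, ?) "
        "ON CONFLICT(id) DO UPDATE SET updated_at = excluded.updated_at",
        (eid, entity_type, name, now),
    )
    return eid


def link_chunk(conn: sqlite3.Connection, eid: str, chunk_id: str) -> None:
    # Duplicate links are ignored instead of raising a constraint error.
    conn.execute(
        "INSERT OR IGNORE INTO kg_entity_chunks (entity_id, chunk_id) VALUES (?, ?)",
        (eid, chunk_id),
    )


def demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical minimal schema for the demo only.
    conn.execute("CREATE TABLE kg_entities (id TEXT PRIMARY KEY, entity_type TEXT, name TEXT, updated_at TEXT)")
    conn.execute("CREATE TABLE kg_entity_chunks (entity_id TEXT, chunk_id TEXT, PRIMARY KEY (entity_id, chunk_id))")
    first = upsert_entity(conn, "tool", "SQLite", "2026-04-08T10:00:00")
    second = upsert_entity(conn, "tool", "sqlite", "2026-04-08T11:00:00")  # same id, refreshes updated_at
    link_chunk(conn, first, "c1")
    link_chunk(conn, first, "c1")  # ignored duplicate
    rows = conn.execute("SELECT name, updated_at FROM kg_entities").fetchall()
    links = conn.execute("SELECT COUNT(*) FROM kg_entity_chunks").fetchone()[0]
    return first == second, rows, links
```

Because the id is derived from the lowercased name, a retry with different casing hits the same primary key and takes the UPDATE branch.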
@EtanHey EtanHey merged commit c367486 into main Apr 8, 2026
6 checks passed