feat: mid-run backend fallback for enrichment #38
📝 Walkthrough
This pull request introduces a mid-run fallback mechanism for LLM calls in the enrichment pipeline. It tracks consecutive failures on a primary backend, switches to a fallback backend after reaching a threshold, and retries failed chunks on the fallback. Global state variables manage failure counts, activation status, and availability checks. Comprehensive tests validate the fallback behavior across success, failure, and transition scenarios.
Sequence Diagram
sequenceDiagram
actor Client
participant enrichment as Enrichment Pipeline
participant primary as Primary Backend
participant fallback as Fallback Backend
participant checker as Availability Checker
Client->>enrichment: call_llm(request)
enrichment->>primary: attempt LLM call
alt Success
primary-->>enrichment: response
enrichment->>enrichment: reset _consecutive_failures
enrichment-->>Client: return response
else Failure & Threshold Not Reached
primary-->>enrichment: error
enrichment->>enrichment: increment _consecutive_failures
enrichment-->>Client: propagate error
else Failure & Threshold Reached
primary-->>enrichment: error
enrichment->>enrichment: increment _consecutive_failures
enrichment->>checker: _check_fallback_available()
alt Fallback Available
checker-->>enrichment: true (cached)
enrichment->>enrichment: _fallback_active = true
enrichment->>enrichment: log fallback activation
enrichment->>fallback: retry with fallback backend
fallback-->>enrichment: response/error
enrichment-->>Client: return result
else Fallback Unavailable
checker-->>enrichment: false (cached)
enrichment->>enrichment: log no-fallback warning
enrichment-->>Client: propagate error
end
end
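The flow in the diagram can be sketched roughly as follows. This is a minimal, self-contained illustration under stated assumptions, not the code in `enrichment.py`: the `FallbackRouter` class, its `call` method, and the backend callables are hypothetical names introduced here. Only the threshold of 3 and the sticky fallback behaviour come from the PR, and the sketch deliberately ignores the thread-safety concern raised in the review.

```python
from typing import Callable

# Hypothetical backend type: takes a prompt, returns text, raises on failure.
Backend = Callable[[str], str]


class FallbackRouter:
    """Illustrative only: route calls to a primary backend, then to a fallback."""

    def __init__(
        self,
        primary: Backend,
        fallback: Backend,
        fallback_available: Callable[[], bool],
        threshold: int = 3,
    ) -> None:
        self.primary = primary
        self.fallback = fallback
        self.fallback_available = fallback_available
        self.threshold = threshold
        self.consecutive_failures = 0
        self.fallback_active = False

    def call(self, prompt: str) -> str:
        if self.fallback_active:
            # Sticky mode: once switched, stay on the fallback for the run.
            return self.fallback(prompt)
        try:
            result = self.primary(prompt)
        except Exception:
            self.consecutive_failures += 1
            if self.consecutive_failures < self.threshold:
                raise  # threshold not reached: propagate the error
            if not self.fallback_available():
                raise  # nothing to switch to: propagate the error
            self.fallback_active = True
            return self.fallback(prompt)  # retry the failed request on the fallback
        self.consecutive_failures = 0  # any success resets the streak
        return result
```

With a primary that always raises and a healthy fallback, the first two calls propagate the error and the third is answered by the fallback, after which every call goes straight to the fallback.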
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~28 minutes
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Cursor Bugbot has reviewed your changes and found 2 potential issues.
        f" FALLBACK: {_consecutive_failures} consecutive failures on {effective}, switching to {fallback}",
        file=sys.stderr,
    )
    _fallback_active = True
Race condition on global fallback state with parallel workers
Medium Severity
The module-level _consecutive_failures, _fallback_active, and _fallback_available are read and written from call_llm without synchronization, but enrich_batch calls _enrich_one (which calls call_llm) from multiple threads via ThreadPoolExecutor when parallel > 1 (recommended value is 3 for MLX). The _consecutive_failures += 1 operation is not atomic — even under the GIL, it decomposes into LOAD/ADD/STORE bytecodes that can interleave. This causes lost increments, delaying or preventing fallback activation in exactly the crash scenario this feature targets.
Additional Locations (1)
        _fallback_available = True
    except Exception:
        _fallback_available = False
    return _fallback_available
Fallback availability cache ignores primary backend parameter
Low Severity
_check_fallback_available caches its result in a single _fallback_available boolean, but which backend it checks depends on the primary parameter. If first called with primary="mlx" (checks ollama), the cached True is returned for a subsequent call with primary="ollama" without ever checking if mlx is available. The cache key doesn't account for the input, making the cached result potentially wrong for a different primary.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/brainlayer/pipeline/enrichment.py`:
- Around line 431-457: The fallback shared state (_consecutive_failures,
_fallback_active, _fallback_available) is accessed concurrently and must be
protected with a lock: add a module-level threading.Lock (or RLock) and wrap all
read/modify/write accesses in call_llm(), _check_fallback_available(), and the
reset logic in run_enrichment() with that lock (acquire before reading the
cached _fallback_available, before incrementing or checking
_consecutive_failures and before setting _fallback_active, and before resetting
these variables) to ensure atomicity and prevent race conditions.
In `@tests/test_enrichment_fallback.py`:
- Around line 93-96: Replace the bare try/except in the test with an explicit
pytest.raises assertion: call enrichment.run_enrichment(max_chunks=0,
batch_size=1) inside a pytest.raises(RuntimeError) context so the test asserts
the specific RuntimeError from the backend check rather than swallowing any
Exception; keep the same parameters and test setup around the call to ensure
behavior is unchanged.
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
src/brainlayer/pipeline/enrichment.py
tests/test_enrichment_fallback.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Cursor Bugbot
🧰 Additional context used
📓 Path-based instructions (3)
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Run tests using `pytest` from the project root
Files:
tests/test_enrichment_fallback.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use `ruff check src/` for linting and `ruff format src/` for code formatting
Files:
src/brainlayer/pipeline/enrichment.py
src/brainlayer/pipeline/enrichment.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/brainlayer/pipeline/enrichment.py: Set `"think": false` in Ollama API calls for GLM-4.7 to avoid unnecessary thinking mode overhead (350+ tokens, ~20s delay)
Provide enrichment backends (Ollama and MLX) with environment variable selection via `BRAINLAYER_ENRICH_BACKEND`
Enrich chunks with 10 metadata fields: summary, tags, importance (1-10), intent, primary_symbols, resolved_query, epistemic_level, version_scope, debt_impact, and external_deps
Files:
src/brainlayer/pipeline/enrichment.py
🧠 Learnings (4)
📚 Learning: 2026-02-23T16:51:38.317Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-23T16:51:38.317Z
Learning: Applies to src/brainlayer/pipeline/enrichment.py : Provide enrichment backends (Ollama and MLX) with environment variable selection via `BRAINLAYER_ENRICH_BACKEND`
Applied to files:
src/brainlayer/pipeline/enrichment.py
📚 Learning: 2026-02-23T16:51:38.317Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-23T16:51:38.317Z
Learning: Applies to src/brainlayer/pipeline/enrichment.py : Enrich chunks with 10 metadata fields: summary, tags, importance (1-10), intent, primary_symbols, resolved_query, epistemic_level, version_scope, debt_impact, and external_deps
Applied to files:
src/brainlayer/pipeline/enrichment.py
📚 Learning: 2026-02-23T16:51:38.317Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-23T16:51:38.317Z
Learning: Applies to src/brainlayer/pipeline/enrichment.py : Set `"think": false` in Ollama API calls for GLM-4.7 to avoid unnecessary thinking mode overhead (350+ tokens, ~20s delay)
Applied to files:
src/brainlayer/pipeline/enrichment.py
🧬 Code graph analysis (1)
tests/test_enrichment_fallback.py (2)
src/brainlayer/pipeline/enrichment.py (3)
call_llm (459-515)
run_enrichment (770-883)
_check_fallback_available (440-456)
src/brainlayer/vector_store.py (1)
get_enrichment_stats (1385-1419)
🪛 GitHub Actions: CI
tests/test_enrichment_fallback.py
[error] ruff format check detected formatting issues. File would be reformatted. Run 'ruff format' to fix code style.
src/brainlayer/pipeline/enrichment.py
[error] ruff format check detected formatting issues. File would be reformatted. Run 'ruff format' to fix code style.
🔇 Additional comments (2)
tests/test_enrichment_fallback.py (2)
37-75: Nice coverage of fallback transitions and availability caching paths.
These cases validate threshold activation, reverse fallback direction, sticky fallback mode, and cached availability behavior well.
Also applies to: 103-123
31-33: ⚠️ Potential issue | 🟠 Major
Fix Ruff formatting for multiline `with` blocks (currently CI-blocking).
The pipeline already reports a `ruff format check` failure; these backslash-continued `with` blocks are a likely source and should be reformatted before merge.
🧹 Ruff-compatible pattern
-    with patch.object(enrichment, "call_mlx", return_value=None), \
-         patch.object(enrichment, "_check_fallback_available", return_value=False):
+    with (
+        patch.object(enrichment, "call_mlx", return_value=None),
+        patch.object(enrichment, "_check_fallback_available", return_value=False),
+    ):
Also applies to: 40-43, 50-52, 60-62, 69-72
⛔ Skipped due to learnings
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-23T16:51:38.317Z
Learning: Applies to src/brainlayer/pipeline/enrichment.py : Provide enrichment backends (Ollama and MLX) with environment variable selection via `BRAINLAYER_ENRICH_BACKEND`
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-23T16:51:38.317Z
Learning: Applies to src/**/*.py : Use `ruff check src/` for linting and `ruff format src/` for code formatting
# Mid-run fallback state - tracks consecutive failures for automatic backend switching.
# When the primary backend crashes mid-run (e.g., MLX "Abort trap: 6"), the pipeline
# automatically retries failed chunks on the fallback backend instead of losing the entire batch.
_consecutive_failures = 0
_FALLBACK_THRESHOLD = 3  # Switch after 3 consecutive failures
_fallback_active = False
_fallback_available: Optional[bool] = None  # None = not checked yet


def _check_fallback_available(primary: str) -> bool:
    """Check if the fallback backend is reachable. Cached for the run."""
    global _fallback_available
    if _fallback_available is not None:
        return _fallback_available

    fallback = "ollama" if primary == "mlx" else "mlx"
    try:
        if fallback == "ollama":
            resp = requests.get(f"{OLLAMA_BASE_URL}/api/tags", timeout=3)
        else:
            resp = requests.get(f"{MLX_BASE_URL}/v1/models", timeout=3)
        resp.raise_for_status()
        _fallback_available = True
    except Exception:
        _fallback_available = False
    return _fallback_available
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify the code structure and threading claims
echo "=== Checking ThreadPoolExecutor and concurrent submission ==="
sed -n '700,710p' src/brainlayer/pipeline/enrichment.py
echo -e "\n=== Checking call_llm function definition and global state mutations ==="
sed -n '625,635p' src/brainlayer/pipeline/enrichment.py
echo -e "\n=== Checking global state mutations in call_llm (469-515) ==="
sed -n '469,515p' src/brainlayer/pipeline/enrichment.py
echo -e "\n=== Checking state reset in run_enrichment (779-783) ==="
sed -n '779,783p' src/brainlayer/pipeline/enrichment.py
echo -e "\n=== Searching for any existing thread locks ==="
rg -n 'Lock|mutex|RLock|Semaphore|threading' src/brainlayer/pipeline/enrichment.py
Repository: EtanHey/brainlayer
Length of output: 3213
Protect fallback state with a lock in parallel execution.
call_llm() mutates shared module state (_consecutive_failures, _fallback_active, _fallback_available) while worker threads execute concurrently via ThreadPoolExecutor (line 703–706). The read-modify-write operations on these globals—especially the increment and threshold check on _consecutive_failures (lines 494–495) and the set operation on _fallback_active (line 498)—are unsynchronized, creating race conditions that cause nondeterministic fallback activation.
🔧 Suggested fix (thread-safe fallback state)
_consecutive_failures = 0
_FALLBACK_THRESHOLD = 3 # Switch after 3 consecutive failures
_fallback_active = False
_fallback_available: Optional[bool] = None # None = not checked yet
+_fallback_state_lock = threading.Lock()
Apply lock protection to all read/modify/write operations on these three globals in call_llm() (lines 474, 489, 494–498) and _check_fallback_available(), and to the reset in run_enrichment() (lines 779–783).
🧰 Tools
🪛 GitHub Actions: CI
[error] ruff format check detected formatting issues. File would be reformatted. Run 'ruff format' to fix code style.
try:
    enrichment.run_enrichment(max_chunks=0, batch_size=1)
except Exception:
    pass  # Will fail on backend check, that's fine
🧹 Nitpick | 🔵 Trivial
🧩 Analysis chain
🏁 Script executed:
cat -n tests/test_enrichment_fallback.py | head -120 | tail -40
Repository: EtanHey/brainlayer
Length of output: 1730
🏁 Script executed:
head -20 tests/test_enrichment_fallback.py
Repository: EtanHey/brainlayer
Length of output: 558
🏁 Script executed:
sed -n '85,105p' tests/test_enrichment_fallback.py
Repository: EtanHey/brainlayer
Length of output: 798
🏁 Script executed:
rg -n "run_enrichment" tests/test_enrichment_fallback.py | head -20
Repository: EtanHey/brainlayer
Length of output: 264
🏁 Script executed:
rg -n "pytest.raises" tests/ | head -10
Repository: EtanHey/brainlayer
Length of output: 606
🏁 Script executed:
grep -n "import pytest" tests/test_enrichment_fallback.py
Repository: EtanHey/brainlayer
Length of output: 44
🏁 Script executed:
sed -n '107,115p' tests/test_enrichment_fallback.py
Repository: EtanHey/brainlayer
Length of output: 403
🏁 Script executed:
sed -n '1,50p' tests/test_enrichment_fallback.py
Repository: EtanHey/brainlayer
Length of output: 2083
🏁 Script executed:
grep -rn "def run_enrichment" brainlayer/
Repository: EtanHey/brainlayer
Length of output: 106
🏁 Script executed:
python3 << 'EOF'
import subprocess
result = subprocess.run(['grep', '-A', '20', 'def run_enrichment', 'brainlayer/pipeline/enrichment.py'],
                        capture_output=True, text=True)
print(result.stdout)
EOF
Repository: EtanHey/brainlayer
Length of output: 44
🏁 Script executed:
find . -name "enrichment.py" -type f
Repository: EtanHey/brainlayer
Length of output: 101
🏁 Script executed:
fd -t f "enrichment.py"
Repository: EtanHey/brainlayer
Length of output: 178
🏁 Script executed:
git ls-files | grep enrichment
Repository: EtanHey/brainlayer
Length of output: 685
🏁 Script executed:
sed -n '1,100p' src/brainlayer/pipeline/enrichment.py | grep -n "def run_enrichment" -A 30
Repository: EtanHey/brainlayer
Length of output: 44
🏁 Script executed:
grep -n "def run_enrichment" src/brainlayer/pipeline/enrichment.py
Repository: EtanHey/brainlayer
Length of output: 85
🏁 Script executed:
sed -n '100,150p' src/brainlayer/pipeline/enrichment.py
Repository: EtanHey/brainlayer
Length of output: 2260
🏁 Script executed:
sed -n '768,850p' src/brainlayer/pipeline/enrichment.py
Repository: EtanHey/brainlayer
Length of output: 3358
🏁 Script executed:
sed -n '768,900p' src/brainlayer/pipeline/enrichment.py
Repository: EtanHey/brainlayer
Length of output: 5428
🏁 Script executed:
grep -n "import pytest\|from pytest" tests/test_enrichment_fallback.py
Repository: EtanHey/brainlayer
Length of output: 44
🏁 Script executed:
cat -n tests/test_enrichment_fallback.py | grep -A 25 "class TestRunEnrichmentResetsState"
Repository: EtanHey/brainlayer
Length of output: 1227
🏁 Script executed:
python3 << 'EOF'
# Check if requests is imported in enrichment module
import subprocess
result = subprocess.run(['grep', '-n', '^import requests\|^from requests', 'src/brainlayer/pipeline/enrichment.py'],
                        capture_output=True, text=True)
print("Requests import:", result.stdout if result.stdout else "Not found at top level")
# Check what exceptions run_enrichment raises
result2 = subprocess.run(['grep', '-n', 'raise RuntimeError', 'src/brainlayer/pipeline/enrichment.py'],
                         capture_output=True, text=True)
print("RuntimeError raises:", result2.stdout)
EOF
Repository: EtanHey/brainlayer
Length of output: 257
🏁 Script executed:
cd tests && python3 -m pytest --co -q test_enrichment_fallback.py 2>&1 | head -20
Repository: EtanHey/brainlayer
Length of output: 1256
🏁 Script executed:
# Check if pytest is available
python3 -c "import pytest; print(pytest.__version__)"
Repository: EtanHey/brainlayer
Length of output: 67
🏁 Script executed:
# Verify the backend check happens before other code
sed -n '768,810p' src/brainlayer/pipeline/enrichment.py
Repository: EtanHey/brainlayer
Length of output: 1830
🏁 Script executed:
sed -n '1,50p' src/brainlayer/pipeline/enrichment.py | grep -n "import\|from"
Repository: EtanHey/brainlayer
Length of output: 492
🏁 Script executed:
# Check what the actual pattern is in other tests
grep -A 10 "pytest.raises" tests/test_enrichment_fallback.py
Repository: EtanHey/brainlayer
Length of output: 44
🏁 Script executed:
# Check all imports in the test file
cat -n tests/test_enrichment_fallback.py | sed -n '1,10p'
Repository: EtanHey/brainlayer
Length of output: 404
🏁 Script executed:
# Check if pytest is imported anywhere
grep -n "^import pytest\|^from pytest" tests/test_enrichment_fallback.py
Repository: EtanHey/brainlayer
Length of output: 44
🏁 Script executed:
# Look at the full test method to understand test setup
sed -n '87,115p' tests/test_enrichment_fallback.py
Repository: EtanHey/brainlayer
Length of output: 1152
🏁 Script executed:
# Check if there are other patterns in the codebase for handling expected exceptions
grep -B 5 -A 5 "pytest.raises" tests/test_storage.py | head -30
Repository: EtanHey/brainlayer
Length of output: 747
🏁 Script executed:
# Verify requests module is accessible in enrichment
grep -n "requests\." src/brainlayer/pipeline/enrichment.py | head -10
Repository: EtanHey/brainlayer
Length of output: 678
🏁 Script executed:
# Check what error is actually raised when requests.get fails
python3 << 'EOF'
# Simulate what happens when run_enrichment tries to connect to a non-existent backend
import requests
try:
    resp = requests.get("http://localhost:9999/v1/models", timeout=5)
    resp.raise_for_status()
except Exception as e:
    print(f"Exception type: {type(e).__name__}")
    print(f"Exception: {e}")
EOF
Repository: EtanHey/brainlayer
Length of output: 573
🏁 Script executed:
# Check if the setup_method resets state for this test class
sed -n '87,92p' tests/test_enrichment_fallback.py
Repository: EtanHey/brainlayer
Length of output: 308
🏁 Script executed:
# Verify the actual exception type raised by run_enrichment when backends fail
python3 << 'EOF'
# Looking at the code path:
# 1. run_enrichment tries requests.get
# 2. If that fails with Exception, it tries fallback
# 3. If fallback also fails with Exception, it raises RuntimeError
# The test doesn't mock requests, so it will get ConnectionError
# Which will trigger the nested except, leading to RuntimeError
print("Expected behavior:")
print("1. requests.get() fails -> ConnectionError (subclass of Exception)")
print("2. First except catches it, tries fallback")
print("3. Fallback also fails -> another ConnectionError")
print("4. Second except catches it, raises RuntimeError")
print("5. Test should catch RuntimeError, not generic Exception")
EOF
Repository: EtanHey/brainlayer
Length of output: 346
🏁 Script executed:
# Check if there's a setup_method in this test class
grep -A 5 "class TestRunEnrichmentResetsState" tests/test_enrichment_fallback.py | grep -E "setup_method|def test"
Repository: EtanHey/brainlayer
Length of output: 103
Use pytest.raises() to explicitly assert the expected exception type.
Catching bare Exception masks potential regressions like assertion typos. The test should explicitly verify that RuntimeError is raised when the backend check fails, using the established pattern from other tests in the suite.
Suggested refactor
+import pytest
 from unittest.mock import patch
 from brainlayer.pipeline import enrichment
@@
         with patch.object(enrichment, "VectorStore") as mock_vs:
             mock_store = mock_vs.return_value
             mock_store.get_enrichment_stats.return_value = {
                 "enriched": 0,
                 "enrichable": 0,
                 "remaining": 0,
                 "skipped": 0,
                 "percent": "0",
                 "total_chunks": 0,
                 "by_intent": {},
             }
-            try:
-                enrichment.run_enrichment(max_chunks=0, batch_size=1)
-            except Exception:
-                pass  # Will fail on backend check, that's fine
+            with patch.object(enrichment.requests, "get", side_effect=ConnectionError("backend down")):
+                with pytest.raises(RuntimeError):
+                    enrichment.run_enrichment(max_chunks=0, batch_size=1)
🧰 Tools
🪛 GitHub Actions: CI
[error] ruff format check detected formatting issues. File would be reformatted. Run 'ruff format' to fix code style.
When MLX crashes mid-batch (e.g., "Abort trap: 6"), the pipeline now automatically switches to Ollama after 3 consecutive failures instead of failing every remaining chunk in the batch.
- Track consecutive failures in call_llm()
- Auto-detect fallback backend availability
- Cache availability check for the run
- Reset fallback state on each run_enrichment() call
- 9 new tests for fallback logic
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
79a0402 to 19c6c86
Summary
run_enrichment() call
Context
The nightly enrichment window (11pm-11am) has been failing because MLX crashes ~1 minute into each run. All subsequent chunks fail with "Connection refused". With this change, the pipeline detects the crash pattern and falls back to Ollama automatically.
Test plan
test_enrichment_fallback.py
🤖 Generated with Claude Code
Note
Medium Risk
Adds stateful mid-run backend switching logic in the enrichment pipeline; incorrect switching/caching could mask transient failures or route load to an unintended backend, but the change is localized and covered by new tests.
Overview
Prevents enrichment runs from dying when the primary local LLM backend crashes mid-batch by adding mid-run automatic fallback in `call_llm` after 3 consecutive failures, including a one-time/cached availability probe and a retry of the failed chunk on the fallback backend.
Resets fallback state at the start of each `run_enrichment()` invocation, and adds a new test suite (`test_enrichment_fallback.py`) covering failure counting, threshold switching, stickiness of the fallback mode, reverse fallback direction (Ollama→MLX), cached availability checks, and state reset.
Written by Cursor Bugbot for commit 19c6c86. This will update automatically on new commits.