fix: chunked Neo4j flush eliminates OOM-induced ingest stall + failure-visibility signal (v4.0.1) #15
Merged
Enumerate every node→edge reader in neo4j_store.py, services.py, and routers/ via the three spec-mandated grep commands. Classify each hit as TOLERANT or NEEDS-FIX. Confirm get_node (neo4j_store.py:566) and get_edge (neo4j_store.py:601) are SAFE independent point-lookups. All 7 grep hits are TOLERANT: - neo4j_store.py:616 get_edge() Cypher fallback — property-filtered edge lookup, not a node→edge walk; no node-existence dependency. - services.py:70,125-127,135,143 — GraphState in-memory dict operations; write paths (70,125-127,143) and direct key lookup (135); none walk node→edge. - routers/ — zero hits. NEEDS-FIX count: 0. No code changes required. Phase-1 gate PASSED. All other Phase-1 tasks may proceed. Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Add two config knobs for sub-transaction chunking in _flush_body: - neo4j_flush_chunk_rows: int = 100 (cardinality bound) - neo4j_flush_chunk_bytes: int = 4_194_304 (4 MiB payload bound) A chunk closes when EITHER bound trips first. Tests verify defaults via test_neo4j_flush_chunk_rows_default and test_neo4j_flush_chunk_bytes_default. 🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
…h clamp (#278) - Add flush_chunk_rows (default 100) and flush_chunk_bytes (default 4_194_304) to __init__ signature - Store as _flush_chunk_rows / _flush_chunk_bytes with max(1, value) clamp to prevent zero/negative chunks - Add _make_store_chunked helper and three new tests covering: nominal values, clamping of non-positive inputs, and default values 🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Measures the JSON-serialized form of a row value (len(json.dumps(v, default=str))) rather than len() on the dict/list, which would return the element/key count and be blind to fat nested payloads such as large messages arrays or context_snapshot dicts. default=str ensures datetimes and other non-JSON-serialisable values never raise, falling back to str() length in the unlikely event json.dumps itself fails. Tests: - test_serialized_row_size_uses_serialized_form_not_len: fat dict with ~4000-char nested strings yields > 3000 (not 3 as len() would give) - test_serialized_row_size_handles_unjsonable_value: datetime value returns > 0, proving no crash on non-JSON types 🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
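The difference between counting keys and measuring the serialized payload can be sketched as follows. This is a minimal illustration, not the repository's `_serialized_row_size`: the function name, the caught exception types, and the sample row are assumptions.

```python
import json
from datetime import datetime, timezone
from typing import Any


def serialized_row_size(value: Any) -> int:
    """Approximate one row's payload size as the length of its JSON form."""
    try:
        # default=str stringifies datetimes and other non-JSON values.
        return len(json.dumps(value, default=str))
    except (TypeError, ValueError):
        # Assumed fallback: json.dumps can still fail (for example on a
        # circular reference), so fall back to the str() length.
        return len(str(value))


# Hypothetical fat row: three keys, but several kilobytes once serialized.
fat_row = {
    "messages": "x" * 4000,
    "context_snapshot": {"k": "y" * 100},
    "ts": datetime.now(timezone.utc),
}
print(len(fat_row), serialized_row_size(fat_row))
```

`len(fat_row)` only sees three keys, while the serialized size reflects the nested strings that actually travel to the database.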
- Add _chunk_dict(snapshot, max_rows, max_bytes) generator that yields dict chunks bounded by both row count and byte size. - Add _chunk_list(snapshot, max_rows, max_bytes) generator for list payloads with the same dual-bound logic. - Both helpers implement the one-row floor: a single oversized row is always yielded alone, never split, never looped. - _serialized_row_size() used for byte estimation in both helpers. - 5 new tests cover: row bound, byte bound, one-row floor, empty input, and list variant. No row lost or duplicated. 🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Add module-scoped neo4j_container_capped fixture to conftest.py that runs neo4j:5.26.22-community with NEO4J_db_memory_transaction_max=2m, mirroring the session-scoped neo4j_container bootstrap logic (random ports, 5-attempt port-flake retry on APIError 'ports are not available', httpx readiness poll up to 60s, remove=True, container.stop() teardown). Cap is set via env at startup — runtime dbms.setConfigValue does not exist on Community Edition. Create tests/neo4j/test_oom_regression.py with: - _OOM_CODE module constant - _low_retry_store() helper: constructs Neo4jGraphStore, closes original 30s-retry driver (no leak), swaps in AsyncGraphDatabase.driver with max_transaction_retry_time=2.0 - _buffer_fat_nodes() helper: buffers n single-phase node rows with ~blob_bytes blob property and UNIQUE prefix-scoped node_ids - _purge_prefix() helper: DETACH DELETE for nodes under a prefix (order-independent) - test_calibration_guard_tiny_write_succeeds: buffers one tiny node, flushes (must not raise), asserts MATCH count == 1 Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
…ain-red) (#278)
Add two store-level OOM regression tests to test_oom_regression.py:
test_unbounded_single_phase_flush_ooms:
- Enormous flush_chunk_rows/flush_chunk_bytes (10M rows, 10GB bytes)
- 400 fat nodes × 20 KB = ~8 MB single-phase payload, 4× over the 2 MiB cap
- Asserts TransientError with code == Neo.TransientError.General.MemoryPoolOutOfMemoryError
- Asserts MATCH count == 0 (nothing commits on OOM, buffer restored)
test_chunked_flush_drains_same_single_phase_buffer (RED):
- Small flush_chunk_rows=50, flush_chunk_bytes=262_144 (256 KB per chunk)
- Same 400 fat nodes: the 256 KB byte bound trips before the 50-row bound (50 × 20 KB ≈ 1 MB), so each chunk is roughly a dozen rows ≈ 256 KB, about 8× UNDER the cap
- Currently FAILS with TransientError/MemoryPoolOutOfMemoryError because
_flush_body does not use flush_chunk_rows/flush_chunk_bytes yet
- GREEN state (after Task 8 fix): flush() must not raise, buffer empty, count == 400
Also adds TransientError import from neo4j.exceptions.
Test run (pre-fix):
PASSED calibration_guard_tiny_write_succeeds
PASSED test_unbounded_single_phase_flush_ooms (OOM confirmed, count == 0)
FAILED test_chunked_flush_drains_same_single_phase_buffer (genuine RED)
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Replace the single-transaction _flush_body with a phased, dual-bounded, per-chunk-committed coordinator that eliminates the MemoryPoolOutOfMemoryError caused by sending all buffered nodes/edges/patches in one transaction. Changes: - _flush_body now iterates each buffer through _chunk_dict/_chunk_list with self._flush_chunk_rows / self._flush_chunk_bytes bounds - Each chunk is committed in its own independent execute_write (separate Neo4j session) — no multi-chunk explicit transactions that would re-collapse the memory bound - Phase order: nodes → label patches → edges (preserves referential integrity) - On any chunk failure: logs flush_chunk_failed + re-raises; finally block merges snapshot back into live buffers (full retry on next flush) - _write_batch is byte-for-byte unchanged Test results: - tests/neo4j/test_oom_regression.py: 3/3 passed (calibration guard, enormous-bounds OOM cause asserted, small-bounds drains exactly 400 nodes) - tests/test_neo4j_store.py: 101/101 passed (no regressions) 🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
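The coordinator's control flow (phase order, one commit per chunk, re-raise plus full snapshot restore) can be sketched without a database. Everything here is hypothetical scaffolding: `BufferedStoreSketch`, the `write_chunk` callback standing in for `execute_write`, and the row-only chunker are assumptions, and the label-patch phase is omitted for brevity.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable, Dict, Iterator

logger = logging.getLogger("flush_sketch")

# Hypothetical stand-in for one execute_write call: (phase, chunk) -> None.
WriteChunk = Callable[[str, Dict[str, Any]], Awaitable[None]]


def _row_chunks(rows: Dict[str, Any], max_rows: int) -> Iterator[Dict[str, Any]]:
    """Row-bounded chunks only; the byte bound is left out for brevity."""
    items = list(rows.items())
    for start in range(0, len(items), max_rows):
        yield dict(items[start:start + max_rows])


class BufferedStoreSketch:
    def __init__(self, write_chunk: WriteChunk, max_rows: int = 100) -> None:
        self._write_chunk = write_chunk
        self._max_rows = max(1, max_rows)  # clamp: never a zero-size chunk
        self.node_buffer: Dict[str, Any] = {}
        self.edge_buffer: Dict[str, Any] = {}

    async def flush(self) -> None:
        # Swap the live buffers out so new writes can keep accumulating.
        nodes, self.node_buffer = self.node_buffer, {}
        edges, self.edge_buffer = self.edge_buffer, {}
        if not nodes and not edges:
            return  # empty buffers: no write call is made
        succeeded = False
        try:
            # Nodes are committed before the edges that reference them.
            for phase, snapshot in (("nodes", nodes), ("edges", edges)):
                for chunk in _row_chunks(snapshot, self._max_rows):
                    await self._write_chunk(phase, chunk)  # one commit per chunk
            succeeded = True
        except Exception:
            logger.error("flush_chunk_failed")
            raise
        finally:
            if not succeeded:
                # Restore the whole snapshot, including chunks that already
                # committed; rows buffered during the flush take precedence.
                self.node_buffer = {**nodes, **self.node_buffer}
                self.edge_buffer = {**edges, **self.edge_buffer}
```

Restoring chunks that already committed means the next flush replays them, which is only safe if the writes are idempotent. The sketch assumes that; it does not demonstrate it.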
Add two characterization/guard tests for the phased chunked-flush coordinator (Task 8's _flush_body): - test_coordinator_every_execute_write_within_bounds: seeds 35 nodes at rows=10, captures every execute_write payload, and asserts each node chunk satisfies len(nodes)<=10 AND (total_bytes<=10_000_000 OR len==1). - test_coordinator_empty_buffer_makes_zero_calls: verifies that flush() with empty buffers short-circuits before opening a session, so execute_write is never called. Both tests pass against the existing coordinator implementation. Chunk-size arithmetic is owned by Task 5 tests and is not re-tested here. 🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Add parametrized test test_reraise_restores_full_snapshot_and_logs covering 3
materially different durable-progress states:
- first_chunk_fails (index 0): nothing committed to Neo4j
- later_chunk_same_phase_committed (index 1): first node chunk committed,
second node chunk fails — partial within node phase
- edge_after_nodes_committed (index 3): all 3 node chunks committed,
first edge chunk fails — partial durable progress across phases
In all 3 cases asserts:
1. RuntimeError('chunk boom') propagates out of flush()
2. _node_buffer and _edge_buffer fully restored to original snapshot
3. ERROR log containing 'flush_chunk_failed' is emitted
Also adds helpers:
- _seq_execute_write_failing_on(call_index): execute_write mock that
succeeds until call_index then raises RuntimeError('chunk boom')
- _wire_session(store, execute_write_mock): wires fake session boundary
(MagicMock cm / __aenter__ / __aexit__) onto store
Hard constraint #4 guard: coordinator must re-raise on any chunk failure
and never return success after a partial flush.
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Pass flush_chunk_rows and flush_chunk_bytes from Settings into the Neo4jGraphStore constructor in get_or_create. Settings is already fetched at the top of get_or_create (line 434); the new fields reuse the same settings binding without a second get_settings() call. Also extend the _SettingsProxy in tests/conftest.py to expose the two new fields so the autouse safe_settings fixture does not cause AttributeError when the registry path exercises them in tests. Test: test_get_or_create_threads_flush_chunk_bounds monkeypatches Neo4jGraphStore and start_drain, constructs SessionRegistry, calls get_or_create, and asserts flush_chunk_rows==100 and flush_chunk_bytes==4_194_304 (the Settings defaults). 🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
…serted (#278)
Three-leg integration test covering the _finalize_session failure path
that manifests as frozen offsets across restarts (issue #278):
Leg 1 OLD-FREEZE: rows=10_000_000 / byts=10_000_000_000 forces all ~201
fat nodes (~8 MB total) into a single transaction, hitting the 2 MiB
per-transaction cap. Asserts:
- 'finalize_tail_flush_failed' logged
- _OOM_CODE ('Neo.TransientError.General.MemoryPoolOutOfMemoryError')
positively present in caplog (not a proxy assertion)
- Worker stays registered (finalize returned early without cleanup)
- Committed offset frozen at 0
- 0 f-* ToolCall nodes committed
Leg 2 RESTART: fresh SessionRegistry + SessionWorker over the same
on-disk queue with the same old bounds. Confirms the freeze survives
a process restart (offset still 0, 0 committed).
Leg 3 RESTART FIXED: rows=50 / byts=262_144 keeps each chunk ~250 KB
well under the 2 MiB cap. Asserts:
- Offset advances to tail_end (full queue drained)
- Worker deregistered on successful finalization
- 100 f-* ToolCall nodes committed to Neo4j
Queue seeding: 100 tool:pre events each carrying tool_call_id='f-{i}'
and tool_input='x'*40_000 (fat ToolCall + Event nodes), plus one
session:end event. Also adds _line() helper and top-level imports
(json, logging, Path) to the module.
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
…h (#278) 🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Add last_successful_flush: float = field(default_factory=time.time) to the SessionWorker @dataclass in registry.py. The field defaults to the worker's creation time (NOT 0.0) so a brand-new worker reads as fresh, not ancient. A 0.0 default would make every new worker appear to have last flushed in 1970. Defaulting to creation time means 'no flush has happened yet, but the worker is fresh.' The field will be stamped in _flush_barrier in a subsequent task. TDD: test TestLastSuccessfulFlushField::test_defaults_to_creation_time_not_zero first failed with AttributeError ('SessionWorker' object has no attribute 'last_successful_flush'), then passed after the field was added. 🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
… (#278) Phase 2 (#278): after awaited flush succeeds inside _flush_barrier, stamp worker.last_successful_flush = time.time(). All three drainer success paths (drain, exhausted-per-line, finalize) funnel through _flush_barrier, so this single stamp covers all of them. A separate stamp at each call site would be redundant and drift-prone. Test: TestFlushBarrierStampsLiveness.test_flush_barrier_advances_last_successful_flush - Forces worker.last_successful_flush = 0.0 before call - Asserts value >= before after _flush_barrier returns - Confirmed FAIL before fix (stays 0.0), PASS after 🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
A worker is orphaned iff it is still registered in _workers AND its asyncio task has completed (task.done()). This catches the finalization-path orphan (tail flush returns early without deregistering, so the task completes but the worker remains) and any unhandled exception that escapes the drain loop. Deterministic and instant — no timer, no threshold. Three tests added (TestOrphanedSessions): - test_completed_task_worker_is_orphaned: done task → reported - test_live_task_worker_is_not_orphaned: running task → not reported - test_no_task_worker_is_not_orphaned: task=None → not reported 🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
- Compute orphaned_ids set once from registry.orphaned_sessions() (single source of truth — no inline task.done() calls scattered through dict comp) - Add orphaned (bool) and last_successful_flush (float) to each per-session dict in build_status_response - Add top-level orphaned_sessions count (aggregate, safe for unauthenticated /status endpoint — no per-session error strings leaked) - Tests: TestBuildStatusResponseOrphanVisibility — 3 tests covering done-task → orphaned=True + count, running-task → orphaned=False + count=0, and last_successful_flush presence/value 🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
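The response shape described above could be assembled roughly as follows. This is a sketch under assumptions: the function name, the `session_id` key, and the list layout of `sessions` are hypothetical; only `orphaned`, `last_successful_flush`, and the top-level `orphaned_sessions` count come from the commit message.

```python
from typing import Any, Dict, Set


def build_status_sketch(
    visible_workers: Dict[str, float], orphaned_ids: Set[str]
) -> Dict[str, Any]:
    """visible_workers maps session id -> last_successful_flush timestamp."""
    return {
        # Aggregate count only; no per-session error strings are exposed.
        "orphaned_sessions": len(orphaned_ids),
        "sessions": [
            {
                "session_id": sid,
                "orphaned": sid in orphaned_ids,  # membership in the precomputed set
                "last_successful_flush": stamp,
            }
            for sid, stamp in visible_workers.items()
        ],
    }


status = build_status_sketch({"a": 1700000000.0, "b": 1700000100.0}, {"b"})
print(status["orphaned_sessions"], [s["orphaned"] for s in status["sessions"]])
```

Computing the orphan set once and testing membership keeps the per-session flag and the aggregate count derived from the same source.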
Adds tests/neo4j/test_orphan_visibility.py: a module-scoped live E2E test
(pytestmark = pytest.mark.neo4j) proving a genuine finalization-path orphan
surfaces on /status. The test drives the real start_drain → drain_worker →
_finalize_session path — worker.task is an actual asyncio.Task that
transitions to done().
Reproduction recipe (deterministic, ~29s against neo4j:5.26.22-community):
Seed shape (line counts are load-bearing):
Lines 1-99: tiny tool:pre (tool_input 'x'*16, key space small-{i})
Line 100: session:end (terminal; exactly fills read_batch max_items=100)
Lines 101-200: fat tool:pre (tool_input 'x'*40_000, key space f-{i}, ~8 MB)
WHY:
The pre-terminal block is exactly 100 lines so the drainer's first read_batch
returns only those, commits cleanly (tiny flush << 2 MiB cap), sets
saw_terminal → _finalize_session. The finalization tail (100 fat lines) is
flushed in ONE transaction (rows=10_000_000, byts=10_000_000_000) → OOM.
_finalize_session does NOT retry; one OOM → finalize_tail_flush_failed log
+ early return → orphan (registered worker, completed task).
Orphan post-state assertions (all required, none weakened):
- worker.task.done() True
- sid still in registry._workers (not deregistered)
- 'finalize_tail_flush_failed' in caplog.text
- _OOM_CODE ('Neo.TransientError.General.MemoryPoolOutOfMemoryError') in caplog.text
- committed offset frozen at pre-terminal boundary (== boundary, != tail_end)
- 0 f-* nodes committed (queried via a FRESH check_store)
- worker in registry.orphaned_sessions()
- build_status_response reports orphaned_sessions >= 1 and
the session's per-session dict has orphaned: True
Teardown closes the still-open store driver (early-returned _finalize_session
did not call _safe_close). No production code changes.
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
…rker) - Add TestVersionIs401.test_pyproject_version_is_401 that reads pyproject.toml via tomllib and asserts version == '4.0.1' (single source of truth gate) - Bump pyproject.toml line 7: version = "4.0.0" -> version = "4.0.1" Test cycle confirmed: RED: assert '4.0.0' == '4.0.1' (AssertionError before bump) GREEN: 1 passed (after bump) 🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Three issues from the holistic code review, all cosmetic/documentary:
1. dashboard.py build_status_response docstring — add orphaned_sessions to
the Returns key list; document the orphaned_sessions count vs visible-
session asymmetry (aged-out orphan contributes to count but won't appear
with orphaned:True in sessions list); add orphaned and last_successful_flush
to the sessions-dict key list.
2. test_orphan_visibility.py — rename test function to align with the spec's
acceptance-criteria reference:
test_real_drain_orphan_surfaces_on_status
-> test_finalization_orphan_surfaces_on_status
Resolves the Task 7 DONE_WITH_CONCERNS naming discrepancy.
3. test_version.py — rename TestVersionIs401 -> TestVersionIs4_0_1 and
test_pyproject_version_is_401 -> test_pyproject_version_is_4_0_1 to
eliminate the HTTP-401-status-code ambiguity in the class name.
Note: review recommendation #2 (strengthen flush-value assertion) was
already implemented — tests/test_dashboard.py line 492-493 already carries
both the key-presence check and the value-equality check.
Non-Neo4j suite: 1343 passed, 2 skipped (no regressions).
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
The reader audit conclusion (zero dangling-node readers in the codebase) is captured in the PR description. Superpowers-generated docs belong outside the product documentation tree per repo conventions. Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
colombod pushed a commit that referenced this pull request Jun 19, 2026
…h (fixes v4.0.1 drain stall) (#19) * fix: restore global lock ordering + fail-loud timeout in chunked flush (fixes v4.0.1 drain stall) PR #15 chunked flush fixed OOM but broke lock ordering: chunking by dict-insertion order instead of key meant concurrent flushes across different stores acquired locks out of global order, causing Neo4j lock contention. With unlimited lock timeout, stalled flushes parked forever, starving the write semaphore and freezing all drains. Two-layer fix: A - Sort node/edge snapshots by key before chunking in _flush_body to restore global lock-acquisition order. B - Wrap each execute_write in unit_of_work(timeout=neo4j_lock_timeout) and set connection_acquisition_timeout on driver so lock contention fails loud (raising Neo4jError) into existing retry/dead-letter path instead of parking forever. New config knob: neo4j_lock_timeout (default 30.0s). Verified locally: red-green regression cycle (fix stashed → tests fail for right reason; restored → pass), 59 neo4j tests including #15 OOM regression tests (no regression), 1302 unit tests, ruff+pyright clean. Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com> * fix: index-back node identity lookups (:Node label) to eliminate AllNodesScan flush stall The non-Session node MERGE (n {node_id, workspace}) and edge/label-patch MATCH (n {node_id, workspace}) queries were label-free, preventing Neo4j from using any index — every index in the graph is label-scoped (:Session, :Event, etc.). This forced an AllNodesScan over the 1.3M-node graph, causing each flush to run 25–30s, at which point the 30s timeout killed it, collapsing drain throughput to near-zero. 
Fix: - Add universal :Node label to every node (new label on top of mutable type labels, so identity stays dual-label-independent and no duplicates occur) - Create composite index (node_id, workspace) scoped to :Node - Update node MERGE and all edge/label-patch MATCHes to seek via :Node - Idempotent startup backfill: batch-SET all existing nodes with :Node label and populate the index (10,000 rows/txn) so all 1.3M existing nodes benefit Verified: - EXPLAIN plans: AllNodesScan → NodeIndexSeek on all five label-free queries - Tests: 64 neo4j + 1302 unit tests pass (red-green cycle included) - Live server: backlog drained 58 → 0 sessions in ~2 min, slowest txn dropped from 25–30s to sub-millisecond, :Node backfill complete, OOM/flush failures to zero Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com> --------- Co-authored-by: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
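Layer A of the follow-up fix, sorting a snapshot by key before chunking, can be illustrated in isolation. The sketch assumes string keys and a hypothetical helper name; the actual key type in `_flush_body` is not shown in the commit message.

```python
from typing import Any, Dict


def sorted_snapshot(buffer: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the buffer ordered by key instead of insertion order."""
    return dict(sorted(buffer.items()))


# Two stores that buffered the same nodes in different arrival orders.
store_a = {"node-3": {}, "node-1": {}, "node-2": {}}
store_b = {"node-2": {}, "node-3": {}, "node-1": {}}

print(list(sorted_snapshot(store_a)))
print(list(sorted_snapshot(store_b)))
```

After sorting, both stores touch the shared keys in the same sequence, so concurrent flushes request locks in one global order rather than in whatever order rows happened to arrive.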
Summary
Two-phase fix for the Neo4j transaction-memory OOM that silently stalls per-session graph-index ingest in context-intelligence-server. (Tracked in microsoft-amplifier/amplifier-support #278 — issues are disabled on this repo.) Released as 4.0.0 → 4.0.1.
Phase 1: Chunked flush (the OOM fix)
- neo4j_store._flush_body now commits in bounded, phased sub-transactions (nodes → label-patches → edges), each committing independently and dual-bounded by row count and serialized byte size, so no single execute_write can exhaust the transaction-memory pool. _write_batch is unchanged.
- neo4j_flush_chunk_rows (100) / neo4j_flush_chunk_bytes (4 MiB); a finite db.memory.transaction.max deployment cap as defense-in-depth.
- session:end finalization path.
Phase 2: Failure-visibility signal
- SessionWorker.last_successful_flush (stamped once at the _flush_barrier boundary) and SessionRegistry.orphaned_sessions() (a registered worker whose drain task has died: the finalization orphan).
- /status now surfaces per-session orphaned / last_successful_flush plus an aggregate orphaned_sessions count, so a wedged session is no longer invisible.
Consistency note
Chunked commit trades whole-flush read-time atomicity for bounded memory: a transient dangling-node window (nodes committed before their edges) becomes observable. A reader audit of every node→edge reader in neo4j_store.py, services.py, and routers/ found zero readers that assume "node exists ⇒ edges exist"; all are point-lookups or property-keyed edge lookups that tolerate the window. Durability / no-data-loss is preserved (source JSONL intact throughout).
Test plan
- uv run pytest tests/ -m "not neo4j" → 1349 passed
- uv run pytest tests/neo4j/ -m neo4j → 55 passed (live, disposable neo4j:5.26.22-community containers on random ports, never a live instance; local-only, not run in CI), including:
  - test_oom_regression.py: the old unbounded flush OOMs and the chunked flush drains the same buffer, on both the drain and session:end paths, with a kill/restart arc reproducing the "offsets frozen across restarts" symptom.
  - test_orphan_visibility.py: a genuine finalization-tail OOM driven through the real start_drain path leaves an orphan that surfaces on /status.