fix: chunked Neo4j flush eliminates OOM-induced ingest stall + failure-visibility signal (v4.0.1) by colombod · Pull Request #15 · microsoft/amplifier-context-intelligence

colombod · 2026-06-18T10:08:41Z

Summary

Two-phase fix for the Neo4j transaction-memory OOM that silently stalls per-session graph-index ingest in context-intelligence-server. (Tracked in microsoft-amplifier/amplifier-support #278 — issues are disabled on this repo.) Released as 4.0.0 → 4.0.1.

Phase 1 — Chunked flush (the OOM fix)

neo4j_store._flush_body now commits in bounded, phased sub-transactions (nodes → label-patches → edges), each committing independently, dual-bounded by row count and serialized byte size — so no single execute_write can exhaust the transaction-memory pool. _write_batch is unchanged.
On failure: restore the whole snapshot and re-raise — the offset never advances after a partial flush (no silent corruption). Poison handling stays line-granular via the existing drainer.
New config knobs neo4j_flush_chunk_rows (100) / neo4j_flush_chunk_bytes (4 MiB); a finite db.memory.transaction.max deployment cap as defense-in-depth.
Fixes both the backlog-ingest path and the session:end finalization path.

Phase 2 — Failure-visibility signal

Adds SessionWorker.last_successful_flush (stamped once at the _flush_barrier boundary) and SessionRegistry.orphaned_sessions(), which reports any registered worker whose drain task has died (the finalization orphan).
/status now surfaces per-session orphaned / last_successful_flush plus an aggregate orphaned_sessions count, so a wedged session is no longer invisible.

Consistency note

Chunked commit trades whole-flush read-time atomicity for bounded memory: a transient dangling-node window (nodes committed before their edges) becomes observable. A reader audit of every node→edge reader in neo4j_store.py, services.py, and routers/ found zero readers that assume "node exists ⇒ edges exist" — all are point-lookups or property-keyed edge lookups that tolerate the window. Durability / no-data-loss is preserved (source JSONL intact throughout).

Test plan

uv run pytest tests/ -m "not neo4j" → 1349 passed
uv run pytest tests/neo4j/ -m neo4j → 55 passed (live, disposable neo4j:5.26.22-community containers on random ports — never a live instance; local-only, not run in CI), including:
- test_oom_regression.py — the old unbounded flush OOMs and the chunked flush drains the same buffer, on both the drain and session:end paths, with a kill/restart arc reproducing the "offsets frozen across restarts" symptom.
- test_orphan_visibility.py — a genuine finalization-tail OOM driven through the real start_drain path leaves an orphan that surfaces on /status.

Enumerate every node→edge reader in neo4j_store.py, services.py, and routers/ via the three spec-mandated grep commands. Classify each hit as TOLERANT or NEEDS-FIX. Confirm get_node (neo4j_store.py:566) and get_edge (neo4j_store.py:601) are SAFE independent point-lookups.
All 7 grep hits are TOLERANT:
- neo4j_store.py:616 get_edge() Cypher fallback: property-filtered edge lookup, not a node→edge walk; no node-existence dependency.
- services.py:70,125-127,135,143: GraphState in-memory dict operations; write paths (70,125-127,143) and direct key lookup (135); none walk node→edge.
- routers/: zero hits.
NEEDS-FIX count: 0. No code changes required. Phase-1 gate PASSED. All other Phase-1 tasks may proceed.
Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

Add two config knobs for sub-transaction chunking in _flush_body:
- neo4j_flush_chunk_rows: int = 100 (cardinality bound)
- neo4j_flush_chunk_bytes: int = 4_194_304 (4 MiB payload bound)
A chunk closes when EITHER bound trips first. Tests verify defaults via test_neo4j_flush_chunk_rows_default and test_neo4j_flush_chunk_bytes_default.
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

…h clamp (#278)
- Add flush_chunk_rows (default 100) and flush_chunk_bytes (default 4_194_304) to __init__ signature
- Store as _flush_chunk_rows / _flush_chunk_bytes with max(1, value) clamp to prevent zero/negative chunks
- Add _make_store_chunked helper and three new tests covering: nominal values, clamping of non-positive inputs, and default values
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
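The clamp described in this commit can be sketched in a few lines. This is a minimal illustration, not the real Neo4jGraphStore constructor: the class name `ChunkBounds` is hypothetical, and only the two parameter names, their defaults, and the `max(1, value)` rule come from the commit message.

```python
class ChunkBounds:
    """Hypothetical holder for the two flush bounds (not the real store class)."""

    def __init__(
        self,
        flush_chunk_rows: int = 100,
        flush_chunk_bytes: int = 4_194_304,  # 4 MiB
    ) -> None:
        # Clamp to at least 1 so a zero or negative setting can never
        # describe an empty chunk or a chunker that makes no progress.
        self._flush_chunk_rows = max(1, flush_chunk_rows)
        self._flush_chunk_bytes = max(1, flush_chunk_bytes)


bounds = ChunkBounds(flush_chunk_rows=0, flush_chunk_bytes=-5)
print(bounds._flush_chunk_rows, bounds._flush_chunk_bytes)
```

Clamping at construction time means the chunking code never has to re-validate the bounds on each flush.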

Measures the JSON-serialized form of a row value (len(json.dumps(v, default=str))) rather than len() on the dict/list, which would return the element/key count and be blind to fat nested payloads such as large messages arrays or context_snapshot dicts. default=str ensures datetimes and other non-JSON-serialisable values never raise, falling back to str() length in the unlikely event json.dumps itself fails.
Tests:
- test_serialized_row_size_uses_serialized_form_not_len: fat dict with ~4000-char nested strings yields > 3000 (not 3 as len() would give)
- test_serialized_row_size_handles_unjsonable_value: datetime value returns > 0, proving no crash on non-JSON types
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
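A minimal sketch of the size estimate described above, assuming only what the commit message states (json.dumps with default=str, then a str() fallback). The function name `serialized_row_size` and the choice of which exceptions to catch are assumptions of this sketch, not the repository's code.

```python
import json
from datetime import datetime, timezone
from typing import Any


def serialized_row_size(value: Any) -> int:
    """Estimate a row's payload size from its JSON-serialized form."""
    try:
        # default=str converts datetimes and other non-JSON values to strings.
        return len(json.dumps(value, default=str))
    except (TypeError, ValueError):
        # Assumed fallback for inputs json.dumps still rejects,
        # such as dicts with tuple keys or circular references.
        return len(str(value))


fat_row = {"messages": ["x" * 4000], "context_snapshot": {"k": "v"}, "id": 1}
print(len(fat_row))                  # key count only, blind to the fat payload
print(serialized_row_size(fat_row))  # reflects the nested 4000-char string
print(serialized_row_size(datetime(2026, 6, 18, tzinfo=timezone.utc)))
```

Because json.dumps escapes non-ASCII characters by default, the character count here equals the byte count of the serialized text, although the driver's wire encoding may still differ.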

- Add _chunk_dict(snapshot, max_rows, max_bytes) generator that yields dict chunks bounded by both row count and byte size.
- Add _chunk_list(snapshot, max_rows, max_bytes) generator for list payloads with the same dual-bound logic.
- Both helpers implement the one-row floor: a single oversized row is always yielded alone, never split, never looped.
- _serialized_row_size() used for byte estimation in both helpers.
- 5 new tests cover: row bound, byte bound, one-row floor, empty input, and list variant. No row lost or duplicated.
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
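The dual-bound chunking with a one-row floor might look like the sketch below. It is an illustration under assumptions: the names `chunk_dict` and `_row_size` are hypothetical, and closing a chunk before a row that would exceed the byte bound is one plausible reading of "closes when either bound trips first", not a confirmed detail of the real helper.

```python
import json
from typing import Any, Dict, Iterator


def _row_size(value: Any) -> int:
    # Stand-in for the serialized-size estimate described in the commit.
    return len(json.dumps(value, default=str))


def chunk_dict(
    snapshot: Dict[str, Any], max_rows: int, max_bytes: int
) -> Iterator[Dict[str, Any]]:
    """Yield sub-dicts bounded by row count and estimated byte size."""
    chunk: Dict[str, Any] = {}
    chunk_bytes = 0
    for key, row in snapshot.items():
        size = _row_size(row)
        # Close the current chunk when adding this row would trip either bound.
        # The `chunk and` guard is the one-row floor: an empty chunk always
        # accepts the next row, so an oversized row is yielded alone.
        if chunk and (len(chunk) >= max_rows or chunk_bytes + size > max_bytes):
            yield chunk
            chunk, chunk_bytes = {}, 0
        chunk[key] = row
        chunk_bytes += size
    if chunk:
        yield chunk


rows = {f"n{i}": {"v": i} for i in range(35)}
print([len(c) for c in chunk_dict(rows, max_rows=10, max_bytes=10_000_000)])
```

Every row lands in exactly one chunk, so concatenating the chunks in order reproduces the snapshot without loss or duplication.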

Add module-scoped neo4j_container_capped fixture to conftest.py that runs neo4j:5.26.22-community with NEO4J_db_memory_transaction_max=2m, mirroring the session-scoped neo4j_container bootstrap logic (random ports, 5-attempt port-flake retry on APIError 'ports are not available', httpx readiness poll up to 60s, remove=True, container.stop() teardown). Cap is set via env at startup; runtime dbms.setConfigValue does not exist on Community Edition.
Create tests/neo4j/test_oom_regression.py with:
- _OOM_CODE module constant
- _low_retry_store() helper: constructs Neo4jGraphStore, closes original 30s-retry driver (no leak), swaps in AsyncGraphDatabase.driver with max_transaction_retry_time=2.0
- _buffer_fat_nodes() helper: buffers n single-phase node rows with ~blob_bytes blob property and UNIQUE prefix-scoped node_ids
- _purge_prefix() helper: DETACH DELETE for nodes under a prefix (order-independent)
- test_calibration_guard_tiny_write_succeeds: buffers one tiny node, flushes (must not raise), asserts MATCH count == 1
Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

…ain-red) (#278)
Add two store-level OOM regression tests to test_oom_regression.py:
test_unbounded_single_phase_flush_ooms:
- Enormous flush_chunk_rows/flush_chunk_bytes (10M rows, 10GB bytes)
- 400 fat nodes × 20 KB = ~8 MB single-phase payload, 4× over the 2 MiB cap
- Asserts TransientError with code == Neo.TransientError.General.MemoryPoolOutOfMemoryError
- Asserts MATCH count == 0 (nothing commits on OOM, buffer restored)
test_chunked_flush_drains_same_single_phase_buffer (RED):
- Small flush_chunk_rows=50, flush_chunk_bytes=262_144 (256 KB per chunk)
- Same 400 fat nodes: at ~20 KB per row the 256 KB byte bound trips before the 50-row bound, so each chunk stays at roughly 256 KB, well UNDER the 2 MiB cap
- Currently FAILS with TransientError/MemoryPoolOutOfMemoryError because _flush_body does not use flush_chunk_rows/flush_chunk_bytes yet
- GREEN state (after Task 8 fix): flush() must not raise, buffer empty, count == 400
Also adds TransientError import from neo4j.exceptions.
Test run (pre-fix):
PASSED calibration_guard_tiny_write_succeeds
PASSED test_unbounded_single_phase_flush_ooms (OOM confirmed, count == 0)
FAILED test_chunked_flush_drains_same_single_phase_buffer (genuine RED)
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

Replace the single-transaction _flush_body with a phased, dual-bounded, per-chunk-committed coordinator that eliminates the MemoryPoolOutOfMemoryError caused by sending all buffered nodes/edges/patches in one transaction.
Changes:
- _flush_body now iterates each buffer through _chunk_dict/_chunk_list with self._flush_chunk_rows / self._flush_chunk_bytes bounds
- Each chunk is committed in its own independent execute_write (separate Neo4j session); no multi-chunk explicit transactions that would re-collapse the memory bound
- Phase order: nodes → label patches → edges (preserves referential integrity)
- On any chunk failure: logs flush_chunk_failed + re-raises; finally block merges snapshot back into live buffers (full retry on next flush)
- _write_batch is byte-for-byte unchanged
Test results:
- tests/neo4j/test_oom_regression.py: 3/3 passed (calibration guard, enormous-bounds OOM cause asserted, small-bounds drains exactly 400 nodes)
- tests/test_neo4j_store.py: 101/101 passed (no regressions)
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
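The coordinator's control flow (phase order, one commit per chunk, whole-snapshot restore on failure) can be sketched without a database. Everything named here is hypothetical: `FlushSketch`, `ChunkWriter`, and the injected `write_chunk` callable stand in for the real store and its execute_write calls. Chunking is by row count only for brevity, and all three buffers are modelled as dicts.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Iterator

# Hypothetical stand-in for one independent execute_write: (phase, rows) -> None.
ChunkWriter = Callable[[str, Dict[str, Any]], Awaitable[None]]


def _chunks(rows: Dict[str, Any], max_rows: int) -> Iterator[Dict[str, Any]]:
    """Row-count-only chunking, to keep the sketch short."""
    items = list(rows.items())
    for start in range(0, len(items), max_rows):
        yield dict(items[start:start + max_rows])


class FlushSketch:
    PHASES = ("nodes", "patches", "edges")  # order taken from the commit message

    def __init__(self, write_chunk: ChunkWriter, max_rows: int = 100) -> None:
        self._write_chunk = write_chunk
        self._max_rows = max(1, max_rows)
        self.buffers: Dict[str, Dict[str, Any]] = {p: {} for p in self.PHASES}

    async def flush(self) -> None:
        # Swap the live buffers out so new rows can arrive during the flush.
        snapshot = self.buffers
        self.buffers = {p: {} for p in self.PHASES}
        succeeded = False
        try:
            for phase in self.PHASES:
                for chunk in _chunks(snapshot[phase], self._max_rows):
                    await self._write_chunk(phase, chunk)  # one commit per chunk
            succeeded = True
        finally:
            if not succeeded:
                # Restore the WHOLE snapshot, including chunks that already
                # committed, so the next flush retries everything. Rows buffered
                # during the failed flush win on key collisions (an assumption).
                for phase in self.PHASES:
                    self.buffers[phase] = {**snapshot[phase], **self.buffers[phase]}
```

Replaying chunks that already committed is only safe if the writes are idempotent; the sketch assumes that rather than enforcing it. The exception propagates to the caller unchanged, which is what keeps the offset from advancing after a partial flush.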

Add two characterization/guard tests for the phased chunked-flush coordinator (Task 8's _flush_body):
- test_coordinator_every_execute_write_within_bounds: seeds 35 nodes at rows=10, captures every execute_write payload, and asserts each node chunk satisfies len(nodes)<=10 AND (total_bytes<=10_000_000 OR len==1).
- test_coordinator_empty_buffer_makes_zero_calls: verifies that flush() with empty buffers short-circuits before opening a session, so execute_write is never called.
Both tests pass against the existing coordinator implementation. Chunk-size arithmetic is owned by Task 5 tests and is not re-tested here.
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

Add parametrized test test_reraise_restores_full_snapshot_and_logs covering 3 materially different durable-progress states:
- first_chunk_fails (index 0): nothing committed to Neo4j
- later_chunk_same_phase_committed (index 1): first node chunk committed, second node chunk fails (partial within node phase)
- edge_after_nodes_committed (index 3): all 3 node chunks committed, first edge chunk fails (partial durable progress across phases)
In all 3 cases asserts:
1. RuntimeError('chunk boom') propagates out of flush()
2. _node_buffer and _edge_buffer fully restored to original snapshot
3. ERROR log containing 'flush_chunk_failed' is emitted
Also adds helpers:
- _seq_execute_write_failing_on(call_index): execute_write mock that succeeds until call_index then raises RuntimeError('chunk boom')
- _wire_session(store, execute_write_mock): wires fake session boundary (MagicMock cm / __aenter__ / __aexit__) onto store
Hard constraint #4 guard: coordinator must re-raise on any chunk failure and never return success after a partial flush.
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

Pass flush_chunk_rows and flush_chunk_bytes from Settings into the Neo4jGraphStore constructor in get_or_create. Settings is already fetched at the top of get_or_create (line 434); the new fields reuse the same settings binding without a second get_settings() call.
Also extend the _SettingsProxy in tests/conftest.py to expose the two new fields so the autouse safe_settings fixture does not cause AttributeError when the registry path exercises them in tests.
Test: test_get_or_create_threads_flush_chunk_bounds monkeypatches Neo4jGraphStore and start_drain, constructs SessionRegistry, calls get_or_create, and asserts flush_chunk_rows==100 and flush_chunk_bytes==4_194_304 (the Settings defaults).
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

…serted (#278)
Three-leg integration test covering the _finalize_session failure path that manifests as frozen offsets across restarts (issue #278):
Leg 1 OLD-FREEZE: rows=10_000_000 / byts=10_000_000_000 forces all ~201 fat nodes (~8 MB total) into a single transaction, hitting the 2 MiB per-transaction cap. Asserts:
- 'finalize_tail_flush_failed' logged
- _OOM_CODE ('Neo.TransientError.General.MemoryPoolOutOfMemoryError') positively present in caplog (not a proxy assertion)
- Worker stays registered (finalize returned early without cleanup)
- Committed offset frozen at 0
- 0 f-* ToolCall nodes committed
Leg 2 RESTART: fresh SessionRegistry + SessionWorker over the same on-disk queue with the same old bounds. Confirms the freeze survives a process restart (offset still 0, 0 committed).
Leg 3 RESTART FIXED: rows=50 / byts=262_144 keeps each chunk ~250 KB well under the 2 MiB cap. Asserts:
- Offset advances to tail_end (full queue drained)
- Worker deregistered on successful finalization
- 100 f-* ToolCall nodes committed to Neo4j
Queue seeding: 100 tool:pre events each carrying tool_call_id='f-{i}' and tool_input='x'*40_000 (fat ToolCall + Event nodes), plus one session:end event.
Also adds _line() helper and top-level imports (json, logging, Path) to the module.
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

…h (#278) 🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

Add last_successful_flush: float = field(default_factory=time.time) to the SessionWorker @dataclass in registry.py.
The field defaults to the worker's creation time (NOT 0.0) so a brand-new worker reads as fresh, not ancient. A 0.0 default would make every new worker appear to have last flushed in 1970. Defaulting to creation time means 'no flush has happened yet, but the worker is fresh.' The field will be stamped in _flush_barrier in a subsequent task.
TDD: test TestLastSuccessfulFlushField::test_defaults_to_creation_time_not_zero first failed with AttributeError ('SessionWorker' object has no attribute 'last_successful_flush'), then passed after the field was added.
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
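The difference between a creation-time default and a 0.0 default is easy to see in isolation. The sketch below uses a hypothetical `WorkerSketch` dataclass with only the one field taken from the commit message; the real SessionWorker carries more state.

```python
import time
from dataclasses import dataclass, field


@dataclass
class WorkerSketch:
    """Hypothetical cut-down worker; only last_successful_flush mirrors the commit."""
    session_id: str
    # default_factory is called once per instance, so each new worker starts
    # with its own creation time instead of a shared constant such as 0.0.
    last_successful_flush: float = field(default_factory=time.time)


worker = WorkerSketch("session-a")
age_seconds = time.time() - worker.last_successful_flush
print(age_seconds < 1.0)  # a fresh worker does not look decades stale
```

A staleness check that computes "now minus last_successful_flush" would therefore treat a new worker as fresh until its first flush is overdue.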

… (#278)
Phase 2 (#278): after awaited flush succeeds inside _flush_barrier, stamp worker.last_successful_flush = time.time(). All three drainer success paths (drain, exhausted-per-line, finalize) funnel through _flush_barrier, so this single stamp covers all of them. A separate stamp at each call site would be redundant and drift-prone.
Test: TestFlushBarrierStampsLiveness.test_flush_barrier_advances_last_successful_flush
- Forces worker.last_successful_flush = 0.0 before call
- Asserts value >= before after _flush_barrier returns
- Confirmed FAIL before fix (stays 0.0), PASS after
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

A worker is orphaned iff it is still registered in _workers AND its asyncio task has completed (task.done()). This catches the finalization-path orphan (tail flush returns early without deregistering, so the task completes but the worker remains) and any unhandled exception that escapes the drain loop. Deterministic and instant: no timer, no threshold.
Three tests added (TestOrphanedSessions):
- test_completed_task_worker_is_orphaned: done task → reported
- test_live_task_worker_is_not_orphaned: running task → not reported
- test_no_task_worker_is_not_orphaned: task=None → not reported
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
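The orphan predicate reduces to a one-line filter over the registered workers. The sketch below is an assumption-laden miniature: `WorkerSketch` and `RegistrySketch` are hypothetical, and only the rule itself (registered, task present, task.done()) comes from the commit message.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class WorkerSketch:
    session_id: str
    task: Optional[asyncio.Task] = None  # the drain task, if one was started


class RegistrySketch:
    def __init__(self) -> None:
        self._workers: Dict[str, WorkerSketch] = {}

    def orphaned_sessions(self) -> List[WorkerSketch]:
        # Registered AND task finished. A worker with no task yet is not an
        # orphan, and a deregistered worker is not in _workers at all.
        return [
            w for w in self._workers.values()
            if w.task is not None and w.task.done()
        ]


async def demo() -> List[str]:
    async def finished() -> None:
        return None

    async def running() -> None:
        await asyncio.sleep(3600)

    registry = RegistrySketch()
    done_task = asyncio.create_task(finished())
    live_task = asyncio.create_task(running())
    await done_task  # let the short task complete
    registry._workers["dead"] = WorkerSketch("dead", done_task)
    registry._workers["live"] = WorkerSketch("live", live_task)
    registry._workers["new"] = WorkerSketch("new", None)
    orphans = [w.session_id for w in registry.orphaned_sessions()]
    live_task.cancel()
    try:
        await live_task
    except asyncio.CancelledError:
        pass
    return orphans


print(asyncio.run(demo()))
```

Because task.done() is also true for cancelled tasks and tasks that raised, the same check covers an exception escaping the drain loop.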

- Compute orphaned_ids set once from registry.orphaned_sessions() (single source of truth; no inline task.done() calls scattered through dict comp)
- Add orphaned (bool) and last_successful_flush (float) to each per-session dict in build_status_response
- Add top-level orphaned_sessions count (aggregate, safe for unauthenticated /status endpoint; no per-session error strings leaked)
- Tests: TestBuildStatusResponseOrphanVisibility, 3 tests covering done-task → orphaned=True + count, running-task → orphaned=False + count=0, and last_successful_flush presence/value
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
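A rough sketch of how the status payload could be assembled from a single orphaned_ids set follows. The `WorkerView` and `RegistryView` types, the `task_done` flag, and the exact response shape (a `sessions` list with a `session_id` key) are assumptions made for a self-contained example. Only the field names orphaned, last_successful_flush, and orphaned_sessions come from the commit message.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class WorkerView:
    """Hypothetical minimal view of a worker; the real SessionWorker has more fields."""
    session_id: str
    last_successful_flush: float
    task_done: bool  # stands in for worker.task.done()


class RegistryView:
    def __init__(self, workers: List[WorkerView]) -> None:
        self._workers = {w.session_id: w for w in workers}

    def orphaned_sessions(self) -> List[WorkerView]:
        return [w for w in self._workers.values() if w.task_done]


def build_status_response(registry: RegistryView) -> Dict[str, Any]:
    # Ask the registry once, then use set membership for every session.
    orphaned_ids = {w.session_id for w in registry.orphaned_sessions()}
    return {
        "orphaned_sessions": len(orphaned_ids),  # aggregate only, no error text
        "sessions": [
            {
                "session_id": sid,
                "orphaned": sid in orphaned_ids,
                "last_successful_flush": worker.last_successful_flush,
            }
            for sid, worker in registry._workers.items()
        ],
    }


registry = RegistryView([
    WorkerView("s1", 1750000000.0, task_done=True),
    WorkerView("s2", 1750000100.0, task_done=False),
])
print(build_status_response(registry)["orphaned_sessions"])
```

Deriving both the per-session flag and the aggregate from the same set keeps the two from disagreeing within one response.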

Adds tests/neo4j/test_orphan_visibility.py: a module-scoped live E2E test (pytestmark = pytest.mark.neo4j) proving a genuine finalization-path orphan surfaces on /status. The test drives the real start_drain → drain_worker → _finalize_session path; worker.task is an actual asyncio.Task that transitions to done().
Reproduction recipe (deterministic, ~29s against neo4j:5.26.22-community):
Seed shape (line counts are load-bearing):
- Lines 1-99: tiny tool:pre (tool_input 'x'*16, key space small-{i})
- Line 100: session:end (terminal; exactly fills read_batch max_items=100)
- Lines 101-200: fat tool:pre (tool_input 'x'*40_000, key space f-{i}, ~8 MB)
WHY: The pre-terminal block is exactly 100 lines so the drainer's first read_batch returns only those, commits cleanly (tiny flush << 2 MiB cap), sets saw_terminal → _finalize_session. The finalization tail (100 fat lines) is flushed in ONE transaction (rows=10_000_000, byts=10_000_000_000) → OOM. _finalize_session does NOT retry; one OOM → finalize_tail_flush_failed log + early return → orphan (registered worker, completed task).
Orphan post-state assertions (all required, none weakened):
- worker.task.done() True
- sid still in registry._workers (not deregistered)
- 'finalize_tail_flush_failed' in caplog.text
- _OOM_CODE ('Neo.TransientError.General.MemoryPoolOutOfMemoryError') in caplog.text
- committed offset frozen at pre-terminal boundary (== boundary, != tail_end)
- 0 f-* nodes committed (queried via a FRESH check_store)
- worker in registry.orphaned_sessions()
- build_status_response reports orphaned_sessions >= 1 and the session's per-session dict has orphaned: True
Teardown closes the still-open store driver (early-returned _finalize_session did not call _safe_close). No production code changes.
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

…rker)
- Add TestVersionIs401.test_pyproject_version_is_401 that reads pyproject.toml via tomllib and asserts version == '4.0.1' (single source of truth gate)
- Bump pyproject.toml line 7: version = "4.0.0" -> version = "4.0.1"
Test cycle confirmed:
RED: assert '4.0.0' == '4.0.1' (AssertionError before bump)
GREEN: 1 passed (after bump)
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

Three issues from the holistic code review, all cosmetic/documentary:
1. dashboard.py build_status_response docstring: add orphaned_sessions to the Returns key list; document the orphaned_sessions count vs visible-session asymmetry (aged-out orphan contributes to count but won't appear with orphaned:True in sessions list); add orphaned and last_successful_flush to the sessions-dict key list.
2. test_orphan_visibility.py: rename test function to align with the spec's acceptance-criteria reference: test_real_drain_orphan_surfaces_on_status -> test_finalization_orphan_surfaces_on_status. Resolves the Task 7 DONE_WITH_CONCERNS naming discrepancy.
3. test_version.py: rename TestVersionIs401 -> TestVersionIs4_0_1 and test_pyproject_version_is_401 -> test_pyproject_version_is_4_0_1 to eliminate the HTTP-401-status-code ambiguity in the class name.
Note: review recommendation #2 (strengthen flush-value assertion) was already implemented; tests/test_dashboard.py line 492-493 already carries both the key-presence check and the value-equality check.
Non-Neo4j suite: 1343 passed, 2 skipped (no regressions).
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

The reader audit conclusion (zero dangling-node readers in the codebase) is captured in the PR description. Superpowers-generated docs belong outside the product documentation tree per repo conventions. Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

…h (fixes v4.0.1 drain stall) (#19)
* fix: restore global lock ordering + fail-loud timeout in chunked flush (fixes v4.0.1 drain stall)
PR #15 chunked flush fixed OOM but broke lock ordering: chunking by dict-insertion order instead of key meant concurrent flushes across different stores acquired locks out of global order, causing Neo4j lock contention. With unlimited lock timeout, stalled flushes parked forever, starving the write semaphore and freezing all drains.
Two-layer fix:
A - Sort node/edge snapshots by key before chunking in _flush_body to restore global lock-acquisition order.
B - Wrap each execute_write in unit_of_work(timeout=neo4j_lock_timeout) and set connection_acquisition_timeout on driver so lock contention fails loud (raising Neo4jError) into existing retry/dead-letter path instead of parking forever.
New config knob: neo4j_lock_timeout (default 30.0s).
Verified locally: red-green regression cycle (fix stashed → tests fail for right reason; restored → pass), 59 neo4j tests including #15 OOM regression tests (no regression), 1302 unit tests, ruff+pyright clean.
Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
* fix: index-back node identity lookups (:Node label) to eliminate AllNodesScan flush stall
The non-Session node MERGE (n {node_id, workspace}) and edge/label-patch MATCH (n {node_id, workspace}) queries were label-free, preventing Neo4j from using any index, because every index in the graph is label-scoped (:Session, :Event, etc.). This forced an AllNodesScan over the 1.3M-node graph, causing each flush to run 25–30s, at which point the 30s timeout killed it, collapsing drain throughput to near-zero.
Fix:
- Add universal :Node label to every node (new label on top of mutable type labels, so identity stays dual-label-independent and no duplicates occur)
- Create composite index (node_id, workspace) scoped to :Node
- Update node MERGE and all edge/label-patch MATCHes to seek via :Node
- Idempotent startup backfill: batch-SET all existing nodes with :Node label and populate the index (10,000 rows/txn) so all 1.3M existing nodes benefit
Verified:
- EXPLAIN plans: AllNodesScan → NodeIndexSeek on all five label-free queries
- Tests: 64 neo4j + 1302 unit tests pass (red-green cycle included)
- Live server: backlog drained 58 → 0 sessions in ~2 min, slowest txn dropped from 25–30s to sub-millisecond, :Node backfill complete, OOM/flush failures to zero
Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
---------
Co-authored-by: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
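Layer A of the follow-up fix, sorting snapshots by key before chunking, can be illustrated on its own. The sketch assumes buffer keys are (node_id, workspace) tuples, which is a guess based on the identity properties named in the queries above; the helper name `lock_ordered` is hypothetical.

```python
from typing import Any, Dict, List, Tuple

Key = Tuple[str, str]  # assumed key shape: (node_id, workspace)


def lock_ordered(snapshot: Dict[Key, Any]) -> List[Tuple[Key, Any]]:
    """Return rows sorted by key so every flush touches rows in one global order."""
    return sorted(snapshot.items(), key=lambda item: item[0])


# Two stores buffered the same nodes in opposite insertion order.
store_a = {("n2", "ws"): {"v": 1}, ("n1", "ws"): {"v": 1}}
store_b = {("n1", "ws"): {"v": 2}, ("n2", "ws"): {"v": 2}}

order_a = [key for key, _ in lock_ordered(store_a)]
order_b = [key for key, _ in lock_ordered(store_b)]
print(order_a == order_b)  # both flushes would acquire locks in the same order
```

With insertion order, store A would lock n2 then n1 while store B locks n1 then n2, the classic setup for lock contention between concurrent writers. A shared sort order removes that cycle.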

Amplifier and others added 23 commits June 18, 2026 01:46

test: coordinator phase ordering nodes->patches->edges (#278 A.4)

4ef31ec

ops: set finite db.memory.transaction.max deployment cap (#278)

57d93aa

test: live cross-chunk referential integrity + large-buffer happy pat…

b69da64

…h (#278) 🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

colombod mentioned this pull request Jun 18, 2026

fix: chunk Neo4j flush into bounded sub-transactions to prevent transaction-memory OOM #14

Closed

colombod merged commit 3b2ad91 into main Jun 18, 2026
3 checks passed

colombod mentioned this pull request Jun 19, 2026

fix(neo4j): restore global write-lock ordering + finite lock timeout (eliminate flush deadlocks & silent drain stalls) #18

Closed

bkrabach mentioned this pull request Jun 19, 2026

fix: restore global lock ordering + fail-loud timeout in chunked flush (fixes v4.0.1 drain stall) #19

Merged

colombod merged 23 commits into main from fix/issue-278-chunked-flush-oom
