Propagate event-store OCC so projection scope can retry #365

Merged
eanzhao merged 4 commits into dev from fix/2026-04-24_projection-scope-occ-propagation on Apr 24, 2026

eanzhao commented Apr 24, 2026

Problem

ProjectionScopeGAgentBase.DispatchObservationAsync catches every dispatch failure and downgrades it to a warning. That means transient optimistic-concurrency conflicts on the scope's own event stream — exactly what actor-based event sourcing is designed to retry — get silently swallowed. The watermark-advanced event never lands, the scope sits on a stale version, and downstream consumers observe gaps with no follow-up retry.

Production log shows the symptom:

```
fail: Aevatar.Foundation.Core.EventSourcing.EventSourcingBehavior[0]
Event sourcing commit failed. agentId=projection.durable.scope:channel-bot-registration:channel-bot-registration-store eventType=ProjectionScopeWatermarkAdvancedEvent version=4 elapsedMs=3.6452 result=failed
System.InvalidOperationException: Optimistic concurrency conflict: expected 4, actual 5
warn: Aevatar.CQRS.Projection.Core.Orchestration.ProjectionMaterializationScopeGAgent[0]
Projection scope observation handling failed.
System.InvalidOperationException: Optimistic concurrency conflict: expected 4, actual 5
```

The warning swallowed a retryable conflict that should have driven the actor runtime to reload state and retry with the correct version.

Solution

  • Introduce typed EventStoreOptimisticConcurrencyException (subclasses InvalidOperationException, keeps the original message format for backward compatibility) exposing agentId / expectedVersion / actualVersion as typed properties.
  • Replace the generic InvalidOperationException raised by the three event-store implementations (GarnetEventStore, FileEventStore, InMemoryEventStore) with the typed exception.
  • Add internal ProjectionObservationFailurePolicy.ShouldPropagate(...) that recognizes the OCC exception directly, through ProjectionDispatchAggregateException, through AggregateException, and through inner-exception chains.
  • Let ProjectionScopeGAgentBase rethrow when the policy says "propagate" and keep swallowing deterministic failures (projector bugs) to avoid retry loops.
  • Add regression tests covering direct, aggregate-wrapped, and non-OCC failure cases.
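The bullets above can be sketched roughly as follows. This is a minimal illustration, not the merged code: the constructor signature, the `long` version type, coalescing a null agentId to an empty string, and returning `false` for a null exception are all assumptions, and `ObservationHandlerSketch` is a hypothetical stand-in for the handler in ProjectionScopeGAgentBase. The policy is public here only so the sketch is easy to exercise (the PR makes it internal), and ProjectionDispatchAggregateException is assumed to be reachable through the `AggregateException` or inner-exception branches.

```csharp
#nullable enable
using System;
using System.Linq;
using System.Threading.Tasks;

// Assumed shape: subclasses InvalidOperationException and keeps the legacy message format.
public class EventStoreOptimisticConcurrencyException : InvalidOperationException
{
    public EventStoreOptimisticConcurrencyException(string? agentId, long expectedVersion, long actualVersion)
        : base($"Optimistic concurrency conflict: expected {expectedVersion}, actual {actualVersion}")
    {
        AgentId = agentId ?? string.Empty; // assumption: null agentId coalesces to ""
        ExpectedVersion = expectedVersion;
        ActualVersion = actualVersion;
    }

    public string AgentId { get; }
    public long ExpectedVersion { get; }
    public long ActualVersion { get; }
}

public static class ProjectionObservationFailurePolicy
{
    // True when an OCC conflict appears directly, inside an AggregateException,
    // or anywhere along an inner-exception chain.
    public static bool ShouldPropagate(Exception? exception) => exception switch
    {
        null => false, // assumption: the real null guard may behave differently
        EventStoreOptimisticConcurrencyException => true,
        AggregateException aggregate => aggregate.InnerExceptions.Any(ShouldPropagate),
        _ => ShouldPropagate(exception.InnerException),
    };
}

// Hypothetical stand-in for the scope's observation handler.
public static class ObservationHandlerSketch
{
    public static async Task DispatchObservationAsync(Func<Task> dispatch, Action<Exception> logWarning)
    {
        try
        {
            await dispatch();
        }
        // The filter leaves retryable conflicts uncaught, so they reach the caller;
        // deterministic failures are still logged and swallowed.
        catch (Exception ex) when (!ProjectionObservationFailurePolicy.ShouldPropagate(ex))
        {
            logWarning(ex);
        }
    }
}
```

Using an exception filter rather than catch-and-rethrow is one way to keep the original stack trace intact for the propagated conflict; the merged handler may rethrow explicitly instead.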

Impact

  • src/Aevatar.Foundation.Abstractions/Persistence/EventStoreOptimisticConcurrencyException.cs (new)
  • src/Aevatar.CQRS.Projection.Core/AssemblyInfo.cs (new, exposes internals to tests)
  • src/Aevatar.CQRS.Projection.Core/Orchestration/ProjectionObservationFailurePolicy.cs (new)
  • src/Aevatar.CQRS.Projection.Core/Orchestration/ProjectionScopeGAgentBase.cs
  • src/Aevatar.Foundation.Runtime.Persistence.Implementations.Garnet/GarnetEventStore.cs
  • src/Aevatar.Foundation.Runtime/Persistence/FileEventStore.cs
  • src/Aevatar.Foundation.Runtime/Persistence/InMemoryEventStore.cs
  • test/Aevatar.CQRS.Projection.Core.Tests/ProjectionObservationFailurePolicyTests.cs (new)

Existing consumers remain compatible:

  • EventSourcingBehavior.ConfirmEventsAsync uses catch(Exception) — unaffected.
  • GarnetEventStoreIntegrationTests asserts InvalidOperationException + message pattern — still passes because the new type subclasses InvalidOperationException and preserves the message format.
  • Test doubles in other projects (HouseholdEntityDeviceInboundTests, ChannelBotRegistrationStoreTests, etc.) keep their own in-process InvalidOperationException; they exercise caller behavior, not the projection-scope retry path, so the asymmetry is acceptable.
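The compatibility argument above can be checked with a small sketch. The exception class below is a simplified stand-in with an assumed constructor, and `CompatibilityCheck` is a hypothetical helper rather than code from the PR. It shows that a caller written against the old contract, catching InvalidOperationException and matching on the message, still handles the typed exception.

```csharp
#nullable enable
using System;

// Simplified stand-in for the typed exception (constructor signature is an assumption).
public class EventStoreOptimisticConcurrencyException : InvalidOperationException
{
    public EventStoreOptimisticConcurrencyException(string? agentId, long expectedVersion, long actualVersion)
        : base($"Optimistic concurrency conflict: expected {expectedVersion}, actual {actualVersion}")
    {
    }
}

public static class CompatibilityCheck
{
    // Mimics a pre-existing consumer that only knows about InvalidOperationException.
    public static bool LegacyCallerStillCatches()
    {
        try
        {
            throw new EventStoreOptimisticConcurrencyException("agent-1", 4, 5);
        }
        catch (InvalidOperationException ex)
        {
            // The legacy message pattern is preserved, so string-based assertions keep passing.
            return ex.Message.StartsWith("Optimistic concurrency conflict: expected", StringComparison.Ordinal);
        }
    }
}
```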

Test plan

  • dotnet test test/Aevatar.CQRS.Projection.Core.Tests/Aevatar.CQRS.Projection.Core.Tests.csproj --nologo — 99 pass, 1 skip (ES integration)
  • dotnet test test/Aevatar.Foundation.Core.Tests/Aevatar.Foundation.Core.Tests.csproj --nologo — 159 pass
  • dotnet test test/Aevatar.Foundation.Runtime.Hosting.Tests/Aevatar.Foundation.Runtime.Hosting.Tests.csproj --nologo --filter "FullyQualifiedName~GarnetEventStore" — 1 pass, 2 skip (require live Garnet)
  • CI: architecture + solution-split guards

🤖 Generated with Claude Code

ProjectionScopeGAgentBase previously caught every dispatch failure and
logged a warning, so transient optimistic-concurrency conflicts on the
scope's own event stream were silently swallowed. The watermark-advanced
event never landed, the scope sat on a stale version, and downstream
consumers observed gaps with no retry.

Replace the generic InvalidOperationException raised by the three
event-store implementations (Garnet, File, InMemory) with a typed
EventStoreOptimisticConcurrencyException carrying agentId / expected /
actual versions, then let the projection scope propagate that class
(directly or unwrapped from ProjectionDispatchAggregateException /
AggregateException) so the actor runtime reloads state and retries with
the correct version. Deterministic failures keep the prior swallow
semantics to avoid retry loops on projector bugs.

The new exception subclasses InvalidOperationException and preserves the
original message format, so existing consumers (EventSourcingBehavior's
catch-all, GarnetEventStoreIntegrationTests' message-pattern assertion)
keep working without changes.

Validation:
- dotnet test test/Aevatar.CQRS.Projection.Core.Tests (99 pass, 1 skip)
- dotnet test test/Aevatar.Foundation.Core.Tests (159 pass)
- dotnet test test/Aevatar.Foundation.Runtime.Hosting.Tests --filter GarnetEventStore (1 pass, 2 skip - require Garnet server)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

codecov Bot commented Apr 24, 2026

Codecov Report

❌ Patch coverage is 90.90909% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.07%. Comparing base (0a15750) to head (c7c4c5b).
⚠️ Report is 5 commits behind head on dev.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...istence.Implementations.Garnet/GarnetEventStore.cs | 0.00% | 4 Missing ⚠️ |
@@            Coverage Diff             @@
##              dev     #365      +/-   ##
==========================================
+ Coverage   70.04%   70.07%   +0.03%     
==========================================
  Files        1155     1157       +2     
  Lines       82584    82623      +39     
  Branches    10868    10872       +4     
==========================================
+ Hits        57842    57895      +53     
+ Misses      20563    20551      -12     
+ Partials     4179     4177       -2     
| Flag | Coverage Δ |
|---|---|
| ci | 70.07% <90.90%> (+0.03%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| ...rchestration/ProjectionObservationFailurePolicy.cs | 100.00% <100.00%> (ø) |
| ...on.Core/Orchestration/ProjectionScopeGAgentBase.cs | 75.46% <100.00%> (+7.27%) ⬆️ |
| ...stence/EventStoreOptimisticConcurrencyException.cs | 100.00% <100.00%> (ø) |
| ...r.Foundation.Runtime/Persistence/FileEventStore.cs | 71.28% <100.00%> (+0.29%) ⬆️ |
| ...undation.Runtime/Persistence/InMemoryEventStore.cs | 92.98% <100.00%> (+0.52%) ⬆️ |
| ...istence.Implementations.Garnet/GarnetEventStore.cs | 0.00% <0.00%> (ø) |

... and 2 files with indirect coverage changes


eanzhao and others added 2 commits April 24, 2026 11:46
Patch coverage on #365 was 54.54% because the three new control-flow
additions were exercised only by a single happy-path test. Add focused
unit tests for the retry-propagation branches:

- EventStoreOptimisticConcurrencyException property capture and null
  agentId coalescing.
- ProjectionObservationFailurePolicy null guard, generic AggregateException
  branch, inner-exception unwrap, and deterministic false path.
- ProjectionScopeGAgentBase observation handler rethrows OCC via the
  retryable policy and swallows deterministic projection failures.
eanzhao merged commit 582a96f into dev on Apr 24, 2026
12 checks passed