Propagate event-store OCC so projection scope can retry #365
Merged
ProjectionScopeGAgentBase previously caught every dispatch failure and logged a warning, so transient optimistic-concurrency conflicts on the scope's own event stream were silently swallowed. The watermark-advanced event never landed, the scope sat on a stale version, and downstream consumers observed gaps with no retry.

Replace the generic InvalidOperationException raised by the three event-store implementations (Garnet, File, InMemory) with a typed EventStoreOptimisticConcurrencyException carrying agentId / expected / actual versions, then let the projection scope propagate that class (directly or unwrapped from ProjectionDispatchAggregateException / AggregateException) so the actor runtime reloads state and retries with the correct version. Deterministic failures keep the prior swallow semantics to avoid retry loops on projector bugs.

The new exception subclasses InvalidOperationException and preserves the original message format, so existing consumers (EventSourcingBehavior's catch-all, GarnetEventStoreIntegrationTests' message-pattern assertion) keep working without changes.

Validation:
- dotnet test test/Aevatar.CQRS.Projection.Core.Tests (99 pass, 1 skip)
- dotnet test test/Aevatar.Foundation.Core.Tests (159 pass)
- dotnet test test/Aevatar.Foundation.Runtime.Hosting.Tests --filter GarnetEventStore (1 pass, 2 skip - require Garnet server)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
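The typed exception and the propagation check described in this commit could look roughly like the sketch below. It is not the code from the PR: the `long` version type, the empty-string fallback for a null agentId, the `false` result for a null exception, and the `sealed` modifier are all assumptions. ProjectionDispatchAggregateException is left out because its shape is not shown here; the sketch would only catch it through the generic AggregateException or inner-exception paths.

```csharp
#nullable enable
using System;

// Illustrative sketch only; the real types in Aevatar may differ in detail.
public sealed class EventStoreOptimisticConcurrencyException : InvalidOperationException
{
    public EventStoreOptimisticConcurrencyException(string? agentId, long expectedVersion, long actualVersion)
        // Same message format as the production log, so message-pattern assertions keep matching.
        : base($"Optimistic concurrency conflict: expected {expectedVersion}, actual {actualVersion}")
    {
        AgentId = agentId ?? string.Empty; // Assumed fallback for a null agentId.
        ExpectedVersion = expectedVersion;
        ActualVersion = actualVersion;
    }

    public string AgentId { get; }
    public long ExpectedVersion { get; }
    public long ActualVersion { get; }
}

public static class ProjectionObservationFailurePolicy
{
    // Returns true when the failure is, or wraps, an optimistic-concurrency conflict.
    public static bool ShouldPropagate(Exception? exception)
    {
        if (exception is null)
            return false; // Assumed null-guard behavior.

        if (exception is EventStoreOptimisticConcurrencyException)
            return true;

        if (exception is AggregateException aggregate)
        {
            // Check every aggregated failure, not only the first one.
            foreach (var inner in aggregate.InnerExceptions)
            {
                if (ShouldPropagate(inner))
                    return true;
            }
            return false;
        }

        // Walk the ordinary inner-exception chain.
        return ShouldPropagate(exception.InnerException);
    }
}
```

Because the sketch derives from InvalidOperationException, a caller that catches InvalidOperationException or Exception still sees the conflict, while a caller that needs the versions can catch the typed exception and read ExpectedVersion and ActualVersion.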
Codecov Report❌ Patch coverage is
@@ Coverage Diff @@
## dev #365 +/- ##
==========================================
+ Coverage 70.04% 70.07% +0.03%
==========================================
Files 1155 1157 +2
Lines 82584 82623 +39
Branches 10868 10872 +4
==========================================
+ Hits 57842 57895 +53
+ Misses 20563 20551 -12
+ Partials 4179 4177 -2
Patch coverage on #365 was 54.54% because the three new control flow additions were only exercised by a single happy-path test. Add focused unit tests for the retry-propagation branches:
- EventStoreOptimisticConcurrencyException property capture and null agentId coalescing.
- ProjectionObservationFailurePolicy null guard, generic AggregateException branch, inner-exception unwrap, and deterministic false path.
- ProjectionScopeGAgentBase observation handler rethrows OCC via the retryable policy and swallows deterministic projection failures.
Problem
ProjectionScopeGAgentBase.DispatchObservationAsync catches every dispatch failure and downgrades it to a warning. That means transient optimistic-concurrency conflicts on the scope's own event stream (exactly what actor-based event sourcing is designed to retry) get silently swallowed. The watermark-advanced event never lands, the scope sits on a stale version, and downstream consumers observe gaps with no follow-up retry.

Production log shows the symptom:
```
fail: Aevatar.Foundation.Core.EventSourcing.EventSourcingBehavior[0]
Event sourcing commit failed. agentId=projection.durable.scope:channel-bot-registration:channel-bot-registration-store eventType=ProjectionScopeWatermarkAdvancedEvent version=4 elapsedMs=3.6452 result=failed
System.InvalidOperationException: Optimistic concurrency conflict: expected 4, actual 5
warn: Aevatar.CQRS.Projection.Core.Orchestration.ProjectionMaterializationScopeGAgent[0]
Projection scope observation handling failed.
System.InvalidOperationException: Optimistic concurrency conflict: expected 4, actual 5
```
The warning swallowed a retryable conflict that should have driven the actor runtime to reload state and retry with the correct version.
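The difference between swallowing the conflict and reloading before a retry can be illustrated with a toy in-memory stream. Every name below (TinyEventStream, ObservationDemo, AppendAndSwallow, AppendWithReload) is hypothetical and stands in for the actor runtime's behavior; none of it is the Aevatar API.

```csharp
using System;

// Hypothetical stream that enforces optimistic concurrency on append.
public sealed class TinyEventStream
{
    public long Version { get; private set; }

    public void Append(long expectedVersion)
    {
        if (expectedVersion != Version)
        {
            throw new InvalidOperationException(
                $"Optimistic concurrency conflict: expected {expectedVersion}, actual {Version}");
        }
        Version++;
    }
}

public static class ObservationDemo
{
    // Swallowing handler: the conflict becomes a warning and the event is never written.
    public static bool AppendAndSwallow(TinyEventStream stream, long cachedVersion)
    {
        try
        {
            stream.Append(cachedVersion);
            return true;
        }
        catch (Exception ex)
        {
            Console.WriteLine($"warn: {ex.Message}");
            return false;
        }
    }

    // Retrying caller: on conflict, reload the current version and try again.
    // Returns the number of attempts used; the last failure is rethrown.
    public static int AppendWithReload(TinyEventStream stream, long cachedVersion, int maxAttempts = 3)
    {
        for (var attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                stream.Append(cachedVersion);
                return attempt;
            }
            catch (InvalidOperationException) when (attempt < maxAttempts)
            {
                cachedVersion = stream.Version; // Stand-in for "reload state".
            }
        }
        throw new ArgumentOutOfRangeException(nameof(maxAttempts), "maxAttempts must be positive.");
    }
}
```

With a stream at version 5 and a stale cached version of 4, as in the log above, AppendAndSwallow leaves the stream at version 5 and reports only a warning, while AppendWithReload succeeds on its second attempt and advances the stream to version 6.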
Solution
- Add EventStoreOptimisticConcurrencyException (subclasses InvalidOperationException, keeps the original message format for backward compatibility) exposing agentId / expectedVersion / actualVersion as typed properties.
- Replace the generic InvalidOperationException raised by the three event-store implementations (GarnetEventStore, FileEventStore, InMemoryEventStore) with the typed exception.
- Add ProjectionObservationFailurePolicy.ShouldPropagate(...) that recognizes the OCC exception directly, through ProjectionDispatchAggregateException, through AggregateException, and through inner-exception chains.
- Make ProjectionScopeGAgentBase rethrow when the policy says "propagate" and keep swallowing deterministic failures (projector bugs) to avoid retry loops.

Impact
- src/Aevatar.Foundation.Abstractions/Persistence/EventStoreOptimisticConcurrencyException.cs (new)
- src/Aevatar.CQRS.Projection.Core/AssemblyInfo.cs (new, exposes internals to tests)
- src/Aevatar.CQRS.Projection.Core/Orchestration/ProjectionObservationFailurePolicy.cs (new)
- src/Aevatar.CQRS.Projection.Core/Orchestration/ProjectionScopeGAgentBase.cs
- src/Aevatar.Foundation.Runtime.Persistence.Implementations.Garnet/GarnetEventStore.cs
- src/Aevatar.Foundation.Runtime/Persistence/FileEventStore.cs
- src/Aevatar.Foundation.Runtime/Persistence/InMemoryEventStore.cs
- test/Aevatar.CQRS.Projection.Core.Tests/ProjectionObservationFailurePolicyTests.cs (new)

Existing consumers remain compatible:
- EventSourcingBehavior.ConfirmEventsAsync uses catch(Exception): unaffected.
- GarnetEventStoreIntegrationTests asserts InvalidOperationException + message pattern: still passes because the new type subclasses InvalidOperationException and preserves the message format.
- Other tests (HouseholdEntityDeviceInboundTests, ChannelBotRegistrationStoreTests, etc.) keep their own in-process InvalidOperationException; they exercise caller behavior, not the projection-scope retry path, so the asymmetry is acceptable.

Test plan
- dotnet test test/Aevatar.CQRS.Projection.Core.Tests/Aevatar.CQRS.Projection.Core.Tests.csproj --nologo: 99 pass, 1 skip (ES integration)
- dotnet test test/Aevatar.Foundation.Core.Tests/Aevatar.Foundation.Core.Tests.csproj --nologo: 159 pass
- dotnet test test/Aevatar.Foundation.Runtime.Hosting.Tests/Aevatar.Foundation.Runtime.Hosting.Tests.csproj --nologo --filter "FullyQualifiedName~GarnetEventStore": 1 pass, 2 skip (require live Garnet)

🤖 Generated with Claude Code