Add blue-green materialization and staleness tracking for graphs #587
- Implemented blue-green materialization for LadybugDB databases, allowing for seamless promotion of WIP databases to active status with minimal downtime.
- Introduced a new staleness tracking mechanism that marks graphs as stale after OLTP writes, ensuring the AI and UI can accurately reflect the freshness of data.
- Added endpoints for swapping and rolling back databases, enhancing database management capabilities.
- Created a materialization sensor to automatically trigger materialization jobs for stale graphs.
- Updated various components to support the new materialization and staleness features, including the GraphClient and LadybugDatabaseManager.
- Enhanced error handling and logging throughout the materialization process for better traceability.
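As a rough illustration of the staleness idea in this commit, here is a minimal in-memory sketch. Only the name `mark_graph_stale` and its `(graph_id, reason)` call shape appear in this thread; the `StalenessRegistry` class, its storage, and the other method names are assumptions made for the example, and the real module presumably persists this state somewhere shared.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class StalenessRegistry:
    """Hypothetical in-memory stand-in for the PR's staleness store."""

    _stale: dict = field(default_factory=dict)

    def mark_graph_stale(self, graph_id: str, reason: str) -> None:
        # Keep the first stale timestamp so repeated writes do not hide
        # how long the graph has been out of date.
        entry = self._stale.setdefault(
            graph_id,
            {"stale_since": datetime.now(timezone.utc).isoformat()},
        )
        entry["last_reason"] = reason

    def clear(self, graph_id: str) -> None:
        # Intended to be called after a successful materialization.
        self._stale.pop(graph_id, None)

    def is_stale(self, graph_id: str) -> bool:
        return graph_id in self._stale

    def stale_graph_ids(self) -> list:
        return sorted(self._stale)
```

A write path would call `mark_graph_stale(graph_id, reason)` after committing, and the materialization job would call `clear(graph_id)` once the fresh copy is live.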
@claude please review this PR
Claude finished @jfrench9's task in 3m 30s
PR Review: Add blue-green materialization and staleness tracking for graphs
Overall this is well-architected. The blue-green pattern is sound, the lock design is correct, and the test coverage is solid. A few issues need attention before merge.
Bugs
1.
Fix: wrap in
2.
3. Sensor cursor permanently blocks graphs that fail to rematerialize
4.
api_base = (
    env.API_BASE_URL if hasattr(env, "API_BASE_URL") else "http://localhost:8000"
)
The CLAUDE.md rule is to use the centralized config directly (
Design Issues
5.
async def __aenter__(self):
    if not await self.acquire():
        raise RuntimeError(f"Could not acquire materialization lock for {self.lock_key}")
    return self

async def __aexit__(self, *args):
    await self.release()
6. Rollback endpoint is not usable after a successful swap
The rollback endpoint is only useful for a partial failure between rename steps — which is an extremely narrow window. Either:
7. Inline imports of
In
Sensor
8.
run_key=f"stale_materialize_{graph_id}_{now.isoformat()}"
Dagster uses
Test Coverage Gaps
9.
10. Missing
All five new test files are missing the
Nits
Summary
The core blue-green implementation is solid: the file-rename swap, WAL handling, connection pool cleanup, and automatic rollback on failure are all correct. The main items blocking merge are:
- Added JSON cursor handling and expiry logic to the stale graph materialization sensor, improving the management of in-progress materializations.
- Updated the swap endpoint to include materialization lock verification, ensuring safe promotion of WIP databases to active status.
- Removed rollback functionality from the database manager and related endpoints, streamlining the database management process.
- Enhanced error handling and logging throughout the materialization and database operations for better traceability and reliability.
- Updated tests to cover new cursor logic and database management changes, ensuring robust functionality.
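The cursor-with-expiry idea from this commit can be sketched roughly as follows. The cursor layout (`{graph_id: ISO timestamp}`), the one-hour `IN_PROGRESS_TTL`, and both function names are assumptions for illustration; the PR's actual format and window are not shown in this thread. The point is that an entry that outlives the window stops blocking its graph, so a failed run no longer pins the graph forever.

```python
from __future__ import annotations

import json
from datetime import datetime, timedelta, timezone

# Hypothetical expiry window; the PR's real value is not shown here.
IN_PROGRESS_TTL = timedelta(hours=1)


def load_cursor(raw: str | None) -> dict[str, datetime]:
    """Parse a JSON cursor of {graph_id: ISO timestamp}, skipping bad entries."""
    if not raw:
        return {}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {}
    if not isinstance(data, dict):
        return {}
    parsed: dict[str, datetime] = {}
    for graph_id, stamp in data.items():
        try:
            parsed[graph_id] = datetime.fromisoformat(stamp)
        except (TypeError, ValueError):
            continue  # malformed entry: drop it rather than block the graph
    return parsed


def graphs_to_request(
    stale_ids: list[str], cursor: dict[str, datetime], now: datetime
) -> tuple[list[str], str]:
    """Return stale graphs with no unexpired entry, plus the updated cursor."""
    live = {g: t for g, t in cursor.items() if now - t < IN_PROGRESS_TTL}
    to_run = [g for g in stale_ids if g not in live]
    for g in to_run:
        live[g] = now  # record the new in-progress attempt
    return to_run, json.dumps({g: t.isoformat() for g, t in live.items()})
```

A sensor tick would call `load_cursor` on the stored cursor string, request runs for the returned graph ids, and store the returned JSON as the next cursor.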
@claude can you review the latest changes
Claude finished @jfrench9's task in 2m 31s
PR Review: Latest Changes
Good progress on the second commit: most of the issues from the previous review are resolved. One newly introduced bug needs attention before merge.
What Was Fixed
Bugs
1. Double-lock acquisition causes 409 on every blue-green materialization
This is a newly introduced regression. In other words: the operations layer acquires the lock, then calls the swap endpoint, which tries to acquire the same lock and gets blocked by itself.
The fix is to thread the lock token through. The client should forward the token via the header:
# client/client.py: add lock_token parameter
async def swap_database(self, graph_id: str, lock_token: str | None = None) -> dict[str, Any]:
    headers = {}
    if lock_token:
        headers["X-Materialization-Lock-Token"] = lock_token
    response = await self._request("POST", f"/databases/{graph_id}/swap", headers=headers)
    return response.json()
And in
await client.swap_database(graph_id, lock_token=lock.token if lock else None)
Design Issues
2.
Preferred:
asyncio.get_running_loop().run_in_executor(None, mark_graph_stale, graph_id, "report_generated")
Or more idiomatically in modern Python:
3.
This is acceptable for now, but consider renaming to
Nits
Summary
The core architecture is solid. The critical item before merge is bug #1: every normal blue-green materialization will fail with 409 because the lock is re-acquired by the swap endpoint while the caller already holds it. That's a straightforward fix in the client and the call site. Everything else is minor.
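For the design issue about calling the synchronous `mark_graph_stale` from async code, the sketch below shows the `run_in_executor` form quoted in the review next to `asyncio.to_thread`. The `to_thread` line is one standard-library shorthand available since Python 3.9, offered here as an assumption; it is not necessarily the alternative the review had in mind. The `mark_graph_stale` body and the `handler` coroutine are stand-ins written for this example.

```python
import asyncio
import threading

calls = []


def mark_graph_stale(graph_id: str, reason: str) -> None:
    # Stand-in for the real synchronous helper. It records whether it
    # ran on the main thread so the offloading is observable.
    on_main = threading.current_thread() is threading.main_thread()
    calls.append((graph_id, reason, on_main))


async def handler(graph_id: str) -> None:
    loop = asyncio.get_running_loop()
    # Form quoted in the review: default executor, positional arguments.
    await loop.run_in_executor(None, mark_graph_stale, graph_id, "report_generated")
    # Standard-library shorthand (Python 3.9+) with the same effect.
    await asyncio.to_thread(mark_graph_stale, graph_id, "report_generated")


asyncio.run(handler("graph-1"))
```

Both calls hand the blocking function to a worker thread, so the event loop stays free to serve other requests while the write completes.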
- Renamed `from_token` method to `from_trusted_token` in the MaterializationLock class to clarify its purpose and usage.
- Enhanced error handling in the stale graph materialization sensor by adding detailed logging for malformed cursor entries.
- Updated the GraphClient's swap_database method to accept an optional lock token, ensuring safe database swaps without re-acquiring locks.
- Adjusted related tests to reflect the new method name and improved functionality, ensuring robust coverage for materialization lock handling.
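To make the lock passthrough concrete, here is a toy model of the interaction. `InMemoryLocks` and `swap_endpoint` are hypothetical names invented for this sketch, and the real lock is distributed rather than a local dict. The sketch only shows why a caller that already holds the lock gets 409 without a token and 200 with one.

```python
from __future__ import annotations

import secrets


class InMemoryLocks:
    """Toy lock table standing in for the PR's distributed lock."""

    def __init__(self) -> None:
        self._held: dict[str, str] = {}

    def acquire(self, key: str) -> str | None:
        if key in self._held:
            return None  # someone already holds it
        token = secrets.token_hex(8)
        self._held[key] = token
        return token

    def holds(self, key: str, token: str) -> bool:
        held = self._held.get(key)
        return held is not None and secrets.compare_digest(held, token)

    def release(self, key: str, token: str) -> None:
        if self.holds(key, token):
            del self._held[key]


def swap_endpoint(locks: InMemoryLocks, graph_id: str, lock_token: str | None = None) -> int:
    """Return an HTTP-style status code for a swap request."""
    if lock_token is not None:
        # Caller claims to hold the lock: verify it instead of re-acquiring.
        return 200 if locks.holds(graph_id, lock_token) else 409
    token = locks.acquire(graph_id)
    if token is None:
        return 409  # lock is held elsewhere
    try:
        return 200  # the swap itself would run here
    finally:
        locks.release(graph_id, token)
```

The operations layer would acquire once, pass its token to the swap call, and release after the swap returns.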
@claude all issues resolved now?
Claude finished @jfrench9's task in 1m 41s
PR Review: Status Check
Almost everything is fixed. All critical and design issues are resolved; only three minor nits remain.
What's Fixed Since Last Review
Still Open (Minor)
1.
Flagged in both previous reviews. Without the marker,
2.
These are unconditional imports that belong at the module top. Minor, but was flagged twice.
3. Inline imports inside
Summary
The three remaining items are all style nits with no functional impact. The core logic (blue-green swap, lock passthrough, cursor expiry, async-safe staleness marking, Dagster deduplication) is all correct. This is ready to merge if you're OK leaving the nits for a follow-up.
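On the Dagster deduplication point: Dagster skips a sensor `RunRequest` whose `run_key` it has already seen for that sensor, so a key that embeds the tick time never deduplicates. The sketch below contrasts the key shape quoted in the first review with a hypothetical stable key. The function names and the choice of `stale_since` as the stable component are assumptions, not the PR's actual fix, and `dedupe` only mimics the run-key rule without importing Dagster.

```python
from datetime import datetime, timezone


def run_key_per_tick(graph_id: str, now: datetime) -> str:
    # Shape quoted in the first review: changes on every sensor tick.
    return f"stale_materialize_{graph_id}_{now.isoformat()}"


def run_key_per_stale_event(graph_id: str, stale_since: datetime) -> str:
    # Hypothetical alternative: stable while the same staleness event is
    # outstanding, so repeated ticks produce the same key.
    return f"stale_materialize_{graph_id}_{stale_since.isoformat()}"


def dedupe(keys: list) -> list:
    """Mimic run_key semantics: keep only the first request per key."""
    seen = set()
    kept = []
    for key in keys:
        if key not in seen:
            seen.add(key)
            kept.append(key)
    return kept
```

With the per-tick key, two ticks a minute apart yield two runs for the same graph; with the stable key, the second request is dropped.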
Summary
Introduces a blue-green materialization strategy for graph databases along with staleness tracking, ensuring zero-downtime data refreshes. This feature allows graphs to be materialized into a staging copy and atomically swapped into production, preventing readers from encountering partially-built or inconsistent data.
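A minimal sketch of a rename-based swap with rollback, under stated assumptions: the file names, the `.prev` backup suffix, and the `swap_in_wip` function are invented for illustration, and LadybugDB's real on-disk layout (including any WAL files and connection pool handling) is not modeled. Note that the two renames are not atomic as a pair, which is why the failure path restores the previous file.

```python
import os
import tempfile
from pathlib import Path


def swap_in_wip(active: Path, wip: Path) -> Path:
    """Promote wip to active via renames; restore the old file on failure."""
    backup = active.with_name(active.name + ".prev")
    had_active = active.exists()
    if had_active:
        os.replace(active, backup)  # step 1: move the current active aside
    try:
        os.replace(wip, active)  # step 2: promote the staging file
    except OSError:
        if had_active:
            os.replace(backup, active)  # roll back step 1
        raise
    return backup


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        active = Path(tmp) / "graph.db"
        wip = Path(tmp) / "graph.wip.db"
        active.write_text("old")
        wip.write_text("new")
        swap_in_wip(active, wip)
        print(active.read_text())
```

Readers that open the active path after step 2 see only the fully built file, and the previous version remains beside it until it is cleaned up.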
Key Accomplishments
Blue-Green Materialization
- `materialization_lock.py`: Implements a distributed locking mechanism to ensure only one materialization process runs at a time, preventing race conditions during concurrent refresh attempts.
- `routers/databases/swap.py`: New API route to atomically swap a freshly materialized staging database into the active production slot, enabling seamless blue-green deployments.
- `core/ladybug/manager.py`: Extended the graph manager to orchestrate the full blue-green lifecycle: build into staging, validate, and swap.
Staleness Tracking
- `operations/extensions/staleness.py`: New module to track when materialized data becomes stale relative to upstream source changes, enabling intelligent re-materialization decisions.
- `routers/ledger/reports.py`, `routers/ledger/schedules.py`: Reports and schedules endpoints now surface staleness metadata so consumers can assess data freshness.
Dagster Sensor for Automated Materialization
- `dagster/sensors/materialization.py`: A new Dagster sensor that monitors for stale graphs and automatically triggers re-materialization jobs, integrated into the Dagster definitions.
MCP Tooling
- `middleware/mcp/tools/materialization_tools.py`: Exposes materialization operations (trigger, status, swap) as MCP tools, enabling AI agents and middleware consumers to manage graph refreshes programmatically.
Graph API Client Extensions
- `client/client.py`: methods to interact with the new swap and materialization lock endpoints.
QuickBooks Pipeline Integration
Breaking Changes
None. All changes are additive. Existing materialization behavior is preserved; the blue-green swap is an opt-in enhancement layered on top.
Testing
Comprehensive test coverage added across all new components (5 new test files, 524+ lines of tests):
- `test_materialization_sensor.py`: Validates sensor triggers on stale graphs and no-ops when data is fresh.
- `test_materialization_lock.py`: Tests lock acquisition, re-entrancy prevention, and cleanup on failure.
- `test_swap.py`: Exercises the swap endpoint including validation, rollback scenarios, and concurrent swap rejection.
- `test_materialization_tools.py`: Verifies MCP tool registration and correct dispatch of materialization operations.
- `test_staleness.py`: Confirms staleness detection logic against various upstream change scenarios.
Infrastructure Considerations
🤖 Generated with Claude Code
Branch Info:
feature/materialization-improvements
main
Co-Authored-By: Claude <noreply@anthropic.com>