fix(cloud/client): push events with watermark + backfill script (Bug 2) #162
…ug 2)

Pairs with gradata-cloud PR #12. This was Bug 2 in /tmp/audit-bug2-watermark.md.

- client.sync() now reads events.jsonl, filters by the last_sync_at watermark, sends batches of 500, advances the cursor on 200, and retries with a smaller batch on 413.
- Sync state lives at <BRAIN_DIR>/.gradata-sync-state.json, separate from events.jsonl, which stays append-only and untouched.
- 9/9 new tests pass in tests/test_cloud_client_sync.py.

Council perspective P3 (Skeptic) took this position after audit-gate blocked the aggregate-only path: three cloud routes (analytics.py, activity.py, corrections.py) read raw events directly, so a telemetry-only sync would have flatlined them.
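The read-and-filter step described above can be sketched as follows. This is a minimal illustration, not the code in this PR: it assumes each line of events.jsonl is a JSON object with a "ts" field holding a fixed-format UTC ISO-8601 string (so string order equals time order), and the helper names pending_events() and chunks() are hypothetical.

```python
import json
from typing import Iterable, Iterator


def pending_events(lines: Iterable[str], last_sync_at: str) -> list[dict]:
    """Return events strictly newer than the watermark, oldest first.

    Assumption: every event has a "ts" field in a fixed-format UTC
    ISO-8601 string, so plain string comparison is chronological.
    An empty watermark ("") means nothing has been synced yet.
    """
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the append-only log
        ev = json.loads(line)
        if ev["ts"] > last_sync_at:  # strictly newer than the cursor
            events.append(ev)
    events.sort(key=lambda ev: ev["ts"])  # stable sort keeps file order on ties
    return events


def chunks(items: list, size: int = 500) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Usage sketch: events.jsonl is only ever opened for reading.
# with open(brain_dir / "events.jsonl", encoding="utf-8") as fh:
#     for batch in chunks(pending_events(fh, state["last_sync_at"])):
#         ...
```

Because the comparison is strictly greater-than, two events that share a timestamp could be skipped if a run stops between the batch holding one and the batch holding the other; the last_event_id stored alongside last_sync_at is one place a tiebreaker could live.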
Caution: review failed. The pull request is closed. Files selected for processing: 3.
Walkthrough
Added event batch synchronization to …
Changes: Event Batch Synchronization
Sequence Diagram
sequenceDiagram
    participant Script as Backfill Script
    participant Client as CloudClient
    participant FS as FileSystem
    participant Server as Cloud Server
    Script->>Client: sync(batch_size=500)
    Client->>FS: read .gradata-sync-state.json
    FS-->>Client: {last_sync_at, last_event_id}
    Client->>FS: read events.jsonl
    FS-->>Client: [event, event, ...]
    Client->>Client: filter events by last_sync_at
    Client->>Client: partition into batch_size chunks
    loop For each batch
        Client->>Client: format_event(ev) with deterministic event_id
        Client->>Server: POST /sync [batch]
        alt HTTP 413 (Too Large)
            Server-->>Client: 413
            Client->>Client: batch_size = batch_size / 2
            Client->>Server: POST /sync [smaller batch]
        end
        Server-->>Client: 200 OK
        Client->>Client: advance last_sync_at to newest event ts
        Client->>FS: write .gradata-sync-state.json
        FS-->>Client: ✓
    end
    Client-->>Script: ingested_count
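The batch loop in the diagram can be sketched like this. It is a hedged illustration, not the PR's implementation: post is a hypothetical caller-supplied function that sends one batch and returns the HTTP status code, events are assumed to be sorted by ts ascending, and push_batches() and save_state() are illustrative names. The cursor is written only after a 200, via a temporary file and os.replace, so an interrupted write cannot leave a half-written state file.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable

STATE_NAME = ".gradata-sync-state.json"  # file name taken from the PR


def save_state(brain_dir: Path, state: dict) -> None:
    """Write the cursor atomically: temp file in the same dir, then os.replace."""
    fd, tmp = tempfile.mkstemp(dir=brain_dir, prefix=".sync-state-")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(state, fh)
    os.replace(tmp, brain_dir / STATE_NAME)


def push_batches(events: list[dict], post: Callable[[list[dict]], int],
                 brain_dir: Path, batch_size: int = 500) -> int:
    """Send events in order and return how many the server accepted.

    Assumption: `events` is sorted by "ts" ascending, so the last event
    of an accepted batch is its newest one.
    """
    sent = 0
    i = 0
    while i < len(events):
        batch = events[i:i + batch_size]
        status = post(batch)
        if status == 413:
            if batch_size == 1:
                raise RuntimeError("a single event exceeds the server's size limit")
            batch_size = max(1, batch_size // 2)  # retry the same events, smaller
            continue
        if status != 200:
            # Cursor is untouched, so the next run resumes from the last 200.
            raise RuntimeError(f"sync failed with HTTP {status}")
        i += len(batch)
        sent += len(batch)
        newest = batch[-1]
        save_state(brain_dir, {"last_sync_at": newest["ts"],
                               "last_event_id": newest.get("event_id")})
    return sent
```

Keeping the smaller batch size for the rest of the run, rather than growing it back after a success, is a simplifying choice here; the PR text only says it retries with a smaller batch on 413.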
Estimated code review effort: 4 (Complex), ~45 minutes
Replaces #161, which had diverged from main by 43 unrelated commits and could not be merged.
What
- client.sync() reads events.jsonl, filters by the last_sync_at watermark, sends 500 events per request, advances the cursor on 200, and retries with a smaller batch on 413.
- Sync state is stored in <BRAIN_DIR>/.gradata-sync-state.json (events.jsonl is untouched and stays append-only).
- scripts/backfill_to_cloud.py: a one-shot historical replay for the ~5,800 events the broken sync silently dropped.

Tests
Pairs with
- /sync ingests events with a watermark
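The PR does not say how format_event derives its deterministic event_id, or whether /sync deduplicates on it. Assuming both, a content hash makes the ~5,800-event backfill safe to replay: resending an event yields the same id, and an ingest that stores each id once ignores the repeat. event_id() and InMemoryIngest below are hypothetical illustrations, not the gradata-cloud code.

```python
import hashlib
import json


def event_id(ev: dict) -> str:
    """Hypothetical scheme: SHA-256 of the event's canonical JSON.

    sort_keys plus fixed separators make the encoding independent of
    key order, so the same event always hashes to the same id.
    """
    canonical = json.dumps(ev, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class InMemoryIngest:
    """Stand-in for a server-side store that keeps each event_id at most once."""

    def __init__(self) -> None:
        self.rows: dict[str, dict] = {}

    def ingest(self, batch: list[dict]) -> int:
        """Store unseen events and return how many were new."""
        new = 0
        for ev in batch:
            eid = event_id(ev)
            if eid not in self.rows:
                self.rows[eid] = ev
                new += 1
        return new
```

One trade-off of hashing content: two genuinely distinct events with identical fields, including ts, would collapse into one row, so a real scheme may need an extra discriminator.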