feat: write event data to clickhouse #438
Conversation
Force-pushed ae0768d to 2197aaf.
Force-pushed cd6ce7b to 8777da9.
Pull request overview
Adds a ClickHouse integration to the backend to emit high-dimensional analytical “check” events (deal storage, retrieval, and data retention) into queryable tables, plus local Kind resources to run ClickHouse in development.
Changes:
- Added a global NestJS `ClickhouseModule`/`ClickhouseService` with buffered inserts, periodic flushing, and Prometheus metrics.
- Emitted ClickHouse rows from `DealService`, `RetrievalService`, and `DataRetentionService` (with corresponding test wiring updates).
- Added local kustomize overlay resources plus Make targets to deploy, reset, and shell into ClickHouse.
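The buffered-insert pattern described above can be sketched as follows. This is an illustrative stand-in, not the PR's actual `ClickhouseService`; the class and parameter names here are assumptions.

```typescript
// Hypothetical sketch of a buffered ClickHouse writer: rows are queued in
// memory and written in batches, and flush failures are logged but swallowed
// so callers are never affected.
type Row = Record<string, unknown>;

class BufferedWriter {
  private buffer: Row[] = [];

  constructor(
    // Injected sink, e.g. a wrapper around @clickhouse/client's insert().
    private readonly flushFn: (rows: Row[]) => Promise<void>,
    private readonly maxRows = 1000,
  ) {}

  // Queue a row; flush eagerly once the buffer is full.
  async insert(row: Row): Promise<void> {
    this.buffer.push(row);
    if (this.buffer.length >= this.maxRows) await this.flush();
  }

  // Drain the buffer; errors are logged, never rethrown.
  async flush(): Promise<void> {
    if (this.buffer.length === 0) return;
    const rows = this.buffer;
    this.buffer = [];
    try {
      await this.flushFn(rows);
    } catch (err) {
      console.error('clickhouse flush failed', err);
    }
  }
}
```

A periodic timer (see the PR description) would call `flush()` on an interval so partially filled buffers still drain.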
Reviewed changes
Copilot reviewed 23 out of 24 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| pnpm-lock.yaml | Locks @clickhouse/client dependency additions. |
| Makefile | Adds clickhouse-reset and clickhouse-shell dev targets. |
| kustomize/overlays/local/kustomization.yaml | Includes the new local ClickHouse overlay. |
| kustomize/overlays/local/clickhouse/service.yaml | Exposes ClickHouse ports (8123/9000) in-cluster. |
| kustomize/overlays/local/clickhouse/pvc.yaml | Persists ClickHouse data locally via PVC. |
| kustomize/overlays/local/clickhouse/kustomization.yaml | Wires ClickHouse overlay resources together. |
| kustomize/overlays/local/clickhouse/initdb-configmap.yaml | Initializes the dealbot ClickHouse database in local dev. |
| kustomize/overlays/local/clickhouse/deployment.yaml | Deploys ClickHouse server in the local overlay. |
| kustomize/overlays/local/backend-resources-local.yaml | Sets local backend resource requests/limits. |
| kustomize/overlays/local/backend-configmap-local.yaml | Enables ClickHouse in local overlay and sets probe location. |
| apps/backend/src/worker.module.ts | Imports ClickhouseModule into the worker app. |
| apps/backend/src/retrieval/retrieval.service.ts | Inserts retrieval check rows into ClickHouse. |
| apps/backend/src/retrieval/retrieval.service.spec.ts | Mocks ClickhouseService and config for updated DI. |
| apps/backend/src/deal/deal.service.ts | Inserts storage check rows into ClickHouse; adjusts pieceId assignment logic. |
| apps/backend/src/deal/deal.service.spec.ts | Mocks ClickhouseService for updated DI. |
| apps/backend/src/data-retention/data-retention.service.ts | Inserts retention challenge rows into ClickHouse. |
| apps/backend/src/data-retention/data-retention.service.spec.ts | Adds a ClickhouseService mock to constructor wiring. |
| apps/backend/src/config/app.config.ts | Adds ClickHouse env vars and probeLocation to app config. |
| apps/backend/src/clickhouse/clickhouse.service.ts | Implements ClickHouse client lifecycle, buffering, flushing, and metrics. |
| apps/backend/src/clickhouse/clickhouse.schema.ts | Defines ClickHouse DDL for the three analytical tables. |
| apps/backend/src/clickhouse/clickhouse.module.ts | Registers Prometheus metrics + service as a global module. |
| apps/backend/src/clickhouse/clickhouse.config.ts | Reads ClickHouse config from environment. |
| apps/backend/src/app.module.ts | Imports ClickhouseModule into the API app. |
| apps/backend/package.json | Adds @clickhouse/client dependency. |
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
I will look at the Copilot issues on Monday.
SgtPooki left a comment
A few changes requested, plus a lot of general comments and responses to the Copilot findings. I want to run the local cluster and see how things fare.
Ran this locally and it seems to be working fine, but I would like to see the try/catch addressed. I might get to this before you do, we'll see. I did add #443 to address some errors I saw during startup, but not blocking at all.
BigLep left a comment
A few comments from looking at the schema. I'll leave it to the engineers to approve the PR.
I think I have resolved all the straightforward issues.
There were some review comments by Copilot that were erroneous, probably due to ClickHouse specifics:
Some comments that need more discussion/response:
Force-pushed aae939b to 0534a17.
@SgtPooki I've hopefully addressed your latest comments.
SgtPooki left a comment
@iand the only thing left is to move forward with your schema updates to get you unblocked (see #438 (comment)), and also the retry_count can likely be dropped (see #438 (comment))
All changes from review comments are addressed now.
Is there anything remaining blocking this from being approved and merged, @SgtPooki?
Actually, one change we could make is to `retrieval_type LowCardinality(String), -- 'deal' | 'sample'`. The two values are:
Not I used
* chore: add Copilot PR review instructions

  Adds .github/copilot-instructions.md to guide GitHub Copilot's PR review behavior toward high-signal feedback and away from CI-duplicate noise.

  Process:
  - Reviewed Copilot's review-platform constraints (4000-char base-branch read, Comment-only review, no merge gating, no external link following) plus Google/Microsoft/OWASP/NIST review literature.
  - Analyzed 319 Copilot inline comments across the last 150 dealbot PRs to identify which areas Copilot reviews well (job-state consistency, test/fixture-contract drift, multi-network behavior, quoted SQL identifiers, redaction) versus where it overreaches (generic-SQL assumptions on ClickHouse code in PRs #438 and #485, low-priority frontend optimization comments).
  - Iterated through rounds of adversarial review (self-review against the evidence, then a second-opinion review by Codex) to tighten wording, fit the 4000-byte budget, and encode dealbot-specific invariants.

  Encoded:
  - Repository context: monorepo layout, Postgres = source of truth, ClickHouse cluster/schema/migrations owned by an external team (dealbot reviews payload correctness and operational impact, not schema/retention/ops design).
  - Core invariants: at most one job per SP per check type per network; jobs fail only on execution failure, not on negative check results; scheduling/cleanup/filtering/queue execution stay consistent across the same SP set and network.
  - Blocker/Important priorities aligned to observed high-value comment themes.
  - Do-Not-Comment list to suppress CI-duplicate noise (Biome, build, typecheck, test, Docker already enforced in CI).

  Final size: 3890/4000 bytes.

* chore: address Copilot feedback on review instructions

  - Clarify ClickHouse ownership: DDL (clickhouse.schema.ts) and event payloads are owned in-repo and in-scope for review; only cluster ops/retention/infra tuning are externally owned. Earlier wording could have suppressed legitimate schema review (Copilot caught this).
  - Add Prometheus as a source of truth alongside Postgres, and discourage adding new persisted DB state without need.
  - Align Comment Format header with the Blocker/Important priority scheme instead of a free-form `Severity:` label.
  - Drop the generic performance Important bullet; not backed by the PR-comment evidence and frees bytes for the above.

  Final size: 3950/4000 bytes.
Adds a ClickHouse integration to the backend that captures individual deal, retrieval, and data retention observations as queryable analytical records.
Three tables are written to: data_storage_checks, retrieval_checks, and data_retention_challenges. Writes are buffered and flushed in the background; failures are logged and swallowed so ClickHouse issues never affect job execution.
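The background-flush loop could look something like the following sketch. The interval length, class name, and method names here are assumptions for illustration, not the PR's exact code.

```typescript
// Illustrative periodic flusher: drains the write buffer on a fixed
// interval, and swallows errors so a ClickHouse outage never surfaces
// to job execution.
class PeriodicFlusher {
  private timer?: ReturnType<typeof setInterval>;

  constructor(
    private readonly flush: () => Promise<void>,
    private readonly intervalMs = 5_000, // assumed interval
  ) {}

  start(): void {
    this.timer = setInterval(() => {
      // Log-and-swallow: flush failures must not propagate.
      void this.flush().catch((err) => console.error('flush failed', err));
    }, this.intervalMs);
  }

  stop(): void {
    if (this.timer) clearInterval(this.timer);
  }
}
```

In a NestJS service this would typically be started in `onModuleInit` and stopped in `onModuleDestroy` so shutdown can drain the buffer.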
When CLICKHOUSE_URL is unset the integration is a no-op, so the app works unchanged without it configured. Tables are created automatically on startup. A ClickHouse instance is included in the local Kind cluster for development.
Four Prometheus metrics expose ClickHouse health: flush duration, error count, buffer depth, rows inserted.
For #426