feat: write event data to clickhouse #438

Merged
SgtPooki merged 19 commits into FilOzone:main from probe-lab:clickhouse-data-layer
Apr 27, 2026

Conversation

@iand
Contributor

@iand iand commented Apr 8, 2026

Adds a ClickHouse integration to the backend that captures individual deal, retrieval, and data retention observations as queryable analytical records.

Three tables are written to: data_storage_checks, retrieval_checks, and data_retention_challenges. Writes are buffered and flushed in the background; failures are logged and swallowed so ClickHouse issues never affect job execution.

When CLICKHOUSE_URL is unset the integration is a no-op, so the app works unchanged without it configured. Tables are created automatically on startup. A ClickHouse instance is included in the local Kind cluster for development.

Four Prometheus metrics expose ClickHouse health: flush duration, error count, buffer depth, rows inserted.
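
As a rough illustration of the behaviour described above (not the PR's actual code), a buffered writer might look like the sketch below. Every name here (`BufferedWriter`, `InsertFn`, `createWriter`, the `stats` fields) is hypothetical, the plain counters only stand in for the four Prometheus metrics, and the sketch assumes that rows from a failed flush are dropped rather than re-queued.

```typescript
// Hypothetical sketch: none of these names are taken from the PR.
type Row = Record<string, unknown>;
type InsertFn = (table: string, rows: Row[]) => Promise<void>;

interface Stats {
  flushErrors: number;  // stands in for the error-count metric
  rowsInserted: number; // stands in for the rows-inserted metric
  lastFlushMs: number;  // stands in for the flush-duration metric
}

class BufferedWriter {
  private readonly buffers = new Map<string, Row[]>();
  private readonly insertFn: InsertFn | undefined;
  private readonly log: (msg: string) => void;
  readonly stats: Stats = { flushErrors: 0, rowsInserted: 0, lastFlushMs: 0 };

  // `insertFn` is undefined when no URL is configured, which makes the writer a no-op.
  constructor(insertFn: InsertFn | undefined, log: (msg: string) => void = () => {}) {
    this.insertFn = insertFn;
    this.log = log;
  }

  /** Number of buffered rows across all tables (buffer depth). */
  get depth(): number {
    let n = 0;
    this.buffers.forEach((rows) => {
      n += rows.length;
    });
    return n;
  }

  /** Queue a row. Never throws and never awaits, so job execution is unaffected. */
  insert(table: string, row: Row): void {
    if (!this.insertFn) return;
    const rows = this.buffers.get(table) ?? [];
    rows.push(row);
    this.buffers.set(table, rows);
  }

  /** Drain every table buffer. Errors are counted and logged, never rethrown. */
  async flush(): Promise<void> {
    if (!this.insertFn) return;
    const started = Date.now();
    const pending = Array.from(this.buffers.entries());
    this.buffers.clear();
    for (const [table, rows] of pending) {
      if (rows.length === 0) continue;
      try {
        await this.insertFn(table, rows);
        this.stats.rowsInserted += rows.length;
      } catch (err) {
        // Assumption: failed rows are dropped, not put back in the buffer.
        this.stats.flushErrors += 1;
        this.log(`flush to ${table} failed: ${String(err)}`);
      }
    }
    this.stats.lastFlushMs = Date.now() - started;
  }
}

/** Returns a no-op writer when `url` is unset or empty. */
function createWriter(url: string | undefined, insertFn: InsertFn): BufferedWriter {
  return new BufferedWriter(url ? insertFn : undefined);
}
```

A caller would typically invoke `flush()` on a timer in the background and call `insert()` from job code, so a ClickHouse outage only shows up in the counters and logs.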

For #426

@FilOzzy FilOzzy added this to FOC Apr 8, 2026
@github-project-automation github-project-automation Bot moved this to 📌 Triage in FOC Apr 8, 2026
@iand iand changed the title from "Clickhouse data layer" to "feat: write event data to clickhouse" Apr 8, 2026
@iand iand force-pushed the clickhouse-data-layer branch 4 times, most recently from ae0768d to 2197aaf Compare April 9, 2026 10:41
@iand iand force-pushed the clickhouse-data-layer branch from cd6ce7b to 8777da9 Compare April 9, 2026 11:16
Comment thread apps/backend/src/clickhouse/clickhouse.config.ts Outdated
Comment thread apps/backend/src/clickhouse/clickhouse.schema.ts Outdated
Comment thread apps/backend/src/clickhouse/clickhouse.service.ts
@iand iand marked this pull request as ready for review April 9, 2026 13:33
Copilot AI review requested due to automatic review settings April 9, 2026 13:33
Contributor

Copilot AI left a comment

Pull request overview

Adds a ClickHouse integration to the backend to emit high-dimensional analytical “check” events (deal storage, retrieval, and data retention) into queryable tables, plus local Kind resources to run ClickHouse in development.

Changes:

  • Added a global NestJS ClickhouseModule/ClickhouseService with buffered inserts, periodic flushing, and Prometheus metrics.
  • Emitted ClickHouse rows from DealService, RetrievalService, and DataRetentionService (with corresponding test wiring updates).
  • Added local kustomize overlay resources + Make targets to deploy/reset/shell into ClickHouse.

Reviewed changes

Copilot reviewed 23 out of 24 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| pnpm-lock.yaml | Locks @clickhouse/client dependency additions. |
| Makefile | Adds clickhouse-reset and clickhouse-shell dev targets. |
| kustomize/overlays/local/kustomization.yaml | Includes the new local ClickHouse overlay. |
| kustomize/overlays/local/clickhouse/service.yaml | Exposes ClickHouse ports (8123/9000) in-cluster. |
| kustomize/overlays/local/clickhouse/pvc.yaml | Persists ClickHouse data locally via PVC. |
| kustomize/overlays/local/clickhouse/kustomization.yaml | Wires ClickHouse overlay resources together. |
| kustomize/overlays/local/clickhouse/initdb-configmap.yaml | Initializes the dealbot ClickHouse database in local dev. |
| kustomize/overlays/local/clickhouse/deployment.yaml | Deploys ClickHouse server in the local overlay. |
| kustomize/overlays/local/backend-resources-local.yaml | Sets local backend resource requests/limits. |
| kustomize/overlays/local/backend-configmap-local.yaml | Enables ClickHouse in local overlay and sets probe location. |
| apps/backend/src/worker.module.ts | Imports ClickhouseModule into the worker app. |
| apps/backend/src/retrieval/retrieval.service.ts | Inserts retrieval check rows into ClickHouse. |
| apps/backend/src/retrieval/retrieval.service.spec.ts | Mocks ClickhouseService and config for updated DI. |
| apps/backend/src/deal/deal.service.ts | Inserts storage check rows into ClickHouse; adjusts pieceId assignment logic. |
| apps/backend/src/deal/deal.service.spec.ts | Mocks ClickhouseService for updated DI. |
| apps/backend/src/data-retention/data-retention.service.ts | Inserts retention challenge rows into ClickHouse. |
| apps/backend/src/data-retention/data-retention.service.spec.ts | Adds a ClickhouseService mock to constructor wiring. |
| apps/backend/src/config/app.config.ts | Adds ClickHouse env vars and probeLocation to app config. |
| apps/backend/src/clickhouse/clickhouse.service.ts | Implements ClickHouse client lifecycle, buffering, flushing, and metrics. |
| apps/backend/src/clickhouse/clickhouse.schema.ts | Defines ClickHouse DDL for the three analytical tables. |
| apps/backend/src/clickhouse/clickhouse.module.ts | Registers Prometheus metrics + service as a global module. |
| apps/backend/src/clickhouse/clickhouse.config.ts | Reads ClickHouse config from environment. |
| apps/backend/src/app.module.ts | Imports ClickhouseModule into the API app. |
| apps/backend/package.json | Adds @clickhouse/client dependency. |
Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

Comment thread apps/backend/src/clickhouse/clickhouse.service.ts
Comment thread apps/backend/src/clickhouse/clickhouse.service.ts Outdated
Comment thread apps/backend/src/clickhouse/clickhouse.service.ts Outdated
Comment thread apps/backend/src/clickhouse/clickhouse.service.ts
Comment thread apps/backend/src/clickhouse/clickhouse.schema.ts
Comment thread apps/backend/src/clickhouse/clickhouse.schema.ts Outdated
Comment thread apps/backend/src/deal/deal.service.ts
Comment thread apps/backend/src/retrieval/retrieval.service.ts
Comment thread apps/backend/src/data-retention/data-retention.service.ts
@iand
Contributor Author

iand commented Apr 9, 2026

I will look at the copilot issues on Monday

@BigLep BigLep moved this from 📌 Triage to 🔎 Awaiting review in FOC Apr 9, 2026
@BigLep BigLep requested review from BigLep and SgtPooki April 9, 2026 19:39
Collaborator

@SgtPooki SgtPooki left a comment

A few changes requested, plus a lot of general comments and responses to Copilot's points. I want to run the local cluster and see how things fare.

Comment thread apps/backend/src/clickhouse/clickhouse.service.ts
Comment thread apps/backend/src/clickhouse/clickhouse.service.ts
Comment thread apps/backend/src/clickhouse/clickhouse.service.ts
Comment thread apps/backend/src/data-retention/data-retention.service.ts
Comment thread apps/backend/src/data-retention/data-retention.service.ts
Comment thread apps/backend/src/clickhouse/clickhouse.service.ts
Comment thread apps/backend/src/clickhouse/clickhouse.service.ts Outdated
Comment thread apps/backend/src/clickhouse/clickhouse.service.ts Outdated
Comment thread apps/backend/src/clickhouse/clickhouse.schema.ts
Comment thread apps/backend/src/clickhouse/clickhouse.config.ts Outdated
@github-project-automation github-project-automation Bot moved this from 🔎 Awaiting review to ⌨️ In Progress in FOC Apr 9, 2026
@SgtPooki
Collaborator

Ran this locally and it seems to be working fine, but I would like to see the try/catch addressed. I might get to this before you do, we'll see.

I did add #443 to address some errors I saw during startup, but it's not blocking at all.

Contributor

@BigLep BigLep left a comment

A few comments when looking at the schema. Will leave to engineers to give the approval on the PR.

Comment thread apps/backend/src/clickhouse/clickhouse.schema.ts
Comment thread apps/backend/src/clickhouse/clickhouse.schema.ts Outdated
Comment thread apps/backend/src/clickhouse/clickhouse.schema.ts Outdated
Comment thread apps/backend/src/clickhouse/clickhouse.schema.ts Outdated
Comment thread apps/backend/src/clickhouse/clickhouse.schema.ts Outdated
Comment thread apps/backend/src/clickhouse/clickhouse.schema.ts Outdated
@iand
Contributor Author

iand commented Apr 13, 2026

I think I have resolved all the straightforward issues.

  • Added comments to the schema.
  • Renamed ttfb_ms column to first_byte_ms
  • Added sp_id columns to record the storage provider id
  • Made stylistic changes for guarding against undefined variables
  • Moved clickhouse config into app config for consistency

There were some review comments by Copilot that were erroneous, probably due to ClickHouse specifics:

  • Datetime(3) resolution - the 3 indicates millisecond resolution which is what we want
  • ORDER BY on tables - not needed in any recent version of clickhouse

Some comments that need more discussion/response:

  1. migrate() errors in onModuleInit - I added try/catch to log error then rethrow it. Are you ok with this approach @SgtPooki
  2. add rows back to byTable - I think it's ok as-is @SgtPooki
  3. retry_count - I left this in because it could be useful, but it's marginal
  4. data_retention_challenges schema - made a suggestion for schema change @BigLep
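
For item 1, the "log then rethrow" shape can be sketched roughly as follows. This is a minimal illustration rather than the code in the PR: `initWithMigration`, the `Logger` type, and the log message are hypothetical, and `migrate` is passed in as a plain function so the sketch stays self-contained.

```typescript
// Hypothetical names; a sketch of "log then rethrow" during startup.
type Logger = { error: (msg: string) => void };

async function initWithMigration(
  migrate: () => Promise<void>,
  logger: Logger,
): Promise<void> {
  try {
    await migrate();
  } catch (err) {
    // Log with context so the startup failure is attributable to the migration...
    const detail = err instanceof Error ? err.message : String(err);
    logger.error(`ClickHouse migration failed: ${detail}`);
    // ...then rethrow the original error so the caller still sees the failure.
    throw err;
  }
}
```

The trade-off is that a broken migration still surfaces to the caller at startup, but the log line names ClickHouse as the cause instead of leaving a bare stack trace.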

@iand iand force-pushed the clickhouse-data-layer branch from aae939b to 0534a17 Compare April 20, 2026 09:46
@SgtPooki SgtPooki self-requested a review April 21, 2026 19:46
Comment thread apps/backend/src/clickhouse/clickhouse.service.ts Outdated
Comment thread apps/backend/src/clickhouse/clickhouse.service.ts
Comment thread apps/backend/src/clickhouse/clickhouse.service.ts
Comment thread apps/backend/src/clickhouse/clickhouse.service.ts Outdated
Comment thread apps/backend/src/clickhouse/clickhouse.service.ts
Comment thread apps/backend/src/data-retention/data-retention.service.ts
@iand
Contributor Author

iand commented Apr 24, 2026

@SgtPooki I've hopefully addressed your latest comments

Collaborator

@SgtPooki SgtPooki left a comment

@iand the only thing left is to move forward with your schema updates to get you unblocked (see #438 (comment)), and also the retry_count can likely be dropped (see #438 (comment))

@iand
Contributor Author

iand commented Apr 24, 2026

Thanks @SgtPooki, I removed the retry_count columns as suggested.

Final decision is needed by @BigLep on the data_retention_challenges schema

@iand
Contributor Author

iand commented Apr 24, 2026

All changes from review comments are addressed now

@iand
Contributor Author

iand commented Apr 27, 2026

Is there anything remaining blocking this from being approved and merged @SgtPooki ?

@iand
Contributor Author

iand commented Apr 27, 2026

Actually, one change we could make to the retrieval_checks table is to add a retrieval_type column

retrieval_type            LowCardinality(String),  -- 'deal' | 'sample'

The two values are:

  • deal - a retrieval triggered from a known deal id
  • sample - a retrieval triggered from sampling a random piece, in this case deal_id will be null
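
On the writer side, the two values could be derived along these lines. This is only a sketch: `RetrievalCheckRow` and `toRetrievalCheckRow` are hypothetical names, the row is reduced to the two relevant columns, and the `string` type for `deal_id` is an assumption.

```typescript
// Hypothetical row shape; only retrieval_type, deal_id and the
// 'deal' | 'sample' values come from the proposal above.
type RetrievalType = "deal" | "sample";

interface RetrievalCheckRow {
  retrieval_type: RetrievalType;
  deal_id: string | null; // assumption: deal ids are represented as strings
}

/** Derive retrieval_type from whether the retrieval came from a known deal. */
function toRetrievalCheckRow(dealId?: string): RetrievalCheckRow {
  return dealId !== undefined
    ? { retrieval_type: "deal", deal_id: dealId }
    : { retrieval_type: "sample", deal_id: null };
}
```

Deriving the column in one place keeps the invariant that a `sample` row never carries a `deal_id`.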

Note I used sample here rather than anonymous as used in #427 and #459 since I think it better conveys the concept, but we could align on anon if preferred

/cc @BigLep @dennis-tra @SgtPooki

@SgtPooki SgtPooki requested a review from BigLep April 27, 2026 12:59
@SgtPooki
Collaborator

Note I used sample here rather than anonymous as used in #427 and #459 since I think it better conveys the concept, but we could align on anon if preferred

I'm good with either sample or anon, we just need to make sure we're using the same language.

@SgtPooki SgtPooki merged commit 6edbc88 into FilOzone:main Apr 27, 2026
7 checks passed
@github-project-automation github-project-automation Bot moved this from ⌨️ In Progress to 🎉 Done in FOC Apr 27, 2026
SgtPooki added a commit that referenced this pull request Apr 30, 2026
* chore: add Copilot PR review instructions

Adds .github/copilot-instructions.md to guide GitHub Copilot's PR review
behavior toward high-signal feedback and away from CI-duplicate noise.

Process:
- Reviewed Copilot's review-platform constraints (4000-char base-branch
  read, Comment-only review, no merge gating, no external link
  following) plus Google/Microsoft/OWASP/NIST review literature.
- Analyzed 319 Copilot inline comments across the last 150 dealbot PRs
  to identify which areas Copilot reviews well (job-state consistency,
  test/fixture-contract drift, multi-network behavior, quoted SQL
  identifiers, redaction) versus where it overreaches (generic-SQL
  assumptions on ClickHouse code in PRs #438 and #485, low-priority
  frontend optimization comments).
- Iterated through rounds of adversarial review (self-review against
  the evidence, then a second-opinion review by Codex) to tighten
  wording, fit the 4000-byte budget, and encode dealbot-specific
  invariants.

Encoded:
- Repository context: monorepo layout, Postgres = source of truth,
  ClickHouse cluster/schema/migrations owned by an external team
  (dealbot reviews payload correctness and operational impact, not
  schema/retention/ops design).
- Core invariants: at most one job per SP per check type per network;
  jobs fail only on execution failure, not on negative check results;
  scheduling/cleanup/filtering/queue execution stay consistent across
  the same SP set and network.
- Blocker/Important priorities aligned to observed high-value comment
  themes.
- Do-Not-Comment list to suppress CI-duplicate noise (Biome, build,
  typecheck, test, Docker already enforced in CI).

Final size: 3890/4000 bytes.

* chore: address Copilot feedback on review instructions

- Clarify ClickHouse ownership: DDL (clickhouse.schema.ts) and event
  payloads are owned in-repo and in-scope for review; only cluster
  ops/retention/infra tuning are externally owned. Earlier wording
  could have suppressed legitimate schema review (Copilot caught this).
- Add Prometheus as a source of truth alongside Postgres, and
  discourage adding new persisted DB state without need.
- Align Comment Format header with the Blocker/Important priority
  scheme instead of a free-form `Severity:` label.
- Drop the generic performance Important bullet; not backed by the
  PR-comment evidence and frees bytes for the above.

Final size: 3950/4000 bytes.
Labels

None yet

Projects

Status: 🎉 Done

Development

Successfully merging this pull request may close these issues.

Provide hook to write high dimensionality data to a custom clickhouse instance

7 participants