
feat(retrieval-anon): anon piece selection and retrieval #459

Closed

dennis-tra wants to merge 19 commits into FilOzone:main from probe-lab:retrieval-anon

Conversation

@dennis-tra
Contributor

@dennis-tra dennis-tra commented Apr 21, 2026

Hi folks,

Note

This PR has grown quite large in terms of changed lines (>10k), but many of them are the JSON ABIs for the contracts (~3.7k LOC) and test files. Nevertheless, there is still a lot of new logic.

This pull request adds the anonymous retrieval functionality discussed in #427. In that issue we agreed that it's sufficient to implement a random piece selection and perform a retrieval for it (as opposed to always querying the data that dealbot uploaded).

This is implemented as follows:

  • We define a new subgraph, which I've taken from feat: fwss dataset/piece index enrichment pdp-explorer#100. I've removed everything from the pdp-explorer subgraph that's not relevant to dealbot and added another listener for FWSS events. This resulted in a ~70% code reduction (rough guess). It also means that FilOz needs to deploy its own subgraphs. This is a two-command operation and very simple, though I'm not sure how much it will cost.
  • There is a new anon-retrieval concept that's separate from the basic retrieval. Initially I tried to bolt it on top of the original retrieval logic, but eventually it didn't feel right.

Regarding the anonymous piece selection logic:

The retrievalAnon check probes an SP for non-dealbot pieces so we can detect SPs that behave well even when the teacher isn't watching. To do this fairly, the piece selection should satisfy the following requirements:

  1. Uniform randomness across the SP's entire set of active pieces (not biased toward recent writes, specific payers, or specific sizes).
  2. Prefer withIPFSIndexing pieces (so CAR/IPNI validation has something to check) but still exercise non-indexed pieces so an SP can't optimise only its CAR corpus.
  3. Cover a realistic spread of piece sizes: big enough for useful bandwidth measurements, not so big that SPs with only small deals are skipped.
  4. Avoid immediately re-testing the same piece across consecutive checks.

How it works in practice:

Every Root entity in the subgraph carries a sampleKey = keccak256(setId-rootId), populated once at insert time. Because keccak256 is uniform over 256 bits and independent of creation order, size, and dataset, sampleKey sorts roots into a uniform random permutation that is stable across queries.
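
For illustration, deriving such a key in the mapping handler could look like the following AssemblyScript sketch (the exact preimage encoding is an assumption; only the keccak256 step matters for the uniformity argument):

import { BigInt, Bytes, crypto } from "@graphprotocol/graph-ts";

// Sketch: hash "setId-rootId" once when the Root entity is created. The
// string encoding here is illustrative; any stable preimage works, since
// keccak256's 256-bit output is what yields the uniform ordering.
function computeSampleKey(setId: BigInt, rootId: BigInt): Bytes {
  const preimage = Bytes.fromUTF8(setId.toString() + "-" + rootId.toString());
  return Bytes.fromByteArray(crypto.keccak256(preimage));
}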

This is necessary because you cannot just select a random element from a range query in GraphQL. If we knew the total number of pieces, we could pick a random skip value, but skip is capped at 5000, and I've read that it becomes very inefficient at higher values. It would also require non-trivial bookkeeping of active piece/dataset counts. The sampleKey approach is much simpler.

Drawing a sample looks like this:

  1. Pick a size bucket (small: < 20 MiB, medium: 20 MiB to 100 MiB, large: 100 MiB to 500 MiB) by weighted random draw, with weights 20% / 50% / 30% respectively. (Steps 1-3 are sketched in code after this list.)
  2. Pick the pool: withIPFSIndexing: true with probability 80%; otherwise no indexing filter.
  3. Generate 32 random bytes as $sampleKey and query:
query randomPiece(
  $sampleKey: Bytes!
  $sizeBucket_lo: BigInt!
  $sizeBucket_hi: BigInt!
  $sp: Bytes!
  $dealbotPayer: Bytes!
  $pool: Boolean!
) {
  roots( # <- piece
    first: 1
    orderBy: sampleKey
    orderDirection: asc
    where: {
      sampleKey_gte: $sampleKey
      removed: false
      rawSize_gte: $sizeBucket_lo
      rawSize_lte: $sizeBucket_hi
      proofSet_: { # <- dataset
        fwssServiceProvider: $sp
        fwssPayer_not: $dealbotPayer
        isActive: true
        withIPFSIndexing: $pool
      }
    }
  ) {
    # selection set shown for completeness; the backend's actual field list may differ
    id
    sampleKey
    rawSize
  }
}
  4. This returns the root with the smallest sampleKey >= $sampleKey, which is effectively a uniform random pick, in O(log N).
  5. Drop it if pdpPaymentEndEpoch has already passed the latest indexed block, or if its CID appears in the last 500 anonymous retrievals (so we don't sample the same piece twice in quick succession). On a miss, redraw once with a fresh $sampleKey.
  6. Fall back through: (same bucket, opposite pool) -> (any bucket, indexed) -> (any bucket, any) before giving up.
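
A minimal TypeScript sketch of steps 1-3 (the constants mirror the weights above; function and variable names are illustrative, not dealbot's actual code):

import { randomBytes } from "node:crypto";

const MiB = 2 ** 20;
// Size buckets and weights from step 1.
const BUCKETS = [
  { lo: 0,         hi: 20 * MiB,  weight: 0.2 }, // small
  { lo: 20 * MiB,  hi: 100 * MiB, weight: 0.5 }, // medium
  { lo: 100 * MiB, hi: 500 * MiB, weight: 0.3 }, // large
];

function drawSampleVariables(sp: string, dealbotPayer: string) {
  // Step 1: weighted random size bucket.
  let r = Math.random();
  const bucket = BUCKETS.find((b) => (r -= b.weight) <= 0) ?? BUCKETS[2];

  // Step 2: prefer the indexed pool 80% of the time; "any" means the
  // withIPFSIndexing clause is dropped from the query entirely.
  const pool = Math.random() < 0.8 ? "indexed" : "any";

  // Step 3: 32 fresh random bytes as the sort cursor.
  const sampleKey = "0x" + randomBytes(32).toString("hex");

  return {
    pool,
    variables: {
      sampleKey,
      sizeBucket_lo: bucket.lo.toString(),
      sizeBucket_hi: bucket.hi.toString(),
      sp,
      dealbotPayer,
    },
  };
}

Steps 4-6 then run the query with these variables, apply the expiry/dedupe checks, and walk the fallback chain on a miss.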

Subgraph

I have deployed the new subgraphs.

A deployment looks like this from within the subgraph folder (prerequisite is a call to goldsky login):

pnpm run codegen && pnpm run build:calibration && VERSION=0.3.0 pnpm run deploy:calibration
pnpm run codegen && pnpm run build:mainnet && VERSION=0.3.0 pnpm run deploy:mainnet

Things to be aware of

  • Timeout handling was a bit tricky because we have 1) job timeouts, 2) a connect timeout, and 3) a transfer timeout. The connect and transfer timeouts were shared between the basic and anon retrievals, but because anon retrievals may download larger files, they were too short. I've set the anon-retrieval job timeout to 5 minutes (which should really also take the job rate into account, but doesn't at the moment), and the HTTP transfer timeout to the maximum of the basic and anon job timeouts, because both code paths use the same HTTP client. (See the sketch after this list.)
  • If an http2 retrieval times out but has received partial data, it now returns the partial measurements (TTFB, retrieved bytes, etc.). I've only added this behaviour to the http2 path.
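
To make the timeout relationship concrete (the constant values and names here are made up for illustration, not dealbot's actual config keys):

// Sketch only: the shared HTTP client's transfer timeout must cover the
// slower of the two retrieval jobs, since both code paths use one client.
const BASIC_RETRIEVAL_JOB_TIMEOUT_MS = 60_000;    // assumed basic job timeout
const ANON_RETRIEVAL_JOB_TIMEOUT_MS = 5 * 60_000; // 5 minutes, as described above

const HTTP_TRANSFER_TIMEOUT_MS = Math.max(
  BASIC_RETRIEVAL_JOB_TIMEOUT_MS,
  ANON_RETRIEVAL_JOB_TIMEOUT_MS,
);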

Transparency: Claude helped, especially with the earlier commits.

@FilOzzy FilOzzy added this to FOC Apr 21, 2026
@github-project-automation github-project-automation Bot moved this to 📌 Triage in FOC Apr 21, 2026
dennis-tra and others added 8 commits April 22, 2026 18:25
Imports the goldsky subgraph mappings from FilOzone/pdp-explorer#100 as
an in-tree package. This is the subgraph dealbot will own and deploy for
itself (motivated by dealbot#427 anonymous retrieval check).

Integrated with pnpm workspace, parameterized over networks.json for
mainnet (filecoin) and calibration (filecoin-testnet), and pinned
assemblyscript@0.19.23 so matchstick-as@0.6.0 picks up its binary.

Biome and root test/build scripts intentionally skip this package — it
is AssemblyScript compiled to WASM via graph-cli, and its lifecycle is
"rebuild and redeploy to Goldsky", not per-PR.

Schema, handlers, and tests are currently the unmodified upstream
pdp-explorer content; subsequent commits will trim them to the three
queries dealbot actually uses.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Delete 12 entities that dealbot never queries: Service,
ServiceProviderLink, EventLog, Transaction, FaultRecord, ProvingWindow,
SumTreeCount, NetworkMetric, and the Weekly/MonthlyProviderActivity +
Weekly/MonthlyProofSetActivity rollups.

Trim Provider, DataSet, and Root to the fields backing the three
backend queries in apps/backend/src/pdp-subgraph/queries.ts
(GET_SUBGRAPH_META, GET_PROVIDERS_WITH_DATASETS,
GET_FWSS_CANDIDATE_PIECES).

graph codegen passes. graph build is intentionally broken by this
commit — the next commit prunes handlers that reference deleted fields.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cut ~3000 LOC of handler code that wrote to deleted entities:

- Delete helper.ts, sumTree.ts, pdp-service.ts (latter was never
  wired in subgraph.yaml in the first place).
- Delete abis/PDPService.json (unreferenced).
- Rewrite pdp-verifier.ts from 1603 → 233 LOC. Drop EventLog,
  Transaction, Service, ServiceProviderLink bookkeeping and the
  weekly/monthly metrics rollups. Delete handleProofFeePaid
  entirely (fee not queried). Keep handlePossessionProven as a
  one-line flag flipper — needed because handleNextProvingPeriod
  uses provenThisPeriod to classify skipped periods as faults.
- Trim fwss.ts to populate only the six FWSS fields still on the
  schema (fwssPayer, fwssServiceProvider, withIPFSIndexing,
  pdpPaymentEndEpoch, plus ipfsRootCID on Root).
- Add provenThisPeriod back to DataSet schema — internal signal
  for the faultedPeriods computation; not directly queried.
- Drop ProofFeePaid binding and unused entity names from manifest.
- Shrink utils/index.ts to just MaxProvingPeriod.

graph codegen and graph build pass. graph test fails on cases that
still reference deleted fields — task-04 rewrites tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Delete fault-calculation.test.ts (962 LOC of SumTree + per-piece
  fault math that's gone).
- Gut pdp-verifier.test.ts down to a single DataSet+Provider+Root
  sanity test.
- Rewrite dataset-status.test.ts as a focused EMPTY→READY→PROVING→
  EMPTY→READY→PROVING lifecycle run plus per-transition checks.
- Rewrite fwss.test.ts to drop assertions on deleted fields
  (fwssProviderId, metadataKeys/Values, withCDN, fwssPdpRailId) and
  add an end-to-end test that exercises the exact shape backing
  GET_FWSS_CANDIDATE_PIECES.
- Add .github/workflows/subgraph.yml: build both networks, restore
  mainnet manifest, run matchstick. Runs only on apps/subgraph/**
  changes.
- Update .env.example and docs/environment-variables.md to point
  PDP_SUBGRAPH_ENDPOINT at the dealbot-owned Goldsky slots.

All 22 matchstick tests pass. Goldsky deploy remains a follow-up
requiring credentials (see apps/subgraph/README.md).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dennis-tra dennis-tra force-pushed the retrieval-anon branch 3 times, most recently from 87614af to 86c300a on April 23, 2026 09:45
@dennis-tra dennis-tra marked this pull request as ready for review April 23, 2026 12:55
@dennis-tra
Contributor Author

I've just run it against calibration, and these are some early results that end up in the DB:

dealbot=# select * from anon_retrievals;
-[ RECORD 1 ]------+--------------------------------------------------------------------------------------------------------------------
id                 | 2937b0ef-60dc-4ffa-a388-b58a54dafc11
sp_address         | 0xCb9e86945cA31E6C3120725BF0385CBAD684040c
piece_cid          | bafkzcibf2kt3qbiukjjxitn2wvbkya27nkgztrdol4sm22ylafavv4nfauppnm66wetq
data_set_id        | 12918
piece_id           | 1526
raw_size           | 21883950
with_ipfs_indexing | t
ipfs_root_cid      |
service_type       | direct_sp
retrieval_endpoint | https://caliberation-pdp.infrafolio.com/piece/bafkzcibf2kt3qbiukjjxitn2wvbkya27nkgztrdol4sm22ylafavv4nfauppnm66wetq
status             | success
started_at         | 2026-04-23 12:53:36.112+00
completed_at       | 2026-04-23 12:53:54.385+00
latency_ms         | 17075
ttfb_ms            | 628
throughput_bps     | 1281637
bytes_retrieved    | 21883950
response_code      | 200
error_message      |
commp_valid        | t
car_valid          |
created_at         | 2026-04-23 12:53:54.386855+00
updated_at         | 2026-04-23 12:53:54.386855+00
-[ RECORD 2 ]------+--------------------------------------------------------------------------------------------------------------------
id                 | c7aa4bb1-6fcf-47c5-99db-23da487f1650
sp_address         | 0xbCdf1bdc1a97D071a5a8EF03F1F05225b6E2a1Ba
piece_cid          | bafkzcibfyxd66biupryffeu725wa7o3lnxtd4o5hyfy3wwzvtiudys74dcy3x7ik24fa
data_set_id        | 12916
piece_id           | 268
raw_size           | 20978747
with_ipfs_indexing | t
ipfs_root_cid      |
service_type       | direct_sp
retrieval_endpoint | https://calib2.ezpdpz.net/piece/bafkzcibfyxd66biupryffeu725wa7o3lnxtd4o5hyfy3wwzvtiudys74dcy3x7ik24fa
status             | success
started_at         | 2026-04-23 12:53:36.043+00
completed_at       | 2026-04-23 12:53:59.254+00
latency_ms         | 22298
ttfb_ms            | 307
throughput_bps     | 940835
bytes_retrieved    | 20978747
response_code      | 200
error_message      |
commp_valid        | t
car_valid          |
created_at         | 2026-04-23 12:53:59.258191+00
updated_at         | 2026-04-23 12:53:59.258191+00

The two anon-piece sampling queries differed only by the presence of
withIPFSIndexing: true in the nested proofSet filter. Replace them with
a single buildSampleAnonPieceQuery(pool) function that emits the shared
shape and drops the filter when pool === "any".
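
For illustration, such a builder could look like this sketch (not necessarily the committed code):

type Pool = "indexed" | "any";

// Emit the shared query shape; only the indexed pool constrains
// withIPFSIndexing, and pool === "any" drops the clause entirely.
function buildSampleAnonPieceQuery(pool: Pool): string {
  const indexingFilter = pool === "indexed" ? "withIPFSIndexing: true" : "";
  return `
    query randomPiece($sampleKey: Bytes!, $sizeBucket_lo: BigInt!,
                      $sizeBucket_hi: BigInt!, $sp: Bytes!, $dealbotPayer: Bytes!) {
      roots(
        first: 1
        orderBy: sampleKey
        orderDirection: asc
        where: {
          sampleKey_gte: $sampleKey
          removed: false
          rawSize_gte: $sizeBucket_lo
          rawSize_lte: $sizeBucket_hi
          proofSet_: {
            fwssServiceProvider: $sp
            fwssPayer_not: $dealbotPayer
            isActive: true
            ${indexingFilter}
          }
        }
      ) { id }
    }`;
}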
@SgtPooki
Copy link
Copy Markdown
Collaborator

@dennis-tra can we please break out the subgraph into a separate PR? 10k is.. unreviewable =/

@dennis-tra dennis-tra marked this pull request as draft April 23, 2026 14:00
@BigLep BigLep moved this from 📌 Triage to ⌨️ In Progress in FOC Apr 24, 2026
@dennis-tra dennis-tra closed this Apr 29, 2026
@github-project-automation github-project-automation Bot moved this from ⌨️ In Progress to 🎉 Done in FOC Apr 29, 2026
@dennis-tra
Contributor Author

replaced by #487
