
feat(retrieval-anon): anon piece selection and retrieval #459

Closed

dennis-tra wants to merge 19 commits into FilOzone:main from probe-lab:retrieval-anon

Conversation

@dennis-tra
Contributor

@dennis-tra dennis-tra commented Apr 21, 2026

Hi folks,

Note

This PR has grown quite large in terms of changed lines (>10k), but many of them are the JSON ABIs for the contracts (~3.7k LOC) and test files. Nevertheless, there is still a lot of new logic.

This pull request adds the anonymous retrieval functionality discussed in #427. In that issue we agreed that it's sufficient to implement a random piece selection and perform a retrieval for it (as opposed to always querying the data that dealbot uploaded).

This is implemented as follows:

  • We define a new subgraph, which I've taken from feat: fwss dataset/piece index enrichment pdp-explorer#100. I've removed everything from the pdp-explorer subgraph that's not relevant to dealbot and added another listener for FWSS events. This resulted in a ~70% code reduction (rough guess). It also means that FilOz needs to deploy its own subgraphs. This is a two-command operation and very simple, though I'm not sure how much it will cost.
  • There is a new anon-retrieval concept that's separate from the basic retrieval. Initially I tried to bolt it on top of the original retrieval logic, but eventually it didn't feel right.

Regarding the anonymous piece selection logic:

The retrievalAnon check probes an SP for non-dealbot pieces so we can detect SPs that behave well even when the teacher isn't watching. To do this fairly, the piece selection should satisfy the following requirements:

  1. Uniform randomness across the SP's entire set of active pieces (not biased toward recent writes, specific payers, or specific sizes).
  2. Prefer withIPFSIndexing pieces (so CAR/IPNI validation has something to check) but still exercise non-indexed pieces so an SP can't optimise only its CAR corpus.
  3. Cover a realistic spread of piece sizes: big enough for useful bandwidth measurements, not so big that SPs with only small deals are skipped.
  4. Avoid immediately re-testing the same piece across consecutive checks.

How it works in practice:

Every Root entity in the subgraph carries a sampleKey = keccak256(setId-rootId), populated once at insert time. Because keccak256 is uniform over 256 bits and independent of creation order, size, and dataset, sampleKey sorts roots into a uniform random permutation that is stable across queries.
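
For illustration, deriving such a key in the mapping handler could look like the following AssemblyScript sketch (the exact preimage encoding is an assumption; only the keccak256 step matters for the uniformity argument):

import { BigInt, Bytes, crypto } from "@graphprotocol/graph-ts";

// Sketch: hash "setId-rootId" once when the Root entity is created. The
// string encoding here is illustrative; any stable preimage works, since
// keccak256's 256-bit output is what yields the uniform ordering.
function computeSampleKey(setId: BigInt, rootId: BigInt): Bytes {
  const preimage = Bytes.fromUTF8(setId.toString() + "-" + rootId.toString());
  return Bytes.fromByteArray(crypto.keccak256(preimage));
}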

This is necessary because you cannot just select a random element from a range query in GraphQL. If we knew the total number of pieces, we could pick a random skip value, but skip is capped at 5000, and I've read that it becomes very inefficient at higher values. It would also require non-trivial bookkeeping of active piece/dataset counts. The sampleKey approach is much simpler.

Drawing a sample looks like this:

  1. Pick a size bucket (small: < 20 MiB, medium: 20 MiB to 100 MiB, large: 100 MiB to 500 MiB) by weighted random draw, with weights 20% / 50% / 30% respectively. (Steps 1-3 are sketched in code after this list.)
  2. Pick the pool: withIPFSIndexing: true with probability 80%; otherwise no indexing filter.
  3. Generate 32 random bytes as $sampleKey and query:
query randomPiece(
  $sampleKey: Bytes!
  $sizeBucket_lo: BigInt!
  $sizeBucket_hi: BigInt!
  $sp: Bytes!
  $dealbotPayer: Bytes!
  $pool: Boolean!
) {
  roots( # <- piece
    first: 1
    orderBy: sampleKey
    orderDirection: asc
    where: {
      sampleKey_gte: $sampleKey
      removed: false
      rawSize_gte: $sizeBucket_lo
      rawSize_lte: $sizeBucket_hi
      proofSet_: { # <- dataset
        fwssServiceProvider: $sp
        fwssPayer_not: $dealbotPayer
        isActive: true
        withIPFSIndexing: $pool
      }
    }
  ) {
    # selection set shown for completeness; the backend's actual field list may differ
    id
    sampleKey
    rawSize
  }
}
  4. This returns the root with the smallest sampleKey >= $sampleKey, which is effectively a uniform random pick, in O(log N).
  5. Drop it if pdpPaymentEndEpoch has already passed the latest indexed block, or if its CID appears in the last 500 anonymous retrievals (so we don't sample the same piece twice in quick succession). On a miss, redraw once with a fresh $sampleKey.
  6. Fall back through: (same bucket, opposite pool) -> (any bucket, indexed) -> (any bucket, any) before giving up.
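
A minimal TypeScript sketch of steps 1-3 (the constants mirror the weights above; function and variable names are illustrative, not dealbot's actual code):

import { randomBytes } from "node:crypto";

const MiB = 2 ** 20;
// Size buckets and weights from step 1.
const BUCKETS = [
  { lo: 0,         hi: 20 * MiB,  weight: 0.2 }, // small
  { lo: 20 * MiB,  hi: 100 * MiB, weight: 0.5 }, // medium
  { lo: 100 * MiB, hi: 500 * MiB, weight: 0.3 }, // large
];

function drawSampleVariables(sp: string, dealbotPayer: string) {
  // Step 1: weighted random size bucket.
  let r = Math.random();
  const bucket = BUCKETS.find((b) => (r -= b.weight) <= 0) ?? BUCKETS[2];

  // Step 2: prefer the indexed pool 80% of the time; "any" means the
  // withIPFSIndexing clause is dropped from the query entirely.
  const pool = Math.random() < 0.8 ? "indexed" : "any";

  // Step 3: 32 fresh random bytes as the sort cursor.
  const sampleKey = "0x" + randomBytes(32).toString("hex");

  return {
    pool,
    variables: {
      sampleKey,
      sizeBucket_lo: bucket.lo.toString(),
      sizeBucket_hi: bucket.hi.toString(),
      sp,
      dealbotPayer,
    },
  };
}

Steps 4-6 then run the query with these variables, apply the expiry/dedupe checks, and walk the fallback chain on a miss.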

Subgraph

I have deployed the new subgraphs.

A deployment looks like this from within the subgraph folder (prerequisite is a call to goldsky login):

pnpm run codegen && pnpm run build:calibration && VERSION=0.3.0 pnpm run deploy:calibration
pnpm run codegen && pnpm run build:mainnet && VERSION=0.3.0 pnpm run deploy:mainnet

Things to be aware of

  • Timeout handling was a bit tricky because we have 1) job timeouts, 2) a connect timeout, and 3) a transfer timeout. The connect and transfer timeouts were shared between the basic and anon retrievals, but because anon retrievals may download larger files, they were too short. I've set the anon-retrieval job timeout to 5 minutes (which should really also take the job rate into account, but doesn't at the moment), and the HTTP transfer timeout to the maximum of the basic and anon job timeouts, because both code paths use the same HTTP client. (See the sketch after this list.)
  • If an http2 retrieval times out but has received partial data, it now returns the partial measurements (TTFB, retrieved bytes, etc.). I've only added this behaviour to the http2 path.
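
To make the timeout relationship concrete (the constant values and names here are made up for illustration, not dealbot's actual config keys):

// Sketch only: the shared HTTP client's transfer timeout must cover the
// slower of the two retrieval jobs, since both code paths use one client.
const BASIC_RETRIEVAL_JOB_TIMEOUT_MS = 60_000;    // assumed basic job timeout
const ANON_RETRIEVAL_JOB_TIMEOUT_MS = 5 * 60_000; // 5 minutes, as described above

const HTTP_TRANSFER_TIMEOUT_MS = Math.max(
  BASIC_RETRIEVAL_JOB_TIMEOUT_MS,
  ANON_RETRIEVAL_JOB_TIMEOUT_MS,
);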

Transparency: Claude helped, especially with the earlier commits.

@FilOzzy FilOzzy added this to FOC Apr 21, 2026
@github-project-automation github-project-automation Bot moved this to 📌 Triage in FOC Apr 21, 2026
dennis-tra and others added 8 commits April 22, 2026 18:25
Imports the goldsky subgraph mappings from FilOzone/pdp-explorer#100 as
an in-tree package. This is the subgraph dealbot will own and deploy for
itself (motivated by dealbot#427 anonymous retrieval check).

Integrated with pnpm workspace, parameterized over networks.json for
mainnet (filecoin) and calibration (filecoin-testnet), and pinned
assemblyscript@0.19.23 so matchstick-as@0.6.0 picks up its binary.

Biome and root test/build scripts intentionally skip this package — it
is AssemblyScript compiled to WASM via graph-cli, and its lifecycle is
"rebuild and redeploy to Goldsky", not per-PR.

Schema, handlers, and tests are currently the unmodified upstream
pdp-explorer content; subsequent commits will trim them to the three
queries dealbot actually uses.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Delete 12 entities that dealbot never queries: Service,
ServiceProviderLink, EventLog, Transaction, FaultRecord, ProvingWindow,
SumTreeCount, NetworkMetric, and the Weekly/MonthlyProviderActivity +
Weekly/MonthlyProofSetActivity rollups.

Trim Provider, DataSet, and Root to the fields backing the three
backend queries in apps/backend/src/pdp-subgraph/queries.ts
(GET_SUBGRAPH_META, GET_PROVIDERS_WITH_DATASETS,
GET_FWSS_CANDIDATE_PIECES).

graph codegen passes. graph build is intentionally broken by this
commit — the next commit prunes handlers that reference deleted fields.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cut ~3000 LOC of handler code that wrote to deleted entities:

- Delete helper.ts, sumTree.ts, pdp-service.ts (latter was never
  wired in subgraph.yaml in the first place).
- Delete abis/PDPService.json (unreferenced).
- Rewrite pdp-verifier.ts from 1603 → 233 LOC. Drop EventLog,
  Transaction, Service, ServiceProviderLink bookkeeping and the
  weekly/monthly metrics rollups. Delete handleProofFeePaid
  entirely (fee not queried). Keep handlePossessionProven as a
  one-line flag flipper — needed because handleNextProvingPeriod
  uses provenThisPeriod to classify skipped periods as faults.
- Trim fwss.ts to populate only the six FWSS fields still on the
  schema (fwssPayer, fwssServiceProvider, withIPFSIndexing,
  pdpPaymentEndEpoch, plus ipfsRootCID on Root).
- Add provenThisPeriod back to DataSet schema — internal signal
  for the faultedPeriods computation; not directly queried.
- Drop ProofFeePaid binding and unused entity names from manifest.
- Shrink utils/index.ts to just MaxProvingPeriod.

graph codegen and graph build pass. graph test fails on cases that
still reference deleted fields — task-04 rewrites tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Delete fault-calculation.test.ts (962 LOC of SumTree + per-piece
  fault math that's gone).
- Gut pdp-verifier.test.ts down to a single DataSet+Provider+Root
  sanity test.
- Rewrite dataset-status.test.ts as a focused EMPTY→READY→PROVING→
  EMPTY→READY→PROVING lifecycle run plus per-transition checks.
- Rewrite fwss.test.ts to drop assertions on deleted fields
  (fwssProviderId, metadataKeys/Values, withCDN, fwssPdpRailId) and
  add an end-to-end test that exercises the exact shape backing
  GET_FWSS_CANDIDATE_PIECES.
- Add .github/workflows/subgraph.yml: build both networks, restore
  mainnet manifest, run matchstick. Runs only on apps/subgraph/**
  changes.
- Update .env.example and docs/environment-variables.md to point
  PDP_SUBGRAPH_ENDPOINT at the dealbot-owned Goldsky slots.

All 22 matchstick tests pass. Goldsky deploy remains a follow-up
requiring credentials (see apps/subgraph/README.md).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dennis-tra dennis-tra force-pushed the retrieval-anon branch 3 times, most recently from 87614af to 86c300a on April 23, 2026 09:45
@dennis-tra dennis-tra marked this pull request as ready for review April 23, 2026 12:55
@dennis-tra
Contributor Author

I've just run it against calibration, and these are some early results that end up in the DB:

dealbot=# select * from anon_retrievals;
-[ RECORD 1 ]------+--------------------------------------------------------------------------------------------------------------------
id                 | 2937b0ef-60dc-4ffa-a388-b58a54dafc11
sp_address         | 0xCb9e86945cA31E6C3120725BF0385CBAD684040c
piece_cid          | bafkzcibf2kt3qbiukjjxitn2wvbkya27nkgztrdol4sm22ylafavv4nfauppnm66wetq
data_set_id        | 12918
piece_id           | 1526
raw_size           | 21883950
with_ipfs_indexing | t
ipfs_root_cid      |
service_type       | direct_sp
retrieval_endpoint | https://caliberation-pdp.infrafolio.com/piece/bafkzcibf2kt3qbiukjjxitn2wvbkya27nkgztrdol4sm22ylafavv4nfauppnm66wetq
status             | success
started_at         | 2026-04-23 12:53:36.112+00
completed_at       | 2026-04-23 12:53:54.385+00
latency_ms         | 17075
ttfb_ms            | 628
throughput_bps     | 1281637
bytes_retrieved    | 21883950
response_code      | 200
error_message      |
commp_valid        | t
car_valid          |
created_at         | 2026-04-23 12:53:54.386855+00
updated_at         | 2026-04-23 12:53:54.386855+00
-[ RECORD 2 ]------+--------------------------------------------------------------------------------------------------------------------
id                 | c7aa4bb1-6fcf-47c5-99db-23da487f1650
sp_address         | 0xbCdf1bdc1a97D071a5a8EF03F1F05225b6E2a1Ba
piece_cid          | bafkzcibfyxd66biupryffeu725wa7o3lnxtd4o5hyfy3wwzvtiudys74dcy3x7ik24fa
data_set_id        | 12916
piece_id           | 268
raw_size           | 20978747
with_ipfs_indexing | t
ipfs_root_cid      |
service_type       | direct_sp
retrieval_endpoint | https://calib2.ezpdpz.net/piece/bafkzcibfyxd66biupryffeu725wa7o3lnxtd4o5hyfy3wwzvtiudys74dcy3x7ik24fa
status             | success
started_at         | 2026-04-23 12:53:36.043+00
completed_at       | 2026-04-23 12:53:59.254+00
latency_ms         | 22298
ttfb_ms            | 307
throughput_bps     | 940835
bytes_retrieved    | 20978747
response_code      | 200
error_message      |
commp_valid        | t
car_valid          |
created_at         | 2026-04-23 12:53:59.258191+00
updated_at         | 2026-04-23 12:53:59.258191+00

The two anon-piece sampling queries differed only by the presence of
withIPFSIndexing: true in the nested proofSet filter. Replace them with
a single buildSampleAnonPieceQuery(pool) function that emits the shared
shape and drops the filter when pool === "any".
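
For illustration, such a builder could look like this sketch (not necessarily the committed code):

type Pool = "indexed" | "any";

// Emit the shared query shape; only the indexed pool constrains
// withIPFSIndexing, and pool === "any" drops the clause entirely.
function buildSampleAnonPieceQuery(pool: Pool): string {
  const indexingFilter = pool === "indexed" ? "withIPFSIndexing: true" : "";
  return `
    query randomPiece($sampleKey: Bytes!, $sizeBucket_lo: BigInt!,
                      $sizeBucket_hi: BigInt!, $sp: Bytes!, $dealbotPayer: Bytes!) {
      roots(
        first: 1
        orderBy: sampleKey
        orderDirection: asc
        where: {
          sampleKey_gte: $sampleKey
          removed: false
          rawSize_gte: $sizeBucket_lo
          rawSize_lte: $sizeBucket_hi
          proofSet_: {
            fwssServiceProvider: $sp
            fwssPayer_not: $dealbotPayer
            isActive: true
            ${indexingFilter}
          }
        }
      ) { id }
    }`;
}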
@SgtPooki
Copy link
Copy Markdown
Collaborator

@dennis-tra can we please break out the subgraph into a separate PR? 10k is.. unreviewable =/

@dennis-tra dennis-tra marked this pull request as draft April 23, 2026 14:00
@BigLep BigLep moved this from 📌 Triage to ⌨️ In Progress in FOC Apr 24, 2026
@dennis-tra dennis-tra closed this Apr 29, 2026
@github-project-automation github-project-automation Bot moved this from ⌨️ In Progress to 🎉 Done in FOC Apr 29, 2026
@dennis-tra
Contributor Author

replaced by #487
