feat(retrieval-anon): anon piece selection and retrieval #459
Closed
dennis-tra wants to merge 19 commits into
Imports the Goldsky subgraph mappings from FilOzone/pdp-explorer#100 as an in-tree package. This is the subgraph dealbot will own and deploy for itself (motivated by the dealbot#427 anonymous retrieval check). It is integrated with the pnpm workspace, parameterized over networks.json for mainnet (filecoin) and calibration (filecoin-testnet), and pins assemblyscript@0.19.23 so that matchstick-as@0.6.0 picks up its binary. Biome and the root test/build scripts intentionally skip this package: it is AssemblyScript compiled to WASM via graph-cli, and its lifecycle is "rebuild and redeploy to Goldsky", not per-PR. Schema, handlers, and tests are currently the unmodified upstream pdp-explorer content; subsequent commits will trim them to the three queries dealbot actually uses.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Delete 12 entities that dealbot never queries: Service, ServiceProviderLink, EventLog, Transaction, FaultRecord, ProvingWindow, SumTreeCount, NetworkMetric, and the Weekly/MonthlyProviderActivity and Weekly/MonthlyProofSetActivity rollups. Trim Provider, DataSet, and Root to the fields backing the three backend queries in apps/backend/src/pdp-subgraph/queries.ts (GET_SUBGRAPH_META, GET_PROVIDERS_WITH_DATASETS, GET_FWSS_CANDIDATE_PIECES).
graph codegen passes. graph build is intentionally broken by this commit; the next commit prunes the handlers that reference deleted fields.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
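GET_SUBGRAPH_META is one of the three queries the trimmed schema still has to serve. A minimal sketch of what such a query could look like, assuming it reads graph-node's built-in _meta field; the query text, the SubgraphMetaData interface, and the latestIndexedBlock helper are hypothetical and may not match what apps/backend/src/pdp-subgraph/queries.ts actually contains. A block number like this is the kind of value the "latest indexed block" check in the anonymous sampling step needs.

```typescript
// Hypothetical GET_SUBGRAPH_META-style query. `_meta` is graph-node's built-in
// indexing-status field; dealbot's real query may select different fields.
const GET_SUBGRAPH_META = `
  query GetSubgraphMeta {
    _meta {
      block {
        number
      }
    }
  }
`;

// Shape of the `data` object returned for the query above.
interface SubgraphMetaData {
  _meta: { block: { number: number } } | null;
}

// Hypothetical helper: extract the latest block number the subgraph has indexed.
function latestIndexedBlock(data: SubgraphMetaData): number {
  if (data._meta === null) {
    throw new Error("subgraph response has no _meta");
  }
  return data._meta.block.number;
}
```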
Cut ~3000 LOC of handler code that wrote to deleted entities:
- Delete helper.ts, sumTree.ts, and pdp-service.ts (the latter was never wired into subgraph.yaml in the first place).
- Delete abis/PDPService.json (unreferenced).
- Rewrite pdp-verifier.ts from 1603 → 233 LOC. Drop the EventLog, Transaction, Service, and ServiceProviderLink bookkeeping and the weekly/monthly metrics rollups. Delete handleProofFeePaid entirely (the fee is not queried). Keep handlePossessionProven as a one-line flag flipper, because handleNextProvingPeriod uses provenThisPeriod to classify skipped periods as faults.
- Trim fwss.ts to populate only the six FWSS fields still on the schema (fwssPayer, fwssServiceProvider, withIPFSIndexing, pdpPaymentEndEpoch, plus ipfsRootCID on Root).
- Add provenThisPeriod back to the DataSet schema: an internal signal for the faultedPeriods computation, not directly queried.
- Drop the ProofFeePaid binding and unused entity names from the manifest.
- Shrink utils/index.ts to just MaxProvingPeriod.
graph codegen and graph build pass. graph test fails on cases that still reference deleted fields; task-04 rewrites the tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
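The provenThisPeriod flag described in this commit behaves like a tiny state machine. A minimal TypeScript sketch of that idea, not the actual AssemblyScript handlers: DataSetFaultState, onPossessionProven, and onNextProvingPeriod are hypothetical names, and the sketch ignores cases the real handler may treat differently (the very first proving period, or several periods skipped at once).

```typescript
// Hypothetical in-memory model of the two DataSet fields involved; the real
// handlers persist these on the DataSet entity inside the subgraph.
interface DataSetFaultState {
  provenThisPeriod: boolean;
  faultedPeriods: number;
}

// PossessionProven: the one-line flag flip that pdp-verifier.ts keeps.
function onPossessionProven(ds: DataSetFaultState): void {
  ds.provenThisPeriod = true;
}

// NextProvingPeriod: if the period that is closing never saw a proof, count it
// as faulted; then reset the flag for the period that is starting.
function onNextProvingPeriod(ds: DataSetFaultState): void {
  if (!ds.provenThisPeriod) {
    ds.faultedPeriods += 1;
  }
  ds.provenThisPeriod = false;
}
```

A proven period followed by NextProvingPeriod leaves faultedPeriods unchanged; a second NextProvingPeriod with no proof in between increments it by one.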
- Delete fault-calculation.test.ts (962 LOC of SumTree and per-piece fault math that no longer exists).
- Gut pdp-verifier.test.ts down to a single DataSet+Provider+Root sanity test.
- Rewrite dataset-status.test.ts as a focused EMPTY→READY→PROVING→EMPTY→READY→PROVING lifecycle run plus per-transition checks.
- Rewrite fwss.test.ts to drop assertions on deleted fields (fwssProviderId, metadataKeys/Values, withCDN, fwssPdpRailId) and add an end-to-end test that exercises the exact shape backing GET_FWSS_CANDIDATE_PIECES.
- Add .github/workflows/subgraph.yml: build both networks, restore the mainnet manifest, run matchstick. Runs only on apps/subgraph/** changes.
- Update .env.example and docs/environment-variables.md to point PDP_SUBGRAPH_ENDPOINT at the dealbot-owned Goldsky slots.
All 22 matchstick tests pass. The Goldsky deploy remains a follow-up requiring credentials (see apps/subgraph/README.md).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
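The lifecycle run in dataset-status.test.ts can be read as a walk over a small transition table. A sketch in TypeScript, assuming only the transitions visible in that sequence (EMPTY→READY, READY→PROVING, PROVING→EMPTY); the type, table, and function names are hypothetical, and the real handlers may allow more transitions than this table encodes.

```typescript
type DataSetStatus = "EMPTY" | "READY" | "PROVING";

// Only the transitions exercised by the lifecycle test; the real handlers may
// permit others, which this table deliberately does not encode.
const OBSERVED_TRANSITIONS: Record<DataSetStatus, readonly DataSetStatus[]> = {
  EMPTY: ["READY"],
  READY: ["PROVING"],
  PROVING: ["EMPTY"],
};

// True if every consecutive pair in `run` is one of the observed transitions.
function followsObservedLifecycle(run: readonly DataSetStatus[]): boolean {
  for (let i = 1; i < run.length; i++) {
    if (!OBSERVED_TRANSITIONS[run[i - 1]].includes(run[i])) {
      return false;
    }
  }
  return true;
}

// The status sequence the lifecycle test walks through.
const lifecycle: DataSetStatus[] = ["EMPTY", "READY", "PROVING", "EMPTY", "READY", "PROVING"];
```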
Force-pushed 87614af to 86c300a
Force-pushed 86c300a to ee65c7e
Force-pushed 0fc999c to 06320f7
dennis-tra (author): I've just run it against calibration, and these are some early results that ended up in the DB:
The two anon-piece sampling queries differed only by the presence of withIPFSIndexing: true in the nested proofSet filter. Replace them with a single buildSampleAnonPieceQuery(pool) function that emits the shared shape and drops the filter when pool === "any".
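This commit folds both sampling queries into buildSampleAnonPieceQuery(pool). A minimal TypeScript sketch of how such a builder could fit into the sampling flow described in the PR body, under several assumptions: the GraphQL names (roots, proofSet_, sampleKey_gte) and the Bytes! variable type may not match the real schema; sampleKeyFor, choosePool, firstAtOrAfter, and drawAnonPiece are hypothetical helpers; SHA-256 is substituted for keccak256 because Node's standard library has no keccak256; and an in-memory sorted array stands in for the subgraph store. It also assumes the pool is chosen once per draw and that an empty result counts as a miss.

```typescript
import { createHash, randomBytes } from "node:crypto";

type Pool = "indexed" | "any";

// Hypothetical flattened view of a Root plus the DataSet fields the draw needs.
interface CandidateRoot {
  cid: string;
  withIPFSIndexing: boolean;
  pdpPaymentEndEpoch: number | null; // null: no end epoch set
  sampleKey: string; // "0x" + 64 lowercase hex chars
}

// The PR derives sampleKey from keccak256 over "setId-rootId". node:crypto has
// no keccak256, so this sketch substitutes SHA-256: any uniform hash gives the
// same kind of stable pseudo-random ordering.
function sampleKeyFor(setId: number, rootId: number): string {
  return "0x" + createHash("sha256").update(`${setId}-${rootId}`).digest("hex");
}

// 80% of draws restrict to IPFS-indexed data sets; the rest sample everything.
function choosePool(r: number): Pool {
  return r < 0.8 ? "indexed" : "any";
}

// One query shape for both pools; "any" simply omits the nested filter.
function buildSampleAnonPieceQuery(pool: Pool): string {
  const filter = pool === "indexed" ? ", proofSet_: { withIPFSIndexing: true }" : "";
  return `query SampleAnonPiece($sampleKey: Bytes!) {
  roots(first: 1, orderBy: sampleKey, orderDirection: asc,
        where: { sampleKey_gte: $sampleKey${filter} }) {
    id
    sampleKey
  }
}`;
}

// In-memory stand-in for that query: binary search for the first root whose
// sampleKey is >= key (equal-length lowercase hex sorts like the number).
function firstAtOrAfter(sorted: CandidateRoot[], key: string): CandidateRoot | undefined {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sorted[mid].sampleKey < key) lo = mid + 1;
    else hi = mid;
  }
  return lo < sorted.length ? sorted[lo] : undefined;
}

// Draw, reject expired or recently retrieved pieces, redraw once on a miss.
function drawAnonPiece(
  roots: CandidateRoot[],
  latestBlock: number,
  recentCids: ReadonlySet<string>,
  rand: () => number = Math.random,
  randomKey: () => string = () => "0x" + randomBytes(32).toString("hex"),
): CandidateRoot | undefined {
  const pool = choosePool(rand());
  // Sorting here only simulates the store's ordering by sampleKey.
  const eligible = roots
    .filter((r) => pool === "any" || r.withIPFSIndexing)
    .sort((a, b) => (a.sampleKey < b.sampleKey ? -1 : a.sampleKey > b.sampleKey ? 1 : 0));
  for (let attempt = 0; attempt < 2; attempt++) { // initial draw + one redraw
    const hit = firstAtOrAfter(eligible, randomKey());
    if (hit === undefined) continue; // key above every sampleKey: treat as a miss
    const expired = hit.pdpPaymentEndEpoch !== null && hit.pdpPaymentEndEpoch < latestBlock;
    if (!expired && !recentCids.has(hit.cid)) return hit;
  }
  return undefined;
}
```

Because each root's key is fixed at insert time, the same random key always maps to the same root; the binary search is the in-memory analogue of the O(log N) indexed lookup that the sampleKey ordering enables in the subgraph.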
Collaborator: @dennis-tra can we please break out the subgraph into a separate PR? 10k is... unreviewable =/
dennis-tra (author): Replaced by #487.
Hi folks,
Note: This PR has grown quite large in terms of changed lines (>10k), but many of them are the JSON ABIs for the contracts (~3.7k LOC) and test files. Nevertheless, there is still a lot of new logic.
This pull request adds the anonymous retrieval functionality discussed in #427. In that issue we agreed that it's sufficient to implement random piece selection and perform a retrieval for the selected piece (as opposed to always querying the data that dealbot uploaded).
This is implemented as follows:
Regarding the anonymous piece selection logic:
The retrievalAnon check probes an SP for non-dealbot pieces so we can check that SPs behave well even when the teacher is not watching. To do this fairly, the piece selection should satisfy the following requirements:
- … withIPFSIndexing pieces (so CAR/IPNI validation has something to check), but still exercise non-indexed pieces so an SP can't optimise only its CAR corpus.
How it works in practice:
Every Root entity in the subgraph carries a sampleKey = keccak256(setId-rootId), populated once at insert time. Because keccak256 output is effectively uniform over 256 bits and independent of creation order, size, and dataset, sorting roots by sampleKey yields a uniform random permutation that is stable across queries.
This is necessary because you cannot just select a random element from a range query in GraphQL. If we knew the total number of pieces, we could pick a random skip value, but skip is capped at 5000, and I've read that it becomes very inefficient at higher values. It would also require non-trivial bookkeeping of active piece/dataset counts. The sampleKey approach is much simpler.
Drawing a sample looks like this:
- … withIPFSIndexing: true with probability 80%; otherwise no filter.
- … $sampleKey and query: …
- … $sampleKey, which is effectively a uniform random pick, in O(log N).
- … pdpPaymentEndEpoch has already passed the latest indexed block, or if its CID appears in the last 500 anonymous retrievals (so we don't sample the same block twice in quick succession). On a miss, we redraw once with a fresh $sampleKey.
Subgraph
I have deployed the new subgraphs:
mainnet: https://api.goldsky.com/api/public/project_cmo9sxe5xd4ai01x8cpageyid/subgraphs/dealbot-mainnet/0.3.0/gn
calibration: https://api.goldsky.com/api/public/project_cmo9sxe5xd4ai01x8cpageyid/subgraphs/dealbot-calibration/0.3.0/gn
A deployment looks like this from within the subgraph folder (prerequisite is a call to goldsky login):
Things to be aware of
Transparency: Claude helped, especially with the earlier commits.