feat(storage): multi-copy upload with store->pull->commit flow #593
Conversation
Deploying with

| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! View logs | synapse-dev | 31a254e | Commit Preview URL / Branch Preview URL | Feb 26 2026, 06:14 AM |
Docs lint is failing; this still needs a big docs addition, but that can come a little later as we get through review here. Here are some notes I built up about failure modes and handling:

### Multi-Copy Upload: Failure Handling

#### Philosophy: Partial Success Over Atomicity

When a user requests N copies and we can only achieve fewer, we commit what we have rather than throwing everything away:

#### Failure Modes by Stage

The multi-copy upload has a sequential pipeline: select → store → pull → commit.

**Stage 0: Provider Selection (before any upload)**

Provider selection uses a tiered approach with ping validation at each step:
Ping validation: Before selecting any provider, we ping their PDP endpoint. If ping fails, we try the next provider in the current tier before falling to the next tier.
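For illustration only, the tiered loop with ping validation described above can be sketched roughly like this. The `Provider` shape, `ping` callback, and tier arrays are assumed stand-ins, not the SDK's types (the real ping hits the provider's PDP endpoint over HTTP):

```typescript
// Illustrative sketch of tiered provider selection with ping validation.
type Provider = { id: number; pdpEndpoint: string }

async function selectFirstHealthy(
  tiers: Provider[][],
  ping: (p: Provider) => Promise<boolean>
): Promise<Provider | undefined> {
  for (const tier of tiers) {
    for (const candidate of tier) {
      // try the next provider in the current tier before falling to the next tier
      if (await ping(candidate)) return candidate
    }
  }
  return undefined // all tiers exhausted
}
```

Calling it with `[[...endorsed], [...approved]]` gives the endorsed-first behaviour described above.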
Key distinction:
**Stage 1: Store (upload data to primary SP)**

Store has two sub-stages:

Store failure is unambiguous from the SDK's perspective: either we have confirmed parked data, or we don't, so the user can safely retry. Note: if 1b times out, data might exist on the SP but we can't confirm it; the SP will eventually GC parked pieces that aren't committed.

**Stage 2: Pull (SP-to-SP fetch to secondaries)**

Pull failure is recoverable: data is still on the primary, and no on-chain state exists yet. Retrying pull is cheap (SP-to-SP, no client bandwidth).

**Stage 3: Commit (addPieces on-chain transaction)**
#### Behaviour Matrix
#### Error Types

```ts
/** Primary store failed - no data stored anywhere, safe to retry */
class StoreError extends Error {
  name = 'StoreError'
}

/** All commits failed - data stored on SP(s) but nothing on-chain, safe to retry */
class CommitError extends Error {
  name = 'CommitError'
}

// Partial commit failures appear in result.failures[] with role: 'primary' or 'secondary'
// Only throws CommitError when ALL providers fail to commit
```

#### What Users Must Check

Users should always inspect the result:

```ts
// If ALL commits fail, upload() throws CommitError
// If at least one succeeds, we get a result:
const result = await synapse.storage.upload(data, { count: 3 })

// Check if endorsed provider (primary) failed
const primaryFailed = result.failures.find(f => f.role === 'primary')
if (primaryFailed) {
  console.warn(`Endorsed provider ${primaryFailed.providerId} failed: ${primaryFailed.error}`)
  // Data is only on non-endorsed secondaries
}

// Check if we got all requested copies
if (result.copies.length < 3) {
  console.warn(`Only ${result.copies.length}/3 copies succeeded`)
  for (const failure of result.failures) {
    console.warn(`  Provider ${failure.providerId} (${failure.role}): ${failure.error}`)
  }
}

// Every copy in copies[] is committed on-chain
for (const copy of result.copies) {
  console.log(`Provider ${copy.providerId}, dataset ${copy.dataSetId}, piece ${copy.pieceId}`)
}
```

#### Auto-Retry Logic

When user calls
When user specifies

#### Design Decision: Primary Commit Failure Handling

Current implementation commits on all providers in parallel via

Endorsed providers are selected as primary because they're curated for reliability. If primary (endorsed) fails but secondary (non-endorsed) succeeds, the user ends up with data only on non-endorsed providers. This may not meet the product requirement of having one copy on an endorsed provider.

```ts
// Check if endorsed provider failed
const primaryFailed = result.failures.some(f => f.role === 'primary')
if (primaryFailed) {
  // Handle: retry, alert, or treat as error depending on requirements
}
```
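The auto-retry behaviour described in this PR (failed secondaries are retried with the failed provider excluded, capped at a number of attempts) can be sketched as below. The `pull` callback, shapes, and helper name are illustrative, not the SDK's internals; the 5-attempt cap comes from the commit message:

```typescript
// Illustrative secondary auto-retry loop with provider exclusion (cap of 5
// attempts per the commit message; shapes are stand-ins, not SDK types).
type PullFn = (providerId: number) => Promise<boolean>

async function pullWithExclusion(
  candidates: number[],
  pull: PullFn,
  maxAttempts = 5
): Promise<{ providerId: number | undefined; excluded: number[] }> {
  const excluded: number[] = []
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const next = candidates.find((id) => !excluded.includes(id))
    if (next === undefined) break // candidate pool exhausted
    if (await pull(next)) return { providerId: next, excluded }
    excluded.push(next) // never retry a provider that already failed
  }
  return { providerId: undefined, excluded }
}
```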
I noticed this:

What is the test for the availability of an Endorsed Provider in the case where we have more than one? If the first store fails, is there a retry? Under retry: if we have 2 Endorsed providers and the store on the primary fails, do we retry the other endorsed one?
@timfong888 I've clarified the post above with more detail:
Docs updated to pass lint; additional tests added to address some gaps.
I am not clear on this:
My understanding is that if no Endorsed SP succeeds, it's a failed operation, because if there is no Endorsed and we only have Approved, that has a lower durability guarantee.

> For primary selection (first context), exhaustion = error (can't proceed)

The above seems right. If primary selection exhausts, it's an error, not a fall-through to the next tier, right?
Question: if the endorsed provider passed ping during selection but then fails during store() (HTTP upload or parking

What happens if GC runs before the retry?
Force-pushed eb878ac to 29ac8ad
On the tier question: yes, the current code does fall back to approved-only if no endorsed provider passes the health check. A

Not right now. Couple of reasons:

Curio GCs unreferenced pieces after 24 hours, so there's a comfortable window for retries of the commit phase.
Okay. So it randomizes across the Endorsed SPs for ping if there is no existing context. As long as they are healthy and one endorsed provider stores and commits successfully, we are good. That's a fair assumption.
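Where this discussion eventually landed (per the commit message: primary selection requires an endorsed provider and throws if none is reachable, while secondary selection just yields fewer copies) could be sketched as follows. Names and shapes here are illustrative, not the SDK's:

```typescript
// Illustrative semantics: primary exhaustion is a hard error,
// secondary exhaustion degrades to fewer copies.
function pickPrimary<T>(healthyEndorsed: T[]): T {
  if (healthyEndorsed.length === 0) {
    // no silent fall-through to approved-only for the primary role
    throw new Error('no endorsed provider reachable')
  }
  return healthyEndorsed[0]
}

function pickSecondaries<T>(approved: T[], wanted: number): T[] {
  // exhaustion here is not an error: commit however many copies we achieved
  return approved.slice(0, wanted)
}
```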
…lity Borrowed a lot of this from #593, and merged with foc-devnet-info support.
Force-pushed 59e576b to 63c6170
Force-pushed 2d43c4f to 70fa757
Two design changes landed based on product discussion with @timfong888:
Updated on top of #544. Minor updates to the original post here (which is the commit message) to reflect its latest form with the newest product requirements implemented.
Force-pushed 619499d to f63e566
Force-pushed 4770bab to a3a248f
@hugomrdias (and @rjan90) I'm bailing on my 3rd PR and just putting it in here as a second commit. I discovered when doing this that I'd lost something during my rebase onto post-0.37 master (when you give providerIds and dataSetIds it should only use them and not do the cascade thing). I put that back in the latest commit and it's now more complete (🤞). But, as you might see if you look at that commit, it's the one that pulls a bunch more stuff back into synapse-core; the previous commit didn't touch core, that was all left for #544, and this new one adds a big docs modification. The docs have 3 levels:
example-storage-e2e.js works, confirmed for single and multiple files, small and large, in devnet and on calibnet 🥳.
- test: mocked JSON RPC
- Update packages/synapse-core/test/foc-devnet-info.test.ts (Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>)
- Update packages/synapse-core/src/foc-devnet-info/src/index.ts (Co-authored-by: Rod Vagg <rod@vagg.org>)
- fix: make example script work again, refactor for maximum example utility (#604). Borrowed a lot of this from #593, and merged with foc-devnet-info support.
- Update packages/synapse-core/src/foc-devnet-info/src/index.ts (Co-authored-by: Rod Vagg <rod@vagg.org>)
- fixes: PR review
- fix: remove redundant loadDevnetInfo() function
## Multi-Copy Durability in Synapse (What's New)

Store data across multiple storage providers with a single upload. The SDK handles replication server-side: data is uploaded once and providers copy it between themselves.

### What's New

#### Multi-Copy Uploads

```ts
const result = await synapse.storage.upload(data)
// result.copies: each successful copy with provider, dataset, and retrieval URL
// result.failures: any providers that failed
```

If the primary copy fails (store or commit),

#### Target Specific Providers

Control where your copies go:

```ts
// Specific providers
await synapse.storage.upload(data, { providerIds: [1n, 2n, 3n] })

// Specific existing datasets
await synapse.storage.upload(data, { dataSetIds: [10n, 20n] })

// Or let the SDK choose (default: 2 copies, endorsed primary)
await synapse.storage.upload(data, { count: 3 })
```

#### Split Operations for Batching & Greater Control

Break the upload pipeline into independent phases:

```ts
const [primary, secondary] = await synapse.storage.createContexts({
  count: 2,
  metadata: { source: "my-service" },
})

// Store multiple pieces on the primary
const stored = await Promise.all(files.map(file => primary.store(file)))
const pieceCids = stored.map(s => s.pieceCid)

// Pre-sign once for all pieces (avoids multiple wallet prompts)
const extraData = await secondary.presignForCommit(
  pieceCids.map(cid => ({ pieceCid: cid }))
)

// Secondary pulls all pieces from primary (server-to-server, no client bandwidth)
await secondary.pull({ pieces: pieceCids, from: primary, extraData })

// Commit all pieces on-chain in one transaction per provider
await primary.commit({ pieces: pieceCids.map(cid => ({ pieceCid: cid })) })
await secondary.commit({ pieces: pieceCids.map(cid => ({ pieceCid: cid })), extraData })
```

Each phase is independently retryable. If the on-chain commit fails, the data is already stored on the provider; retry

#### Upload Progress Visibility

Track what's happening across providers:

```ts
await synapse.storage.upload(data, {
  onStored: (providerId, pieceCid) => { /* data uploaded to provider */ },
  onPullProgress: (providerId, pieceCid, status) => { /* SP-to-SP transfer progress */ },
  onCopyComplete: (providerId, pieceCid) => { /* secondary copy confirmed */ },
  onCopyFailed: (providerId, pieceCid, error) => { /* secondary copy failed */ },
  onPiecesAdded: (txHash, providerId, pieces) => { /* on-chain tx submitted */ },
  onPiecesConfirmed: (dataSetId, providerId, pieces) => { /* on-chain tx confirmed */ },
})
```

#### Structured Errors

Errors now tell you exactly what failed and where:
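As a hedged sketch of handling the two error types: the `providerId` and `endpoint` fields follow the commit notes ("StoreError/CommitError carry providerId and endpoint for optional telemetry"), but the constructor shape and the `describeFailure` helper here are assumptions for illustration:

```typescript
// Illustrative error handling; constructor shape is assumed, not the SDK's.
class StoreError extends Error {
  name = 'StoreError'
  constructor(message: string, public providerId: number, public endpoint: string) {
    super(message)
  }
}
class CommitError extends Error {
  name = 'CommitError'
  constructor(message: string, public providerId: number, public endpoint: string) {
    super(message)
  }
}

function describeFailure(err: unknown): string {
  if (err instanceof StoreError) {
    // no data stored anywhere: the whole upload is safe to retry
    return `store failed on provider ${err.providerId} (${err.endpoint}): retry upload`
  }
  if (err instanceof CommitError) {
    // data parked on SP(s) but nothing on-chain: retry just the commit
    return `commit failed on provider ${err.providerId} (${err.endpoint}): retry commit`
  }
  return 'unexpected error'
}
```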
Both carry the

#### Provider Selection for Core Users

For applications that need direct control without the SDK wrapper, provider selection is now available as stateless functions in

```ts
import { fetchProviderSelectionInput, selectProviders } from "@filoz/synapse-core/warm-storage"

// Single multicall gathers providers, endorsements, and existing datasets
const input = await fetchProviderSelectionInput(client, {
  address: walletAddress,
  metadata: { source: "my-service" },
})

// Pure function, no network calls, deterministic
const [primary] = selectProviders(
  { ...input, endorsedIds: input.endorsedIds }, // endorsed only
  { count: 1 }
)
const [secondary] = selectProviders(
  { ...input, endorsedIds: new Set() }, // any approved provider
  { count: 1, excludeProviderIds: new Set([primary.provider.id]) }
)
```

#### SP-to-SP Pull for Core Users

Initiate and monitor server-side replication directly:

```ts
import { pullPieces, waitForPullStatus } from "@filoz/synapse-core/sp"

const result = await waitForPullStatus(client, {
  serviceURL: secondaryProvider.pdp.serviceURL,
  pieces: [{
    pieceCid,
    sourceUrl: `${primaryProvider.pdp.serviceURL}/pdp/piece/${pieceCid}`,
  }],
  payee: secondaryProvider.serviceProvider,
  payer: client.account.address,
  cdn: false,
  metadata: { source: "my-service" },
  onStatus: (response) => console.log(response.status),
})
```

The pull endpoint is idempotent: the same signed request can be safely retried and doubles as a status check.

#### Breaking Changes
```ts
/**
 * Source for pulling pieces from another provider
 */
export type PullSource = string | { getPieceUrl: (pieceCid: PieceCID) => string }
```
What is this getPieceUrl? What's the use case?
Why not `string | ((pieceCid: PieceCID) => string)`?
Yeah, this is bad, fixed: now just string or function. But also worth considering alternatives:

- Require `pieces: [{ cid, from }, ...]` tuples: nice and explicit (but verbose)
- Require `pieces: [from, ...]`: location only, expecting that we can extract the cid from the path (lose type checking)
- Require just a `baseUrl` and append the CID to the end of it (most simple, but least flexible)
- Require just a function that converts a CID to a URL (most flexible)
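For illustration, resolving the two forms settled on (plain string or CID-to-URL function) might look like the sketch below. The `/pdp/piece/` path matches the pull example elsewhere in this PR, but `resolvePieceUrl` itself is a hypothetical helper, not SDK API:

```typescript
// Hypothetical helper showing how the two PullSource forms could be resolved.
type PieceCID = string // stand-in for the SDK's PieceCID type
type PullSource = string | ((pieceCid: PieceCID) => string)

function resolvePieceUrl(source: PullSource, pieceCid: PieceCID): string {
  if (typeof source === 'function') return source(pieceCid) // most flexible form
  return `${source}/pdp/piece/${pieceCid}` // string form: append CID to the base URL
}
```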
@hugomrdias: I'm pulling up the couple of "can we kill

I don't want to do this now because I don't want to boil the ocean; I want to ship something, and I wouldn't mind shipping something that isn't an entirely new thing that needs new docs and new mental models. So this is aiming at iteration to get this out, not radical transformation and signing up for more work than the already hard work of describing multi-copy.

Currently, conceptually,

Also, honestly, I'm fine with
No dependency changes detected. 👍
Force-pushed 70fa757 to d416aec
Implement store->pull->commit flow for efficient multi-copy storage replication.

Split operations API on StorageContext:
- store(): upload data to SP, wait for parking confirmation
- presignForCommit(): pre-sign EIP-712 extraData for pull + commit reuse
- pull(): request SP-to-SP transfer from another provider
- commit(): add pieces on-chain with optional pre-signed extraData
- getPieceUrl(): get retrieval URL for SP-to-SP pulls

StorageManager.upload() orchestration:
- Default 2 copies (endorsed primary + any approved secondary)
- Single-provider: store->commit flow
- Multi-copy: store on primary, presign, pull to secondaries, commit all
- Auto-retry failed secondaries with provider exclusion (up to 5 attempts)

Provider selection:
- Primary requires endorsed provider (throws if none reachable)
- Secondaries use any approved provider from the pool
- 2-tier selection per role: existing dataset, then new dataset

Callback refinements:
- Remove redundant onUploadComplete (use onStored instead)
- onStored(providerId, pieceCid) - after data parked on provider
- onPieceAdded(providerId, pieceCid) - after on-chain submission
- onPieceConfirmed(providerId, pieceCid, pieceId) - after confirmation

Type clarity:
- Rename UploadOptions.metadata -> pieceMetadata (piece-level)
- Rename CommitOptions.pieces[].metadata -> pieceMetadata
- StoreError/CommitError carry providerId and endpoint for optional telemetry
- New: CopyResult, FailedCopy for multi-copy transparency

Implements #494
…docs for multi-copy

Move provider selection logic (selectProviders, fetchProviderSelectionInput, findMatchingDataSets) from SDK internals to synapse-core as public API for DIY users.

Simplify selection from 4-tier fallback to 2-tier preference (existing dataset -> new dataset) since endorsedIds already controls the eligible pool.

Clean up createContexts() to three explicit paths (dataSetIds, providerIds, smartSelect) with count validation and duplicate-provider guard.

Update storage docs to reflect multi-copy as the default upload path.
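The 2-tier preference this commit message describes (prefer a provider with an existing matching dataset, otherwise fall back to creating a new dataset) could be sketched as below; the `Candidate` shape and helper name are illustrative, not the synapse-core types:

```typescript
// Illustrative 2-tier preference: existing dataset first, then new dataset.
type Candidate = { providerId: number; existingDataSetId?: bigint }

function orderByPreference(pool: Candidate[]): Candidate[] {
  const withDataset = pool.filter((c) => c.existingDataSetId !== undefined)
  const withoutDataset = pool.filter((c) => c.existingDataSetId === undefined)
  // tier 1: reuse an existing dataset; tier 2: create a new one
  return [...withDataset, ...withoutDataset]
}
```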
Force-pushed 65905b6 to 31a254e


Sits on top of #544 which has the synapse-core side of this.
Implement store->pull->commit flow for efficient multi-copy storage replication.
Split operations API on StorageContext:
StorageManager.upload() orchestration:
Provider selection:
Callback refinements:
Type clarity:
Implements #494