chore: release to production (main) #458
Merged
Conversation
force-pushed from ebb719c to e8ec83b
force-pushed from a254dab to 92a96d1
force-pushed from 502bf5f to ef3ae42
force-pushed from ef3ae42 to e481600
SgtPooki approved these changes May 5, 2026
Author (Collaborator):
🤖 Created releases: 🌻
dennis-tra added a commit to probe-lab/dealbot that referenced this pull request May 15, 2026
commit 008f0d8
Author: Puspendra Mahariya <95584952+silent-cipher@users.noreply.github.com>
Date: Thu May 14 20:52:01 2026 +0530

feat: filter out sps with dev tags (FilOzone#526)

* feat: update Synapse stack for filecoin-pin 0.21
* feat: filter out dev providers from active pool
* feat: look for service_status
* docs: document serviceStatus=dev opt-out mechanism for SPs
* chore: remove excessive test cases
* refactor: stick to dealbot defined serviceStatus format

---------
Co-authored-by: Phi <orjan.roren@gmail.com>

commit 7aa2f8a
Author: Russell Dempsey <1173416+SgtPooki@users.noreply.github.com>
Date: Thu May 14 11:15:33 2026 -0400

fix: handle PDP-terminated datasets via data_set_creation repair (FilOzone#518)

* fix(jobs): repair PDP-terminated datasets in data_set_creation (FilOzone#379)

  PDP can mark a dataset terminated while FWSS still has pdpEndEpoch=0. synapse-sdk createContext filters only on pdpEndEpoch, so it returns dead datasets. The next add-pieces fails with "Data set has been terminated due to unrecoverable proving failure".

  data_set_creation now classifies each slot as missing | live | terminated and runs a bounded repair on terminated: terminateDataSet, poll FWSS pdpEndEpoch != 0, mark affected Deal rows cleaned_up in a single transaction, defer the replacement to the next tick.

  The deal job now skips when the resolved context's dataSetId is PDP-dead, before any data-storage metric, upload, or Deal-row write. Includes a one-shot backfill script for existing terminated datasets.

  Upstream trackers (orthogonal): FilOzone/synapse-sdk#780, FilOzone/filecoin-services#473.

* style: biome formatter fixes

* fix(deal): only treat known terminal probe error as terminated; idempotent repair

  isDataSetLive previously returned false for ANY validateDataSet failure, so a transient RPC error could classify a healthy dataset as PDP-terminated and trigger destructive repair.
  It now returns false only for the known terminal "does not exist or is not live" message and rethrows everything else.

  repairTerminatedDataSet is now idempotent on partial-prior-run state:
  - If FWSS pdpEndEpoch is already non-zero, skip terminateDataSet entirely.
  - If terminateDataSet reverts with an already-terminated message, treat it as a no-op and continue to the FWSS state poll + cleanup.
  - After terminateDataSet, await the tx receipt before polling FWSS state.

  Adds tests for the rethrow path, the already-terminated skip, and the revert-as-noop path.

* fix: address remaining copilot comments

  - waitForPdpEndEpoch: switch abortable sleep to node:timers/promises setTimeout({signal}). Removes the manual addEventListener/clearTimeout pair, which leaked listeners on resolve and had a race when the signal aborted between throwIfAborted() and addEventListener().
  - dev-tools background deal: emit a separate background_deal_skipped event/message when createDealForProvider returns null. The previous message claimed success on the skip path.

* chore: drop one-shot backfill script; rely on data_set_creation ticks

* fix: address silent-cipher review (orphan PENDING + upfront validation)

  - handleDealJob now probes the baseline dataset via getDataSetProvisioningStatus unconditionally before deal preparation; a terminated baseline or selected dsIndex fails the job (handler_result=error in Prometheus) instead of wasting upload prep.
  - triggerDeal marks the placeholder Deal row FAILED (with errorMessage) on the PDP-terminated skip path; preserves the row for HTTP polling and audit.
  - Remove dead checkDataSetExists (and its tests); getDataSetProvisioningStatus is strictly more informative (missing|live|terminated).

* fix: replace null-on-skip with typed DealJobTerminatedDataSetError

  createDealForProvider/createDeal now return Promise<Deal> and throw a typed error when the targeted data set is PDP-terminated.
  Callers map the typed error to FAILED outcomes without relying on a null return:
  - jobs.service handleDealJob: upfront baseline and dsIndex probes throw the typed error; the outer catch records handler_result=error and logs deal_job_failed_terminated_dataset. The dsIndex probe also logs dataSetIndex locally before re-throw so per-slot context is preserved.
  - dev-tools triggerDeal: the existing background catch updates the Deal row to FAILED with the thrown error message.
  - createDeal: a preUploadTerminated flag short-circuits the catch's failure metrics and the finally's saveDeal so the terminated path does not spam metrics or rows.
  - waitForPdpEndEpoch: wrap getDataSet in awaitWithAbort so in-flight polls honor the abort signal (Copilot 3229623471).

* chore: trim redundant abort check and narrating comments

  - waitForPdpEndEpoch: drop signal?.throwIfAborted() at the loop head; awaitWithAbort already performs it.
  - Trim narrating fragments from PDP-terminated guard comments; keep only the non-obvious FWSS-vs-PDP rationale and the issue link.

* chore: biome format dev-tools event-name ternary

* refactor: centralize data-set probe in DealService (FilOzone#535)

  Lift dsIndex selection + provisioning probe out of handleDealJob into DealService.resolveDataSetMetadataForDeal, invoked from createDealForProvider. handleDealJob just delegates and maps DealJobTerminatedDataSetError to handler_result="error".

  Behavior change: when minNumDataSetsForChecks > 1 and the randomly selected indexed slot is PDP-terminated, the deal job falls back to the baseline slot instead of failing (logs deal_job_dataset_index_terminated first). data_set_creation still owns repair.

  The post-createContext isDataSetLive guard inside createDeal stays as the commit-time TOCTOU check on the exact dataSetId the upload will use.
* style: biome format + consolidate indexed-slot fallback tests via it.each

* docs: drop TOCTOU phrasing in resolveDataSetMetadataForDeal jsdoc

commit d2f21ce
Author: Russell Dempsey <1173416+SgtPooki@users.noreply.github.com>
Date: Wed May 13 07:55:03 2026 -0400

feat(web): link to combined approved-SP dashboard on landing (FilOzone#525)

* feat(web): link to combined approved-SP dashboard on landing

  Adds a configurable link on the landing page pointing to the BetterStack dashboard that shows combined performance metrics for approved SPs. Lets visitors see the overall FOC storage experience without first picking an SP. Configured via APPROVED_SP_DASHBOARD_URL (runtime) / VITE_APPROVED_SP_DASHBOARD_URL (build). Closes FilOzone#384

* feat(web): network-aware approved-SP dashboard CTA

  - Split APPROVED_SP_DASHBOARD_URL into per-network vars (_MAINNET / _CALIBRATION) so a single web deployment can serve both networks correctly.
  - Render the link as a primary CTA card above the per-SP table, with copy qualified by the current network ("...on Calibration").
  - a11y: mark decorative ExternalLink icons aria-hidden.

* fix(web): name both runtime and build-time vars in invalid-URL warning

  Copilot review feedback: the warning previously named only the runtime var, but getConfigUrl falls back to the VITE_* build var too. Surface both so operators can find the right knob during local dev.
* chore(web): biome format

commit c9ad711
Author: Phi-rjan <orjan.roren@gmail.com>
Date: Wed May 13 03:50:42 2026 +0200

chore: update Synapse stack for filecoin-pin 0.21 (FilOzone#521)

* feat: update Synapse stack for filecoin-pin 0.21
* docs(checks): rename Synapse progress events for filecoin-pin 0.21

---------
Co-authored-by: Russell Dempsey <1173416+SgtPooki@users.noreply.github.com>

commit 0bb5217
Author: Russell Dempsey <1173416+SgtPooki@users.noreply.github.com>
Date: Tue May 12 13:42:30 2026 -0400

fix: stop retention counter double-counts (FilOzone#519)

* fix: stop retention counter double-counts
* refactor(data-retention): drop redundant poll guard and instance map
* docs(data-retention): align wording with poll-local baselines

commit fd6ce8a
Author: FilOz Bot <infra+github-fil-ozzy@filoz.org>
Date: Thu May 7 00:40:00 2026 -0700

chore: release to production (main) (FilOzone#514)

commit 9ef0235
Author: Puspendra Mahariya <95584952+silent-cipher@users.noreply.github.com>
Date: Thu May 7 12:57:21 2026 +0530

fix: revert back to old synapse version (FilOzone#512)

commit 8658e34
Author: FilOz Bot <infra+github-fil-ozzy@filoz.org>
Date: Tue May 5 17:44:14 2026 +0200

chore: release to production (main) (FilOzone#458)

commit c410184
Author: Russell Dempsey <1173416+SgtPooki@users.noreply.github.com>
Date: Tue May 5 09:50:50 2026 -0400

docs(checks): close resolved TBDs in data-storage, events, README (FilOzone#481)

* docs(checks): close resolved TBDs in data-storage, events, README

  Items previously marked TBD that are now implemented in code:
  - data-storage.md assertions #3, #5, #6, #7 (pieceConfirmed, IPNI discoverability, retrievability, all-checks-gated) -> Yes.
  - data-storage.md poll intervals: replace TBD_VARIABLE refs with the concrete sources (hardcoded POLLING_INTERVAL_MS = 2.5s for SP piece status, IPNI_VERIFICATION_POLLING_MS env var with 2s default for IPNI verification; doc previously claimed 5s).
  - data-storage.md section 7 header drops TBD; intro disclaimer removed.
  - data-storage.md "TBD Summary" rewritten as "Implementation History" with code references for inline retrieval, CID integrity, per-deal timeout (AbortController -> DealStatus.FAILED), gated status, status model, onPieceConfirmed, IPFS gateway retrieval, filecoin-pin CAR.
  - events-and-metrics.md: pieceConfirmed -> Yes (pieceConfirmedOnChainMs histogram); ipfsRetrievalIntegrityChecked -> implemented inline via per-block sha256 verification in ipfs-block.strategy.ts (no discrete event); ipfsRetrievalFirstByte/LastByteReceived marked Partial since duration histograms exist but no discrete event; histogram-buckets TBD replaced with link to metrics-prometheus.module.ts.
  - README.md: name the dataset-creation job (data-set-creation) and reference its config envs.

  Still TBD (not changed in this commit): uploadToSpStart, ipniVerificationStart, ipfsRetrievalStart events; jobs.md PR FilOzone#263 lookahead-skip; PDP_SUBGRAPH_ENDPOINT production value.

* docs(checks): address review feedback on callback names and event states

  - data-storage.md: rename Synapse callbacks to plural form (onPiecesAdded, onPiecesConfirmed) to match deal.service.ts.
  - events-and-metrics.md: same rename in the event list. Clarify that dealCreated maps to DealStatus.DEAL_CREATED only after all gates pass (upload alone sets UPLOADED, not DEAL_CREATED).
  - events-and-metrics.md: ipfsRetrievalIntegrityChecked downgraded from Yes to Partial since no discrete event is emitted (inline check only).
  - events-and-metrics.md: the Mermaid timeline now matches the table - ipfsRetrievalFirstByteReceived/LastByteReceived labelled as "Partial: histogram only", ipfsRetrievalIntegrityChecked labelled "Partial: inline check, no event".
  - README.md: refer to the canonical pg-boss job type data_set_creation (underscore) so operators can map the doc to jobType values.
* docs(checks): fix unreadable Mermaid rect fill in event timeline

  The 'Data Storage Only' rect used rgb(50, 50, 50), which renders as a near-black block that hides the message labels and arrows inside it (on both GitHub light and dark themes). Switch to a translucent rgba(120, 120, 200, 0.15) so the highlight is visible without obscuring content.

* docs(events): reframe Event List as timing markers, not emitted events

  The 'events' in this doc are named anchors used to define metric Timer Starts/Ends; dealbot does not necessarily emit each as a discrete Prometheus event or log line. Add an explicit note up top so readers don't expect every entry to map to an emitted event, and update rows that were marked TBD/Partial purely because no discrete event is emitted.
  - uploadToSpStart -> Yes (anchor: deal.uploadStartTime in deal.service.ts:255).
  - ipniVerificationStart -> Yes (anchor: ipniVerificationStartTime in ipni-verification.service.ts:63 - drives ipniVerifyMs).
  - ipfsRetrievalStart -> Yes (anchor: retrieval startTime in retrieval-addons.service.ts:227; logs 'retrieval_started').
  - ipfsRetrievalFirstByteReceived -> Yes (drives ipfsRetrievalFirstByteMs).
  - ipfsRetrievalLastByteReceived -> Yes (drives ipfsRetrievalLastByteMs).
  - ipfsRetrievalIntegrityChecked -> Yes (per-block sha256 in ipfs-block.strategy.ts; inline, no discrete event).
  - Mermaid timeline: drop the (TBD) / (Partial: ...) annotations on these markers so the diagram and the table agree.

* docs(events): drop Implemented column from Event List

  All rows are now Yes (each marker is anchored in code), so the column adds no signal. Anchor details folded into the Source-of-truth column. Intro note tightened.
* Update docs/checks/data-storage.md
* Update docs/checks/README.md

---------
Co-authored-by: Puspendra Mahariya <95584952+silent-cipher@users.noreply.github.com>

commit 126b2d8
Author: Russell Dempsey <1173416+SgtPooki@users.noreply.github.com>
Date: Tue May 5 08:24:47 2026 -0400

fix(deal): cancel onStored addons when upload fails (FilOzone#505)

* fix(deal): cancel onStored addons when upload fails

  The synapse-sdk StorageContext.upload fires onStored before commit/addPieces, so dealbot's IPNI monitoring runs detached. When executeUpload throws (e.g. 409 on POST /pdp/data-sets/{id}/pieces against a Curio-terminated dataset), the leaked IPNI poll runs to its 120s timeout and logs a misleading ipni_tracking_failed event after the deal has already failed.

  Wire an AbortController for the detached addons, composed with the parent signal via AbortSignal.any, and abort + drain in the catch path. Closes FilOzone#503.

* fix(deal): clear ipniStatus on aborted onStored, fix TS narrowing

  Two follow-ups on the addon-cancel path:
  1. Use a wrapper object for onStoredAddons.promise so TS preserves the union type across closure mutation in onProgress; the prior `let x: Promise<boolean> | null = null` pattern narrowed to `null` in finally and broke typecheck.
  2. Clear deal.ipniStatus on aborted onStored runs. IpniAddonStrategy.onStored sets PENDING before awaiting; if we abort before a terminal status is set, sp_performance_query.helper counts PENDING as `total_ipni_deals` and depresses ipni_success_rate. Set null on FAILED deals so aborted runs don't pollute the metric.
* fix(deal): only clear ipniStatus when still PENDING after addon abort

  The earlier fix cleared ipniStatus for any FAILED deal, which would also wipe legitimate IpniStatus.FAILED set by IpniAddonStrategy on real IPNI failures and IpniStatus.VERIFIED on retrieval-stage failures. Narrow the condition to PENDING so only mid-flight aborts are cleared.

* style: apply biome format

* fix(deal): skip onStored addon abort when success path already awaited it
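The missing | live | terminated slot classification and the typed-error mapping described in commit 7aa2f8a can be sketched roughly as follows. All names here (TerminatedDataSetError, classifySlot, resolveAction) are illustrative stand-ins modelled on the commit messages; the real dealbot uses DealJobTerminatedDataSetError and getDataSetProvisioningStatus, whose signatures may differ.

```typescript
// Hedged sketch of the slot-classification + typed-error pattern.
type SlotState = "missing" | "live" | "terminated";

class TerminatedDataSetError extends Error {
  constructor(readonly dataSetId: number) {
    super(`data set ${dataSetId} is PDP-terminated`);
    this.name = "TerminatedDataSetError";
  }
}

// Classify a probe result: does the slot exist, and if so is it still live?
function classifySlot(probe: { exists: boolean; live: boolean }): SlotState {
  if (!probe.exists) return "missing";
  return probe.live ? "live" : "terminated";
}

// Throw the typed error instead of returning null, so the outer catch can
// distinguish "terminated data set" from generic failures and map it to a
// FAILED outcome with a specific log line.
function resolveAction(dataSetId: number, state: SlotState): "create" | "upload" {
  if (state === "terminated") throw new TerminatedDataSetError(dataSetId);
  return state === "missing" ? "create" : "upload";
}
```

The typed error carries the dataSetId, so callers can log per-slot context before re-throwing, as the commit describes for the dsIndex probe.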
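The waitForPdpEndEpoch fix above replaces a hand-rolled abortable sleep with setTimeout from node:timers/promises, which accepts an { signal } option and rejects on abort. A minimal sketch of that polling-loop shape (waitForFlag is a hypothetical stand-in, not the dealbot function):

```typescript
import { setTimeout as sleep } from "node:timers/promises";

// Poll `check` every `intervalMs` until it returns true. The promise-based
// setTimeout rejects with an AbortError when `signal` fires mid-sleep, so
// there is no manual addEventListener/clearTimeout pairing to leak listeners
// on the resolve path or race against throwIfAborted().
async function waitForFlag(
  check: () => Promise<boolean>,
  intervalMs: number,
  signal?: AbortSignal,
): Promise<void> {
  for (;;) {
    if (await check()) return;
    await sleep(intervalMs, undefined, { signal });
  }
}
```

On abort, the rejection propagates out of the loop automatically, which is why the follow-up commit could also drop the redundant signal?.throwIfAborted() at the loop head.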
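Commit 126b2d8's abort-and-drain wiring for the detached onStored addons can be sketched as below: give the detached work its own AbortController, compose it with the parent signal via AbortSignal.any, and on the failure path abort and then await (drain) the detached promise so it cannot log after the main task has already failed. uploadWithAddons and its parameters are hypothetical, not the dealbot API.

```typescript
// Hedged sketch of the cancel-detached-addons pattern (assumed names).
async function uploadWithAddons<T>(
  doUpload: () => Promise<void>,
  runAddon: (signal: AbortSignal) => Promise<T>,
  parentSignal?: AbortSignal,
): Promise<T> {
  const addonCtrl = new AbortController();
  // Either the parent aborting or our explicit abort cancels the addon.
  const signal = parentSignal
    ? AbortSignal.any([parentSignal, addonCtrl.signal])
    : addonCtrl.signal;

  // The addon starts detached, alongside the upload.
  const addonRun = runAddon(signal);

  try {
    await doUpload();
    return await addonRun; // success path: await the addon result normally
  } catch (err) {
    addonCtrl.abort(); // cancel the leaked poll...
    await addonRun.catch(() => undefined); // ...and drain it before rethrowing
    throw err;
  }
}
```

Draining (not just aborting) matters: without the await, the addon's rejection would surface as an unhandled rejection after the deal has already been marked failed.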
🤖 I have created a release beep boop
backend: 1.5.0 (2026-05-05)
Features
Bug Fixes
Miscellaneous

web: 1.2.0 (2026-05-05)
Features
Bug Fixes
Miscellaneous
This PR was generated with Release Please. See documentation.