chore: merge new changes from ipfs/kubo master#2
Open
alvin-reyes wants to merge 882 commits into
* test: add migration tests for Windows and macOS
- add dedicated CI workflow for migration tests on Windows/macOS
- workflow triggers on migration-related file changes only
* build: remove redundant go version checks
- remove GO_MIN_VERSION and check_go_version scripts
- go.mod already enforces minimum version (go 1.25)
- fixes make build on Windows
* fix: windows migration panic by reading config into memory
fixes migration panic on Windows when upgrading from v0.37 to v0.38 by reading the entire config file into memory before performing atomic operations. this avoids file locking issues on Windows where open files cannot be renamed.
also fixes:
- TestRepoDir to set USERPROFILE on Windows (not just HOME)
- CLI migration tests to sanitize directory names (remove colons)
minimal fix that solves the "panic: error can't be dealt with transactionally: Access is denied" error without adding unnecessary platform-specific complexity.
* fix: set PATH for CLI migration tests in CI
the CLI tests need the built ipfs binary to be in PATH
* fix: use ipfs shutdown for graceful daemon termination in tests
replaces platform-specific signal handling with ipfs shutdown command which works consistently across all platforms including Windows
* fix: isolate PATH modifications in parallel migration tests
tests running in parallel with t.Parallel() were interfering with each other through global PATH modifications via os.Setenv(). this caused tests to download real migration binaries instead of using mocks, leading to Windows failures due to path separator issues in external tools.
now each test builds its own custom PATH and passes it explicitly to commands, preventing interference between parallel tests.
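The PATH isolation described in the last commit above can be sketched as follows. This is a minimal illustration, not the code from the PR: `envWithPath` is a hypothetical helper, and the mock directory name is made up. The idea is that each test derives its own environment slice and hands it to the command, instead of mutating the process-wide environment with os.Setenv().

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// envWithPath returns a copy of base in which PATH is replaced so that
// binDir is searched first. The process environment is never modified,
// so parallel tests cannot interfere with each other.
// (Hypothetical helper; the name is not taken from the Kubo test suite.)
func envWithPath(base []string, binDir string) []string {
	out := make([]string, 0, len(base)+1)
	oldPath := ""
	for _, kv := range base {
		// Windows spells the variable "Path", so compare case-insensitively.
		if k, v, ok := strings.Cut(kv, "="); ok && strings.EqualFold(k, "PATH") {
			oldPath = v
			continue
		}
		out = append(out, kv)
	}
	newPath := binDir
	if oldPath != "" {
		newPath += string(os.PathListSeparator) + oldPath
	}
	return append(out, "PATH="+newPath)
}

func main() {
	env := envWithPath(os.Environ(), "/tmp/mock-migrations")
	// In a test this slice would be assigned to exec.Cmd.Env.
	fmt.Println(env[len(env)-1])
}
```

In a test, the returned slice would be assigned to the `Env` field of the `exec.Cmd` that runs the binary under test.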
* chore: improve error messages in WithBackup
* fix: Windows CI migration test failures
- add .exe extension to mock migration binaries on Windows
- handle repo lock file properly in mock migration binary
- ensure lock is created and removed to prevent conflicts
* refactor: align atomicfile error handling with fs-repo-migrations
- check close error in Abort() before attempting removal
- leave temp file on rename failure for debugging (like fs-repo-15-to-16)
- improves consistency with external migration implementations
* fix: use req.Context in repo migrate to avoid double-lock
The repo migrate command was calling cctx.Context() which has a hidden side effect: it lazily constructs the IPFS node by calling GetNode(), which opens the repository and acquires repo.lock. When migrations then tried to acquire the same lock, it failed with "lock is already held by us" because go4.org/lock tracks locks per-process in a global map.
The fix uses req.Context instead, which is a plain context.Context with no side effects. This provides what migrations need (cancellation handling) without triggering node construction or repo opening.
Context types explained:
- req.Context: Standard Go context for request lifetime, cancellation, and timeouts. No side effects.
- cctx.Context(): Kubo-specific method that lazily constructs the full IPFS node (opens repo, acquires lock, initializes subsystems). Returns the node's internal context.
Why req.Context is correct here:
- Migrations work on raw filesystem (only need ConfigRoot path)
- Command has SetDoesNotUseRepo(true) - doesn't need running node
- Migrations handle their own locking via lockfile.Lock()
- Need cancellation support but not node lifecycle
The bug only appeared with embedded migrations (v16+) because they run in-process. External migrations (pre-v16) were separate processes, so each had isolated state. Sequential migrations (forward then backward) in the same process exposed this latent double-lock issue.
Also adds repo.lock acquisition to RunEmbeddedMigrations to prevent concurrent migration access, and removes the now-unnecessary daemon lock check from the migrate command handler.
* fix: use req.Context for migrations and autoconf in daemon startup
daemon.go was incorrectly using cctx.Context() in two critical places:
1. Line 337: migrations call - cctx.Context() triggers GetNode() which opens the repo and acquires repo.lock BEFORE migrations run, causing "lock is already held by us" errors when migrations try to lock
2. Line 390: autoconf client.Start() - uses context for HTTP timeouts and background updater lifecycle, doesn't need node construction
Both now use req.Context (plain Go context) which provides:
- request lifetime and cancellation
- no side effects (doesn't construct node or open repo)
- correct lifecycle for HTTP requests and background goroutines
(cherry picked from commit f4834e7)
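The double-lock described above can be reproduced in miniature. The sketch below is an illustration under stated assumptions, not Kubo code: `lockTable` stands in for the per-process lock registry, and `cmdContext` stands in for a command context whose `Context()` method opens the repo as a side effect. All names are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// lockTable mimics a per-process registry of held lock files
// (a stand-in for the global map mentioned above; not the real library).
type lockTable struct {
	mu   sync.Mutex
	held map[string]bool
}

var errHeld = errors.New("lock is already held by us")

func (t *lockTable) Lock(path string) error {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.held[path] {
		return errHeld
	}
	t.held[path] = true
	return nil
}

func (t *lockTable) Unlock(path string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.held, path)
}

// cmdContext is a hypothetical command context whose Context() method
// lazily opens the repo, and therefore takes repo.lock, as a side effect.
type cmdContext struct {
	locks *lockTable
	repo  string
}

func (c *cmdContext) Context() (context.Context, error) {
	if err := c.locks.Lock(c.repo + "/repo.lock"); err != nil {
		return nil, err
	}
	return context.Background(), nil
}

// migrate takes the repo lock itself, as the migrations do.
func migrate(ctx context.Context, locks *lockTable, repo string) error {
	if err := ctx.Err(); err != nil {
		return err
	}
	if err := locks.Lock(repo + "/repo.lock"); err != nil {
		return err
	}
	defer locks.Unlock(repo + "/repo.lock")
	return nil
}

func main() {
	locks := &lockTable{held: map[string]bool{}}
	cctx := &cmdContext{locks: locks, repo: "/tmp/repo"}

	// Buggy path: merely obtaining the context already took the lock.
	ctx, _ := cctx.Context()
	fmt.Println("via cctx.Context():", migrate(ctx, locks, "/tmp/repo"))

	// Fixed path: a plain context carries cancellation and nothing else.
	fresh := &lockTable{held: map[string]bool{}}
	fmt.Println("via plain context: ", migrate(context.Background(), fresh, "/tmp/repo"))
}
```

The first call fails with the "already held" error and the second succeeds, which is the behavioral difference the fix relies on.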
Release v0.38.1
keep -dev version from master
chore: merge release v0.38.1
- clarify staging environment step for FINAL releases - mark infrastructure updates (collab cluster, bootstrappers) as FINAL only - improve ipfs-desktop release step wording - update discourse topic examples to v0.38.0 - reference v0.38.0 release issue in metadata comment
Increase default Provide.DHT.MaxProvideConnsPerWorker value to match the DHT replication factor (16 -> 20). A similar value is used in legacy systems (with and without accelerated DHT client).
Upgrade to latest go-dsqueue and go-ds-pebble
* feat: provide stats
* added N/A
* format
* workers stats alignment
* ipfs provide stat --all --compact
* consolidating compact stat
* update column alignment
* flags combinations errors
* command description
* change schedule AvgPrefixLen to float
* changelog
* alignments
* provide stat description draft
* rephrased provide-stats.md
* linking provide-stats.md from command description
* documentation test
* fix: refactor provide stat command type handling
- add extractSweepingProvider() helper to reduce nested type switching
- extract lowWorkerThreshold constant for worker availability check
- fix --lan error handling to work with buffered providers
* docs: add clarifying comments
* fix(commands): improve provide stat compact mode
- prevent panic when both columns are empty
- fix column alignment with UTF-8 characters
- only track col0MaxWidth for first column (as intended)
* test: add tests for ipfs provide stat command
- test basic functionality, flags, JSON output
- test legacy provider behavior
- test integration with content scheduling
- test disabled provider configurations
- add parseSweepStats helper with t.Helper()
* docs: improve provide command help text
- update tagline to "Control and monitor content providing"
- simplify help descriptions
- make error messages more consistent
- update tests to match new error messages
* metrics rename
```
Next reprovide at:
Next prefix:
```
updated to:
```
Next region prefix:
Next region reprovide:
```
* docs: improve Provide system documentation clarity
Enhance documentation for the Provide system to better explain how provider records work and the differences between sweep and legacy modes.
Changes to docs/config.md:
- Provide section: add clear explanation of provider records and their role
- Provide.DHT: add provider record lifecycle and two provider systems overview
- Provide.DHT.Interval: explain relationship to expiration, contrast sweep vs legacy behavior
- Provide.DHT.SweepEnabled: rewrite to explain legacy problem, sweep solution, and efficiency gains
- Monitoring section: prioritize command-line tools (ipfs provide stat) before Prometheus
Changes to core/commands/provide.go:
- ipfs provide stat help: add explanation of provider records, TTL expiration, and how sweep batching works
Changes to docs/changelogs/v0.39.md:
- Add context about why stats matter for monitoring provider health
- Emphasize real-time monitoring workflow with watch command
- Explain what users can observe (rates, queues, worker availability)
* depend on latest kad-dht master
* docs: nits
---------
Co-authored-by: Marcin Rataj <lidel@lidel.org>
Co-authored-by: Marcin Rataj <lidel@lidel.org>
* chore(deps): update go-libp2p to v0.44.0 - includes self-healing UPnP port mappings after router restarts - update go-netroute to v0.3.0 - update quic-go to v0.55.0 - add changelog entry for UPnP fix * docs: improve provide and UPnP clarity in changelog and docs - add alert polling rationale to changelog - add UPnP config note with default clarification - clarify sweep timing and prefix length explanations - add concrete examples for time offset and record holders - improve workers stats formatting - add See Also section to provide-stats.md * docs: add RISC-V prebuilt binaries to changelog and README - highlight linux-riscv64 availability with open hardware context - update README with arm64 builds, remove 32-bit examples
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4 to 5. - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](actions/upload-artifact@v4...v5) --- updated-dependencies: - dependency-name: actions/upload-artifact dependency-version: '5' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 5 to 6. - [Release notes](https://github.com/actions/download-artifact/releases) - [Commits](actions/download-artifact@v5...v6) --- updated-dependencies: - dependency-name: actions/download-artifact dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/setup-node](https://github.com/actions/setup-node) from 5 to 6. - [Release notes](https://github.com/actions/setup-node/releases) - [Commits](actions/setup-node@v5...v6) --- updated-dependencies: - dependency-name: actions/setup-node dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* bump kad-dht: resume reprovide cycle
* daemon: --provide-fresh-start flag
* changelog
* docs
* go-fmt
* chore: latest go-libp2p-kad-dht#1170 after conflict resolution, to confirm CI is still green
* kad-dht: depend on latest master
* move daemon flag to Provide.DHT.ResumeEnabled config
* refactor: sweep provider datastore
* bump kad-dht
* bump kad-dht
* bump kad-dht
* make datastore keys constant
* use kad-dht master
* add emoji to changelog entry
* go-fmt
* bump kad-dht
* test(provider): add tests for resume cycle feature
validates Provide.DHT.ResumeEnabled behavior:
- preserves cycle state when enabled (default)
- resets cycle when disabled
tests verify current_time_offset across restarts using JSON output
---------
Co-authored-by: Marcin Rataj <lidel@lidel.org>
addresses stream frame memory pooling issue where StreamFrame objects weren't properly returned to sync.Pool during stream cancellation see quic-go/quic-go#5327
* Upgrade to Boxo v0.35.1 * use tagged boxo release * fix lint error
Release v0.38.2
Merge release v0.38.2
* provider: protect libp2p connections Use latest kad-dht version, introducing connection protection and retention of addresses in peerstore during provide operations. * depend on kad-dht master
* fix: reprovide alert bug * number formatting * show full number for peer count
updates ipfs-webui from v4.9.1 to v4.10.0 https://github.com/ipfs/ipfs-webui/releases/tag/v4.10.0
…11039) This fix restores dynamic log level control and tail for go-libp2p loggers Updated to: https://github.com/libp2p/go-libp2p/releases/tag/v0.45.0 https://github.com/ipfs/go-log/releases/tag/v2.9.0 these changes restore dynamic log level control and tail for go-libp2p subsystems after the migration to slog, fixing the regression introduced in libp2p/go-libp2p#3364 Fixes #11035 For details why and how, see explainer in https://github.com/ipfs/go-log/releases/tag/v2.9.0
* Upgrade to Boxo v0.35.2
Provide/reprovide messages from core/node/provider.go were emitted under core:constructor (the shared core/node constructor subsystem), making GOLOG_LOG_LEVEL and `ipfs log level` hard to target for provide visibility. Scope them to "provider", matching boxo's provider package so a single lever covers both layers. - core/node/provider.go: new providerLog at the "provider" subsystem, applied to 25 keystore/reprovide/strategy/throughput call sites - test/cli/provider_test.go: reprovide dedup subtest raises provider=info instead of core:constructor=info - docs/debug-guide.md: new "Known logger subsystems" section listing provider, dht/provider, dht/provider/lan, dsqueue - docs/environment-variables.md: link to the new section from under GOLOG_LOG_LEVEL
* Upgrade to Boxo v0.39.0
* chore: bump boxo to ipfs/boxo#1140 picks up dspinner fix that snapshots the index before emitting pins, avoiding the streaming lock convoy. * docs: changelog entry for pinner stall fix * docs: clarify pinner snapshot behavior * chore: bump boxo to include ipfs/boxo#1146 Picks up the fix for "panic: pebble: closed" on shutdown (#11292): the dspinner streamIndex goroutine now recovers from any datastore panic and reports it as an error on the output channel, so the daemon exits cleanly instead of crashing when the datastore closes before pin enumeration drains. * fix(provider): quiet keystore-close on shutdown When the daemon shuts down, the keystore Close fires while the startup sync goroutine may still be in flight: the OnStart ctx is not yet cancelled, so ResetCids returning keystore.ErrClosed gets logged at Error as "sync failed". Treat keystore.ErrClosed the same as a cancelled ctx and log at Debug as "interrupted by shutdown". Apply the same rule to the periodic reprovide GC loop (whose error log got a unified message in the process). * test(cli): keystore-close log + pin ls shutdown Adds TestProviderKeystoreSyncShutdownQuiet, a CLI test that: 1. Verifies no shutdown-caused keystore-sync error (err="keystore is closed" or err="context canceled") is logged at Error level. Scans stderr line-by-line so unrelated Error logs (e.g. "reset already in progress" from the startup+periodic overlap at tight Intervals) do not false-positive the assertion. 2. Runs `ipfs pin ls --stream` against the live daemon, shuts the daemon down mid-stream, and asserts the CLI returns within 15s, does not observe a daemon panic, and produces a meaningful error message if it exited non-zero. Uses Provide.DHT.Interval=10ms so the periodic reprovide loop is always inside ResetCids when StopDaemon fires, making the shutdown race deterministic enough to catch the regression on most runs (verified empirically against the pre-fix provider.go).
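The log-level rule from the keystore-close fix above can be sketched as a small classifier. This is an illustration only: `errKeystoreClosed` is a local stand-in for keystore.ErrClosed, `errResetInProgress` models the unrelated error quoted in the test description, and `logLevel` is a hypothetical helper, not a function from provider.go.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Local stand-ins for the errors named in the commit message.
var (
	errKeystoreClosed  = errors.New("keystore is closed")
	errResetInProgress = errors.New("reset already in progress")
)

// logLevel returns the level at which a sync or GC failure should be logged.
// A cancelled context and a closed keystore are both treated as
// "interrupted by shutdown" and demoted to debug; anything else is an error.
func logLevel(ctxErr, err error) string {
	if ctxErr != nil || errors.Is(err, context.Canceled) || errors.Is(err, errKeystoreClosed) {
		return "debug"
	}
	return "error"
}

func main() {
	// errors.Is sees through wrapping, so callers may add context freely.
	wrapped := fmt.Errorf("sync failed: %w", errKeystoreClosed)
	fmt.Println(logLevel(nil, wrapped))                             // debug
	fmt.Println(logLevel(nil, errResetInProgress))                  // error
	fmt.Println(logLevel(context.Canceled, errors.New("anything"))) // debug
}
```

Using errors.Is rather than string comparison keeps the rule stable if the error is wrapped on its way up the call stack.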
Provide/reprovide messages from core/node/provider.go were emitted under core:constructor (the shared core/node constructor subsystem), making GOLOG_LOG_LEVEL and `ipfs log level` hard to target for provide visibility. Scope them to "provider", matching boxo's provider package so a single lever covers both layers. - core/node/provider.go: new providerLog at the "provider" subsystem, applied to 25 keystore/reprovide/strategy/throughput call sites - test/cli/provider_test.go: reprovide dedup subtest raises provider=info instead of core:constructor=info - docs/debug-guide.md: new "Known logger subsystems" section listing provider, dht/provider, dht/provider/lan, dsqueue - docs/environment-variables.md: link to the new section from under GOLOG_LOG_LEVEL (cherry picked from commit 6059743)
* Upgrade to Boxo v0.39.0 (cherry picked from commit d62ee27)
* chore: bump boxo to ipfs/boxo#1140 picks up dspinner fix that snapshots the index before emitting pins, avoiding the streaming lock convoy. * docs: changelog entry for pinner stall fix * docs: clarify pinner snapshot behavior * chore: bump boxo to include ipfs/boxo#1146 Picks up the fix for "panic: pebble: closed" on shutdown (#11292): the dspinner streamIndex goroutine now recovers from any datastore panic and reports it as an error on the output channel, so the daemon exits cleanly instead of crashing when the datastore closes before pin enumeration drains. * fix(provider): quiet keystore-close on shutdown When the daemon shuts down, the keystore Close fires while the startup sync goroutine may still be in flight: the OnStart ctx is not yet cancelled, so ResetCids returning keystore.ErrClosed gets logged at Error as "sync failed". Treat keystore.ErrClosed the same as a cancelled ctx and log at Debug as "interrupted by shutdown". Apply the same rule to the periodic reprovide GC loop (whose error log got a unified message in the process). * test(cli): keystore-close log + pin ls shutdown Adds TestProviderKeystoreSyncShutdownQuiet, a CLI test that: 1. Verifies no shutdown-caused keystore-sync error (err="keystore is closed" or err="context canceled") is logged at Error level. Scans stderr line-by-line so unrelated Error logs (e.g. "reset already in progress" from the startup+periodic overlap at tight Intervals) do not false-positive the assertion. 2. Runs `ipfs pin ls --stream` against the live daemon, shuts the daemon down mid-stream, and asserts the CLI returns within 15s, does not observe a daemon panic, and produces a meaningful error message if it exited non-zero. Uses Provide.DHT.Interval=10ms so the periodic reprovide loop is always inside ResetCids when StopDaemon fires, making the shutdown race deterministic enough to catch the regression on most runs (verified empirically against the pre-fix provider.go). (cherry picked from commit 8416f38)
Release v0.41.0
# Conflicts:
#	docs/changelogs/v0.41.md
#	version.go
Merge release v0.41.0
0.41.0's httpRouterAddrFunc only resolved 0.0.0.0/:: when AutoNATv2 had a confirmed reachable address. Otherwise it forwarded raw Addresses.Swarm strings to HTTP routers, so isolated or LAN-only nodes published unreachable provider records. - core/node/libp2p/routingopt.go: fallback now calls host.Addrs(), which resolves wildcard binds to concrete interface addrs and applies the libp2p AddrsFactory (NoAnnounce CIDR, Swarm.AddrFilters); matches the DHT provide path (core/node/provider.go selfAddrsFunc) - core/node/libp2p/routingopt_test.go: stubHost.Addrs is configurable; cases rewritten around resolved host addrs, with a new case pinning that NoAnnounce CIDR filtering belongs upstream in host.Addrs - test/cli/delegated_routing_v1_http_client_test.go: new end-to-end case asserts provider records sent over HTTP never contain 0.0.0.0 or :: when Addresses.Swarm uses the default wildcard bind Fixes #11213
These tests verify behavior that is independent of who serves the release JSON: TestUpdate exercises the `ipfs update` command tree, and TestUpdateWhileDaemonRuns checks that read-only subcommands still work while the daemon holds the repo lock. They hit the real GitHub Releases API only by accident, which makes them flake on rate limits, transient 5xx, or release-asset upload races. A flake panics the harness and takes every parallel test in test/cli down with it. Replace the network call with a shared `httptest.Server` helper (`newMockGitHubReleases`) and point the spawned binary at it via `TEST_KUBO_UPDATE_GITHUB_URL`, the same hook `TestUpdateInstall` already uses. The mock returns one stable release with a matching binary asset and follows the convention used by real kubo releases: `kubo_<tag>_<os>-<arch>.<ext>`, where ext is `zip` on Windows and `tar.gz` elsewhere. This must match `assetNameForPlatformTag` in `core/commands/update_github.go`, otherwise `findReleaseAsset` reports "no release found with a binary for <os>/<arch>". No network, no token, no flake. Local runtime drops from ~70s to under 1s.
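A mock of this kind can be sketched with the standard library alone. The example below is an assumption-laden illustration, not the helper from the PR: `assetName` and `newMockReleases` are hypothetical names, and the JSON shape is reduced to the two fields (`tag_name`, `assets[].name`) needed to show the naming convention.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"runtime"
)

// assetName builds kubo_<tag>_<os>-<arch>.<ext>, where ext is zip on
// Windows and tar.gz elsewhere, as described in the commit message.
func assetName(tag, goos, goarch string) string {
	ext := "tar.gz"
	if goos == "windows" {
		ext = "zip"
	}
	return fmt.Sprintf("kubo_%s_%s-%s.%s", tag, goos, goarch, ext)
}

// Simplified release JSON; only the fields this sketch needs.
type asset struct {
	Name string `json:"name"`
}

type release struct {
	TagName string  `json:"tag_name"`
	Assets  []asset `json:"assets"`
}

// newMockReleases serves one stable release whose asset matches the
// platform the test is running on.
func newMockReleases(tag string) *httptest.Server {
	rels := []release{{
		TagName: tag,
		Assets:  []asset{{Name: assetName(tag, runtime.GOOS, runtime.GOARCH)}},
	}}
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(rels)
	}))
}

func main() {
	srv := newMockReleases("v0.0.0-test")
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Print(string(body))
}
```

A test would pass `srv.URL` to the spawned binary through the `TEST_KUBO_UPDATE_GITHUB_URL` variable mentioned above.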
Bumps [actions/github-script](https://github.com/actions/github-script) from 8 to 9. - [Release notes](https://github.com/actions/github-script/releases) - [Commits](actions/github-script@v8...v9) --- updated-dependencies: - dependency-name: actions/github-script dependency-version: '9' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Andrew Gillis <11790789+gammazero@users.noreply.github.com>
* docs(config): clarify BlockKeyCacheSize and BloomFilterSize
BlockKeyCacheSize was documented as "size in bytes" but the underlying boxo blockstore wires it directly to lru.New2Q[K,V](size int) which is an entry count, not a byte budget. Fix the unit and add memory sizing guidance (~200 B/entry) plus what the cache actually short-circuits (per-block flatfs Stat on the bitswap server hot path).
BloomFilterSize section expanded with: what the filter answers (negative Has only), saturation behavior at runtime growth, startup AllKeysChan rebuild cost (one-time, scales with keyset not data volume), and a cross-link to BlockKeyCacheSize as the complementary positive-path cache. Drop the dead go-ipfs-blockstore link.
* docs(datastores): explain flatfs next-to-last/3 for large blockstores
The default next-to-last/2 shard depth (~1024 dirs) becomes a per-shard file-count problem on nodes growing past a few million blocks: bulk enumeration (GC, BloomFilterSize rebuild on startup, Provide.Strategy=all reprovider) and per-block Stat both pay readdir cost proportional to files-per-shard. next-to-last/3 (~32k dirs) keeps per-directory counts in a range modern filesystems handle well and is the recommended choice for pinning clusters, public gateways, and mirrors. Note that shard depth is fixed at ipfs init time and re-sharding requires a full export/import.
* docs(config): expand BloomFilterSize sizing with bbloom specifics
Replace the generic worked example with a power-of-two reference table covering 10M to 500M blocks, and document two kubo-specific behaviors that the generic bloom-filter math does not capture:
- ipfs/bbloom rounds the bit count up to the next power of two, so non-power-of-two BloomFilterSize values silently allocate more memory than configured (e.g. the historical 1199120-byte example actually allocates a 2 MiB internal filter).
- kubo wires bbloom with k=7 hash positions; the FPR formula is fixed at (1 - exp(-7n/m))^7.
Memory cost is roughly ~1.2 B/entry at ~1% FPR and scales linearly with target FPR. Add a saturation section showing FPR degradation at 2x / 4x / 8x the design n (~11% / ~58% / >95% respectively), and a Risks subsection clarifying that a poorly sized filter is an operational waste rather than a correctness issue (no false negatives), with a quick bytes-per-block health check.
Update the hur.st calculator URL from the n=1e6 default (dev-laptop scale) to n=10e6 (representative of real kubo deployments).
Reference sizes verified empirically against ipfs/bbloom v0.1.0: a 16 MiB filter at n=10M gave 0.1875% observed FPR vs 0.18% predicted; the historical ~1.14 MiB worked example at n=1M gave 0.0545% vs 0.054% predicted at the rounded 2 MiB allocation.
* docs(config): define FPR up front in BloomFilterSize section
The BloomFilterSize section uses "FPR" throughout without defining it. Explain in the intro that the false-positive rate is the probability of a "maybe present" answer for a CID that is not actually in the blockstore, that a false positive costs at most one wasted datastore lookup (no data loss or incorrect retrieval), and that lower FPR means more inbound Has() calls answered from RAM alone.
* docs: fold rounding penalty into bloom filter budget
byte/entry sizing figures now report the operationally-useful number after bbloom's power-of-two rounding, so an operator following the guidance lands close to the true memory footprint instead of the raw design-point size.
- config.md (BloomFilterSize): bump byte/entry to ~1.8/2.8/4.2 at ~1% / 0.1% / 0.01% FPR, state the average ~1.5x rounding penalty (worst case ~2x); drop 500M / 1 GiB row whose m/n=17.18 broke the uniform 10.74 ratio of the rest of the table; active-voice and comma fixes in saturation, risks, and startup prose
- config.md (BlockKeyCacheSize): split a comma splice; active voice in 2Q replacement description
- datastores.md (flatfs): align shard table columns; soften reshard wording to note kubo ships no in-place tool, not that flatfs forbids it
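The sizing arithmetic quoted above, power-of-two rounding followed by the fixed-k formula (1 - exp(-7n/m))^7, can be checked with a few lines of Go. This is a sketch of the math only; `roundedBits` and `fpr` are hypothetical helper names and do not call ipfs/bbloom.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// hashPositions is k=7, the value the commit message says kubo uses.
const hashPositions = 7

// roundedBits converts a configured size in bytes to bits and rounds the
// bit count up to the next power of two, mirroring the rounding behavior
// described above.
func roundedBits(sizeBytes uint64) uint64 {
	m := sizeBytes * 8
	if m <= 1 {
		return 1
	}
	return uint64(1) << bits.Len64(m-1)
}

// fpr evaluates (1 - exp(-k*n/m))^k with k=7, where m is the rounded bit
// count and n is the number of blocks in the filter.
func fpr(sizeBytes, n uint64) float64 {
	m := float64(roundedBits(sizeBytes))
	k := float64(hashPositions)
	return math.Pow(1-math.Exp(-k*float64(n)/m), k)
}

func main() {
	// The historical 1199120-byte setting is rounded up to 2 MiB.
	fmt.Println(roundedBits(1199120)/8, "bytes allocated") // 2097152 bytes allocated

	// The two reference points from the commit message.
	fmt.Printf("16 MiB, n=10M: %.4f%%\n", 100*fpr(16<<20, 10_000_000))
	fmt.Printf("1199120 B, n=1M: %.4f%%\n", 100*fpr(1199120, 1_000_000))
}
```

Both percentages land close to the predicted figures quoted in the commit message (about 0.18% and 0.054%).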
A zero-value Multiaddr (since go-multiaddr v0.15 is a slice type) encodes to zero bytes on the wire. AddrsFactory was passing such empty entries through to the host's signed peer record, where peers that skip the empty-input check render them as "/" and reject the address. js-libp2p autonatv2 first flagged this against a kubo/0.39.0/2896aed/docker agent. AddrsFactory is the central chokepoint for kubo's announced addresses, so filtering here scrubs every downstream consumer until the upstream go-libp2p fix lands. See libp2p/js-libp2p#3478 (comment)
* docs(server-profile): warn about local reverse proxy gotcha
`Swarm.AddrFilters` is consulted on inbound `InterceptAccept` as well as outbound dials, so loopback CIDRs in the filter list cause Kubo to reject every incoming connection from a local nginx or Caddy reverse proxy that fronts a `/ws` (or other libp2p) listener on `127.0.0.1`. The condition is silent: the OS accepts the TCP, then Kubo closes the socket before the libp2p handshake.
Add an explicit note to the `Swarm.AddrFilters` section, a new row in the `server` profile override table for the reverse-proxy case, and a matching CAUTION block in the v0.41 changelog. Each pointer says: remove the loopback CIDRs from `Swarm.AddrFilters` only, and keep them in `Addresses.NoAnnounce`.
* feat(libp2p): log ERROR for listeners blocked by AddrFilters or NoAnnounce
Surface misconfigured listeners at startup and on every libp2p `EvtLocalAddressesUpdated` event, instead of silently dropping incoming connections or staying unadvertised.
`findDeadListeners` is a pure function that walks the host's resolved listen addresses (the output of `host.Network().InterfaceListenAddresses()`, matching the post-resolution view used in #11297 for `host.Addrs()`) and matches each IP component against every CIDR rule in `Swarm.AddrFilters` and `Addresses.NoAnnounce`. Working from resolved addresses means wildcard listens like `/ip4/0.0.0.0` and `/ip6/::` are already expanded to concrete interface addresses, so the check does not flag a listener just because the unspecified address itself happens to fall inside a filter CIDR (for example `::` is in `::/3` even though the listener still accepts inbound from globally-routable peers).
`MonitorDeadListeners` wires the check into fx: it runs once at startup, subscribes to `event.EvtLocalAddressesUpdated`, and re-runs the check whenever the host's address set changes (NAT mapping comes online, new interface, AutoTLS cert ready). Findings are deduplicated against the previous run so a stable misconfiguration is logged once until it is resolved or a new finding shows up.
Loopback `Addresses.NoAnnounce` matches are skipped on the grounds that suppressing loopback advertisement is operator-intent on every `server`-profile node, not a misconfiguration. Loopback in `Swarm.AddrFilters` is the bug pattern that motivated this check; that match is always reported.
Each ERROR line names the offending listener, the matching CIDR rule, and the field to remove the rule from to revive the listener:
Addresses.Swarm listener "/ip4/127.0.0.1/tcp/8081/ws" matches Swarm.AddrFilters rule "/ip4/127.0.0.0/ipcidr/8", so Kubo rejects every incoming connection to it. Remove "/ip4/127.0.0.0/ipcidr/8" from Swarm.AddrFilters to allow connections to this listener.
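The matching rule described above can be sketched on plain IP addresses and CIDR strings. This is a simplified illustration, not `findDeadListeners` itself: the real function works on multiaddrs, while the hypothetical `findBlocked` below uses the standard library's net/netip and takes already-resolved listener IPs.

```go
package main

import (
	"fmt"
	"net/netip"
)

// finding names the listener, the matching rule, and the config field
// the rule came from. (Hypothetical type for this sketch.)
type finding struct {
	Listener string
	Rule     string
	Field    string
}

func parsePrefixes(rules []string) ([]netip.Prefix, error) {
	out := make([]netip.Prefix, 0, len(rules))
	for _, r := range rules {
		p, err := netip.ParsePrefix(r)
		if err != nil {
			return nil, err
		}
		out = append(out, p)
	}
	return out, nil
}

// findBlocked matches each resolved listener IP against every CIDR rule.
// AddrFilters matches are always reported; NoAnnounce matches are skipped
// for loopback listeners, which the text above treats as operator intent.
func findBlocked(listeners, addrFilters, noAnnounce []string) ([]finding, error) {
	filters, err := parsePrefixes(addrFilters)
	if err != nil {
		return nil, err
	}
	hidden, err := parsePrefixes(noAnnounce)
	if err != nil {
		return nil, err
	}
	var out []finding
	for _, l := range listeners {
		ip, err := netip.ParseAddr(l)
		if err != nil {
			return nil, err
		}
		for _, p := range filters {
			if p.Contains(ip) {
				out = append(out, finding{Listener: l, Rule: p.String(), Field: "Swarm.AddrFilters"})
			}
		}
		if ip.IsLoopback() {
			continue
		}
		for _, p := range hidden {
			if p.Contains(ip) {
				out = append(out, finding{Listener: l, Rule: p.String(), Field: "Addresses.NoAnnounce"})
			}
		}
	}
	return out, nil
}

func main() {
	got, err := findBlocked(
		[]string{"127.0.0.1", "192.0.2.10"},
		[]string{"127.0.0.0/8"},
		[]string{"127.0.0.0/8", "192.0.2.0/24"},
	)
	if err != nil {
		fmt.Println("bad input:", err)
		return
	}
	for _, f := range got {
		fmt.Printf("listener %s matches %s rule %s\n", f.Listener, f.Field, f.Rule)
	}
}
```

Working on resolved addresses matters here: passing the unspecified address `::` would match a `::/3` rule even though the listener still accepts traffic from concrete interfaces.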
* chore(deps): align deps with ipfs/boxo#1152
Bumps boxo to the head of ipfs/boxo#1152 and lands the kubo-only direct deps from this week's dependabot batch in one go so go.mod stays consistent.
Direct:
- boxo: v0.39.0 to ipfs/boxo#1152 head
- libp2p-pubsub: 0.15.0 to 0.16.0
- fsnotify: 1.9.0 to 1.10.0
- go-fuse/v2: 2.9.1-pre to 2.10.1
- otelhttp: 0.67.0 to 0.68.0
- otel, otel/sdk, otel/sdk/metric, otel/trace: 1.42.0 to 1.43.0
- otel/exporters/prometheus: 0.56.0 to 0.65.0
- contrib/propagators/autoprop: 0.46.1 to 0.68.0
Pulled in via boxo: zap 1.28.0, go-unixfsnode 1.10.4.
Skipped: cheggaaa/pb v1 to v2 (incompatible API; v2 drops pb.U_BYTES and pb.New64(...).SetUnits, breaking the progress bar usage in core/commands/{cat,add,get,dag/export}.go).
Supersedes #11306, #11307, #11308, #11309, #11311, #11312.
* fix(metrics): drop otel_scope_info, expose scope as labels
The otel prometheus exporter v0.59.0 stopped emitting the standalone otel_scope_info metric. Scope identity is now carried by otel_scope_name, otel_scope_version, and otel_scope_schema_url labels on every metric, added in v0.58.0. The bump to v0.65.0 in this branch crosses that boundary, so the t0119 baseline failed.
Update the sharness baseline, docs/metrics.md, and add a v0.42 changelog highlight so operators scraping otel_scope_info know to switch their dashboards to the per-metric labels.
* chore(deps): bump boxo to main (incl. ipfs/boxo#1152)
boxo@main now includes ipfs/boxo#1152, replacing the temporary PR-pinned revision used in 681a4b9.
denylists only block content retrieval and local IPNS resolution. they do not stop a DHT server from storing or serving provider and IPNS records for denied keys on behalf of other peers, and they do not gate /routing/v1/ responses. document this explicitly and point operators at Routing.Type=autoclient as the way to opt out of acting as a routing intermediary for blocked content. Closes #11317 Closes #11318 Closes #11319 these issues track the implementation work to push denylists into the kad-dht provider store, the IPNS validator and pubsub path, and the /routing/v1/ HTTP layer. until that lands, autoclient is the only operator-facing knob with the same effect, so the docs need to say so.
This upgrades the pebble database to v2.1.5.
* update go-log to v2.9.2
* feat(pinner): close pinner before repo on shutdown The pinner's streaming goroutines hold a reference to the backing datastore, and pebble panics on use after Close. Before this change the panic was recovered inside the pinner (see ipfs/boxo#1146) and the symptom was only a transient log trace on daemon exit, but the race remained. Register a new fx OnStop hook that calls pinner.Close before the repo (and therefore the datastore) closes. Close drains all in-flight stream goroutines, so the datastore is closed only after the pinner is fully quiesced. Bumps boxo to pick up Pinner.Close from ipfs/boxo#1150. Fixes #11292 * chore(deps): bump boxo to ipfs/boxo#1150 (70ffcfa) * chore(deps): bump boxo to ipfs/boxo#1150 (75481f4) ipfs/boxo#1150 was reworked to use context fan-out instead of a done channel. Pinner.Close now cancels every admitted op and waits for them to return, broadening the shutdown contract from "drain streams" to "drain everything". Comments and changelog reworded to match. * chore(deps): bump boxo to latest main (b2b5d8a)
* feat: bound graceful shutdown, add diag healthy
Replace unbounded app.Stop(context.Background()) with a deadline-bounded
context driven by a new Internal.ShutdownTimeout config (default 12h,
0 disables). Add an os.Exit(1) watchdog at the same deadline so an FX
OnStop hook that never returns can no longer hang the daemon.
Add ipfs diag healthy: fails when shutdown has been initiated or when
the DAG pipeline cannot resolve the well-known empty-directory CID.
Dockerfile HEALTHCHECK now uses it so orchestrators recycle half-
shutdown daemons.
- core/shutdown: new pkg; atomic startedAt + CloseWithCtx helper
- core/builder.go: app.Stop bounded by ShutdownTimeout
- cmd/ipfs/kubo/daemon.go: watchdog + MarkStarted on signal
- core/commands/diag.go: new healthy subcommand
- core/node/{bitswap,libp2p/host,libp2p/routing}.go: OnStop hooks wrapped
- config/internal.go: ShutdownTimeout + DefaultShutdownTimeout=12h
- Dockerfile: HEALTHCHECK uses "ipfs diag healthy"
- docs/{config,changelogs/v0.42}.md: documented
- test/cli: enabled + disabled path tests
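The deadline-bounded close described in this commit can be sketched as follows. This is a minimal illustration, not Kubo's actual `core/shutdown` code: the lowercase `closeWithCtx` name, its signature, and the `demo` helper are assumptions made for the example, and the error text only mimics the log line quoted later in this log.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// closeWithCtx runs closeFn in its own goroutine and waits for it to finish
// or for ctx to expire, whichever happens first.
// Hypothetical helper: name and signature are illustrative only.
func closeWithCtx(ctx context.Context, name string, closeFn func() error) error {
	done := make(chan error, 1) // buffered so a late closeFn never blocks on send
	go func() { done <- closeFn() }()
	select {
	case err := <-done:
		return err
	case <-ctx.Done():
		// closeFn may still be running; the caller decides what happens next
		// (in the commit above, a watchdog exits the process).
		return fmt.Errorf("subsystem '%s' failed to close: %w", name, ctx.Err())
	}
}

// demo closes one fast and one stuck subsystem under the same deadline.
func demo(timeoutMs int) (fastErr, stuckErr error) {
	ctx, cancel := context.WithTimeout(context.Background(),
		time.Duration(timeoutMs)*time.Millisecond)
	defer cancel()

	fastErr = closeWithCtx(ctx, "fast", func() error { return nil })

	release := make(chan struct{})
	defer close(release) // lets the stuck goroutine exit once demo returns
	stuckErr = closeWithCtx(ctx, "stuck", func() error {
		<-release
		return nil
	})
	return fastErr, stuckErr
}

func main() {
	fastErr, stuckErr := demo(100)
	fmt.Println("fast:", fastErr)
	fmt.Println("stuck:", stuckErr)
	fmt.Println("timed out:", errors.Is(stuckErr, context.DeadlineExceeded))
}
```

The helper cannot stop a close function that ignores cancellation; it only returns control to the caller, which is why the commit pairs the bounded context with a process-level watchdog.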
* feat: bound provider stats and ADD_PROVIDER sends
bumps go-libp2p-kad-dht past v0.39.2 to b73e1e8 to pick up two
related provider bug fixes.
- ipfs provide stat now honors client cancellation and deadlines
instead of blocking indefinitely behind a slow keystore lookup
- adds Provide.DHT.SendProviderRecordTimeout capping each
ADD_PROVIDER RPC so unresponsive peers cannot pin a provide
worker and stall reprovide cycles
- internal reprovide-alert poller bounds its Stats call so a
hung keystore.Size cannot delay shutdown
* test(shutdown): use synctest for timeout test, document sleep
CloseWithCtx_timesOut now runs in a synctest bubble so the deadline
assertion is exact (no wall-clock slack), and the simulated close uses
a release channel to drain the bubble cleanly after the leak point.
The two happy-path tests stay unchanged because their close funcs
return immediately and gain nothing from a fake clock.
Comment the 2ms sleep in TestMarkStartedPreservesFirstTimestamp so
its role (forcing time.Now() to advance between the two MarkStarted
calls so a CAS to Store regression is detectable) is not lost.
Addresses #11329 (review).
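The "CAS to Store regression" mentioned above is easier to see in code. The sketch below assumes the shutdown timestamp is kept as Unix nanoseconds in an `atomic.Int64`; the `shutdownState` type and `markStarted` method are hypothetical stand-ins, not the real `core/shutdown` API.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// shutdownState records when shutdown began. Zero means "not started".
// Hypothetical type used only for this illustration.
type shutdownState struct {
	startedAt atomic.Int64 // Unix nanoseconds
}

// markStarted stores now only if nothing was stored before. A plain Store
// here would let a second call overwrite the first timestamp.
func (s *shutdownState) markStarted(now time.Time) bool {
	return s.startedAt.CompareAndSwap(0, now.UnixNano())
}

// demoFirstWins calls markStarted twice with different times and reports
// whether each call won and whether the first timestamp survived.
func demoFirstWins() (firstWon, secondWon, kept bool) {
	var s shutdownState
	first := time.Unix(100, 0)
	second := time.Unix(200, 0)
	firstWon = s.markStarted(first)
	secondWon = s.markStarted(second)
	kept = s.startedAt.Load() == first.UnixNano()
	return firstWon, secondWon, kept
}

func main() {
	fmt.Println(demoFirstWins())
}
```

A test against the real clock needs the two calls to observe different `time.Now()` values, which is the role the commit assigns to the 2ms sleep; the sketch sidesteps that by passing the times in.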
* fix(pinner): bound pinner Close with shutdown deadline
The boxo Pinner.Close contract notes that an in-flight op ignoring
its ctx (a downstream bug) can block Close, so the host must bound
it at the call site. Wrapping the OnStop hook with CloseWithCtx
honors Internal.ShutdownTimeout and surfaces an actionable
"subsystem 'pinner' failed to close" log on hang instead of leaving
only the watchdog os.Exit(1) trace.
* fix(shutdown): bound remaining I/O-touching OnStop hooks
Wrap the OnStop hooks whose Close can plausibly block on disk or
network: repo (datastore flush + lock release), mfs-root (datastore
writes via DAGService), peering (waits on libp2p peer goroutines),
legacy-provider (in-flight reprovide RPCs), and the dht-provider
plus keystore pair under SweepingProvider.
In-memory closes (blockservice, peerstore, resource-manager) are
left as-is since they cannot realistically hang.
For the dht-provider/keystore pair, provider closes first so nothing
can access the keystore afterwards. If the shutdown ctx fires
mid-provider-drain, the keystore close sees an expired ctx and
returns immediately; the watchdog os.Exit(1) is the ultimate
backstop, and keystore writes are fsync'd on put so missing the
explicit close is recoverable on next boot.
* fix(shutdown): bound remaining in-memory OnStop hooks
Wrap blockservice, peerstore, and resource-manager Close hooks with
CloseWithCtx for uniformity. These are pure in-memory operations
unlikely to hang in practice, but wrapping costs nothing and makes
the shutdown audit trail uniform: every OnStop hook now honors the
deadline and surfaces a named subsystem on timeout.
* fix(shutdown): bound autoRelayFeeder OnStop on ctx
OnStop waited on the feeder goroutine via <-done without honoring
the shutdown ctx. The goroutine itself selects on ctx in every
loop case, so cancel() normally suffices, but a stuck downstream
dht.WAN.GetClosestPeers that ignored its ctx could block fx.Stop
indefinitely. Adding the ctx.Done() select case mirrors the
reprovideAlert pattern in provider.go and lets the shutdown
deadline reclaim control even with a misbehaving DHT.
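The added select case can be illustrated in isolation. This is a sketch of the general pattern under assumed names (`waitForWorker`, `demoStop`), not the autoRelayFeeder code itself.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// waitForWorker blocks until the worker signals completion on done or the
// shutdown ctx expires. Without the ctx.Done() case, a worker that never
// closes done would block the caller forever.
func waitForWorker(ctx context.Context, done <-chan struct{}) error {
	select {
	case <-done:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// demoStop reports whether the wait ended because the worker exited.
func demoStop(workerExits bool, timeoutMs int) bool {
	ctx, cancel := context.WithTimeout(context.Background(),
		time.Duration(timeoutMs)*time.Millisecond)
	defer cancel()

	done := make(chan struct{})
	if workerExits {
		close(done) // well-behaved worker: finished before we wait
	}
	return waitForWorker(ctx, done) == nil
}

func main() {
	fmt.Println("worker exits:", demoStop(true, 200))
	fmt.Println("worker stuck:", demoStop(false, 20))
}
```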
* docs(changelog): merge shutdown entries into one user-facing section
Combine the pinner-on-shutdown paragraph with the bounded-shutdown
section under a single "Reliable shutdown and container health checks"
heading. Lead with the visible symptoms (half-shutdown daemons,
healthy-but-dead container reports, manual docker restart) instead of
fx OnStop jargon. Frame Internal.ShutdownTimeout as a
belt-and-suspenders ceiling, with the 12-hour default sized against
the 22-hour DHT provider record expiration.
…11321)
* feat(provide): add ipfs provide once for ad-hoc announcements
Adds an experimental subcommand that submits provider records for the given CIDs through the provider system right away, without waiting for the next reprovide cycle. Use -r to walk the DAG and announce every reachable block.
Designed against the sweep provider (the default since v0.39): StartProviding queues to the burst-provide workers, which publish records to the DHT efficiently. Works with the legacy provider too, though it queues into the slower serial worker pool.
CIDs must already exist in the local blockstore. Re-announcement on the regular schedule is governed by Provide.Strategy and Provide.DHT.Interval; this command does not change either.
* refactor(routing): deprecate ipfs routing provide
Marks `ipfs routing provide` as deprecated and points users at the new `ipfs provide once`. The command keeps its existing Run, Encoders, and flags so existing scripts continue to work; only the status flag and helptext change.
* docs(routing): clarify when ipfs routing reprovide applies
Tightens the helptext and the sweep-mode error message so the constraint is obvious: this command only triggers a cycle on the legacy provider, and points users at 'ipfs provide stat --all' for monitoring the default sweep schedule.
* docs: tighten provide helptext and update routing-provide references
Updates docs/config.md and docs/experimental-features.md to reference 'ipfs provide once' instead of 'ipfs routing provide'. Tightens the helptext for 'ipfs provide clear' and the 'ipfs provide stat' overview: drops headings around short paragraphs, prefers active voice, and notes that the sweep provider is the default.
* docs: changelog entry for ipfs provide once
* docs: use 'provide system' wording consistently
* test(provide): cover --recursive and multi-CID paths for provide once
Adds two subtests under runProviderSuite (run for both Legacy and Sweep):
- --recursive walks the DAG and announces every chunk of a 2 MiB file added with --pin=false under Provide.Strategy=roots, so the auto-provide path stays out of the way.
- multiple CIDs in a single invocation succeed and the text encoder reports 'queued 3 CID(s) for immediate provide'.
* feat(provide): stream cids and per-cid output for ipfs provide once
Each CID flows through the command independently, so stdin can be piped without buffering and consumers see results as they happen.
- Run reads CIDs from argv and then from BodyArgs (stdin scanner) one at a time, calling StartProviding per CID.
- With -r, the dag.Walk visit callback emits per visited block; the walk cancels its context on the first announce error to stop fetching.
- A typed ProvideOnceEvent (one per queued CID) replaces the prior batch result. JSON output streams {"Queued":"<cid>"} per line.
- Text output via PostRun: when stderr is a tty, the running count is redrawn on a single line; otherwise a final count is printed. The text encoder still works for HTTP/RPC consumers (one CID per line).
- Adds tests for stdin streaming and --enc=json one-event-per-line.
* feat(provide): dedupe across all roots and recursive walks
Previously the cid set was scoped per root, so a CID shared by two arguments or by two recursive DAG walks was announced twice. Move the set out to the Run scope so each unique CID is announced exactly once per invocation, regardless of how many times it shows up in argv, stdin, or the DAG walks.
For -r, hitting an already-seen CID also stops descent into that subtree, avoiding redundant block fetches when DAGs overlap.
* style(provide): rename useTTY to isTTY in PostRun
* refactor(provide): align ipfs provide once with kubo cmds-lib idioms
- Use the existing argumentIterator helper from cid.go to read argv followed by stdin, replacing the inlined two-loop variant.
- Document why PostRun forks on encoder type (TTY redraw needs to bypass the encoder; json/xml must keep streaming through it).
- Log an ERROR for unexpected response types instead of dropping them silently, mirroring the defensive pattern in cat.go's PostRun.
* docs(routing): document streaming limitations of routing provide
Spell out what 'ipfs routing provide' does worse than 'ipfs provide once' so users on the deprecation path know why to switch: input buffering, no per-cid output, no dedup across recursive roots, and the sync dht lookup that defeats sweep batching.
* docs(changelog): rewrite ipfs provide once entry around user impact
Recasts the highlight to lead with what the user can now do, not what the code does internally. Adds a one-line example showing the streaming stdin path that the previous version did not surface, and replaces "namespace" plumbing language with the actual capabilities (running count, json-per-line, single announcement per shared block under -r).
* feat(provide): use boxo BloomTracker for cross-input dedup
Swaps the cid.Set used by 'ipfs provide once' for the autoscaling boxo BloomTracker, the same dedup mechanism that powers Provide.Strategy=+unique. Run executes on the daemon, not the cli, so this caps daemon memory under hostile or accidental input: a user piping 100M cids previously would have grown the daemon's set to ~7 gb of resident memory; with the bloom chain it plateaus around 700 mb at the default fp rate, and under 100 mb up to 10m unique cids.
The trade-off is a small false-positive rate (~1 in 4.75m, the kubo default) that can cause an occasional cid to be silently skipped.
For ad-hoc providing this is acceptable; the regular reprovide cycle will pick up anything matched by Provide.Strategy on the next pass.
* docs(changelog): use ipfs refs as the provide once example
* style(provide): goimports import order
* docs(provide): soften dedup wording, comment re.Emit gate, cover Provide.Enabled=false
- Change "exactly once per invocation" to acknowledge the bloom false-positive rate now that the dedup is probabilistic.
- Add a comment to the text branch of PostRun warning future readers not to call re.Emit there, since the encoder would race with the TTY counter.
- Add a runProviderSuite subtest that exercises Provide.Enabled=false through the new code path (the existing routing-provide test only covers the deprecated alias's Run).
* docs(changelog): clarify provide once use case and add second example
- Note that provide once is also for fine-tuned control over which CIDs get announced when, alongside the regular reprovide schedule.
- Add a second example using ipfs pin ls so users see the pattern for replaying their pinset alongside the dag-walk pattern.
* feat(provide): error on ipfs provide once with Provide.DHT.Interval=0
When Provide.DHT.Interval=0, kubo wires NoopProvider via OnlineProviders -> OfflineProviders, so StartProviding silently no-ops and the cid never gets announced. provide once was returning success without any DHT publish: a footgun.
Add an explicit precondition check that mirrors the routing reprovide error path. Decoupling the wiring so ad-hoc provide works under Interval=0 is tracked separately.
* chore(deps): pin go-libp2p-kad-dht to PR #1246 head
Pulls in the WithReprovideInterval(0) burst-only mode from libp2p/go-libp2p-kad-dht#1246 so the kubo side of the Provide.DHT.Interval=0 decoupling can be developed against it.
* chore(deps): re-pin go-libp2p-kad-dht to PR #1246 head
Updates to the latest commit on the upstream branch (817031b) which also relaxes the dual SweepingProvider's reprovide-interval validator to accept 0, on top of the single-provider relaxation in the previous pseudo-version.
* feat(provide): decouple Provide.DHT.Interval=0 from the master kill-switch
Provide.Enabled is now the only switch that fully turns off the provide system. Provide.DHT.Interval=0 disables only the periodic reprovide schedule; new CIDs still announce via fast-provide-root and 'ipfs provide once'.
- groups.go: drop the Interval=0 factor from isProviderEnabled. The real provider (sweep or legacy) is now wired even when Interval=0.
- provider.go: skip the keystore sync goroutine in no-schedule mode. The ticker would panic on a zero interval, and with no schedule the keystore has no reader.
- cmdenv/env.go: drop the fast-provide-root short-circuit on Interval=0. Provide.Enabled=false is now the only short-circuit.
- commands/provide.go: drop the temporary 'cannot provide: Provide.DHT.Interval is 0' error from 'ipfs provide once'.
- test/cli: replace the 'Reprovide.Interval=0 disables announcement of new CID too' test (premise is now false) with one asserting that Interval=0 + Enabled=true keeps announcing. Convert the provide-once + Interval=0 test from error path to success path. Tighten the legacy 'Manual Reprovide trigger' test to focus on the error contract.
Requires upstream go-libp2p-kad-dht support for WithReprovideInterval(0) (kept under PR #1246).
* feat(config): require explicit Provide.Enabled when Provide.DHT.Interval=0
Provide.DHT.Interval=0 used to disable the entire provide system as a side effect. After the decoupling it disables only the periodic reprovide schedule, while new CIDs still announce via fast-provide-root and 'ipfs provide once'.
To prevent silent semantic drift on upgrade, the daemon now refuses to start when Interval is explicitly set to 0 unless Provide.Enabled is also set explicitly:
- Provide.Enabled=false fully disables providing (the old behaviour).
- Provide.Enabled=true keeps ad-hoc providing while skipping the periodic reprovide schedule.
The error message names both options so operators can pick the one that matches their intent without reading the changelog.
* docs: explain new Provide.DHT.Interval=0 semantic
Updates docs/config.md and the v0.42 changelog: Interval=0 now disables only the periodic reprovide schedule, and the daemon refuses to start without an explicit Provide.Enabled in that configuration. Calls out both upgrade paths (Provide.Enabled=false to fully disable, or =true to keep ad-hoc providing).
* chore(deps): re-pin go-libp2p-kad-dht to amended PR #1246 head
Picks up the timeOffset/timeBetween zero-guards so SweepingProvider.Stats() no longer panics with reprovideInterval=0. Required for 'ipfs provide stat' to work in no-schedule mode.
* test(provide): align test expectations with new no-schedule semantic
- core/commands/commands_test.go: register /provide/once in the expected command list.
- test/cli/provide_stats_test.go: 'ipfs provide stat' with Provide.DHT.Interval=0 now returns valid stats (with the schedule timing fields zeroed) instead of erroring out. Update the assertion to match.
* chore(deps): re-pin go-libp2p-kad-dht to amended PR #1246 head
Picks up the scheduleEnabled() consistency cleanup so timeOffset and timeBetween match the rest of the upstream gates.
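The startup rule from the feat(config) commit above (Interval explicitly 0 requires an explicit Provide.Enabled) can be sketched as a small validation function. The `provideConfig` type below uses nil pointers to model "not set" and is purely illustrative; Kubo's real config types and error text are not shown in this log and may differ.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// provideConfig is a hypothetical stand-in for the relevant config fields.
// A nil pointer means the operator did not set the value.
type provideConfig struct {
	Enabled  *bool
	Interval *time.Duration
}

// errAmbiguousInterval names both ways out, as the commit message describes.
// The wording is invented for this sketch.
var errAmbiguousInterval = errors.New(
	"Provide.DHT.Interval=0 requires an explicit Provide.Enabled: " +
		"set it to false to disable providing entirely, or to true to keep " +
		"ad-hoc providing without the periodic reprovide schedule")

// validateProvide rejects only the ambiguous combination:
// Interval explicitly 0 while Enabled is left unset.
func validateProvide(c provideConfig) error {
	if c.Interval != nil && *c.Interval == 0 && c.Enabled == nil {
		return errAmbiguousInterval
	}
	return nil
}

// demo reports whether startup is allowed for three configurations:
// [0] Interval=0 with Enabled unset, [1] Interval=0 with Enabled=true,
// [2] nothing set at all.
func demo() [3]bool {
	zero := time.Duration(0)
	yes := true
	return [3]bool{
		validateProvide(provideConfig{Interval: &zero}) == nil,
		validateProvide(provideConfig{Interval: &zero, Enabled: &yes}) == nil,
		validateProvide(provideConfig{}) == nil,
	}
}

func main() {
	fmt.Println(demo())
}
```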
* chore(deps): re-pin go-libp2p-kad-dht to PR #1246 merge on master
picks up the three follow-up commits guillaumemichel pushed before merging libp2p/go-libp2p-kad-dht#1246:
- refactor: simplify StartProvide()
- refactor: minimize change diff
- fix: don't remove from keystore on StopProviding()
* fix(provide): use ProvideOnce in `ipfs provide once`
`ipfs provide once` was calling StartProviding, which in sweep mode persists keys to the keystore and adds them to the periodic reprovide schedule. that contradicts the command's name and help text.
switch to ProvideOnce so the command publishes once and leaves the schedule untouched. for the legacy provider StartProviding already wraps ProvideOnce, so legacy behaviour is unchanged.
also tighten the help text to state plainly that the schedule is not modified.
* fix(provider): keep keystore inert when Provide.DHT.Interval=0
In no-schedule mode the keystore has no reader (no reprovide loop) and no writer (kad-dht's burst path skips Put/Delete). Until now we still opened on-disk leveldb/pebble files for it: wasted disk and noise on upgrade/downgrade.
Switch the keystore to an in-memory map in no-schedule mode and make destroyDs a no-op. Also purge any pre-existing keystore directory once at startup so users who toggle from schedule to no-schedule reclaim disk.
Replace the literal `reprovideInterval == 0` check at the second call site with the named noScheduleMode flag for consistency.
Update boxo to ipfs/boxo#1128 which removes io.Seeker from the files.File interface. Callers that need seeking now type-assert to io.Seeker.
- core/commands/cat: type-assert before seeking
- core/coreiface/tests: type-assert before seeking
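The caller-side pattern looks roughly like the sketch below, which uses a plain `io.Reader` in place of boxo's `files.File`. The `skipTo` and `readFrom` helpers are hypothetical, and the read-and-discard fallback is an assumption of this example; the commit does not say what Kubo does when the value is not seekable.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// skipTo advances r to offset. It seeks when r also implements io.Seeker
// and otherwise falls back to reading and discarding bytes (assumed policy).
func skipTo(r io.Reader, offset int64) error {
	if s, ok := r.(io.Seeker); ok {
		_, err := s.Seek(offset, io.SeekStart)
		return err
	}
	_, err := io.CopyN(io.Discard, r, offset)
	return err
}

// readFrom returns everything in r from offset onward.
func readFrom(r io.Reader, offset int64) (string, error) {
	if err := skipTo(r, offset); err != nil {
		return "", err
	}
	b, err := io.ReadAll(r)
	return string(b), err
}

// demo exercises both paths: strings.Reader is seekable, while wrapping it
// in io.MultiReader hides the Seek method.
func demo() (seekable, streamOnly string) {
	seekable, _ = readFrom(strings.NewReader("hello world"), 6)
	streamOnly, _ = readFrom(io.MultiReader(strings.NewReader("hello world")), 6)
	return seekable, streamOnly
}

func main() {
	fmt.Println(demo())
}
```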
## Problem

On a repo from `go-ipfs` or Kubo older than v0.27, the one-time migration uses `http.DefaultClient` (no timeouts) against a single hardcoded `trustless-gateway.link`. If that gateway is slow or blocked, the daemon hangs indefinitely before the data store opens, with no fallback. Reported in ipfs/ipfs-desktop#3147, where a user with a v11 repo thought they had lost 4,444 added images.

## Fix

- HTTP client gets dial, TLS, and response-header timeouts (15s, 15s, and boxo's `DefaultRetrievalTimeout` of 30s).
- The `"HTTPS"` alias in `Migration.DownloadSources` expands to five trustless community gateways instead of one. Trust is in local per-block multihash verification, not the operator.
- Outbound requests send `?format=car` (or `?format=ipns-record`) alongside `Accept`, since some gateways honor only one.
- `MultiFetcher` gets a session-scoped quarantine: a failing fetcher moves to the back of the rotation; after three full failed loops it latches `ErrMultiFetcherExhausted` pointing the user at `Migration.DownloadSources`. A cancelled context exits the loop early so it never poisons the quarantine.
- `RetryFetcher` is removed; rotation across distinct gateways replaces same-gateway retries.

Also fixes two pre-existing bugs in the same path: `NewHttpFetcher` ignored the `userAgent` argument so every request shipped Go's default `Go-http-client/1.1`, and `resolveIPNS` leaked the response body.

The `Migration` config and `"HTTPS"` alias keep working the same way for users; the alias just expands to more gateways internally.

Closes #7933
Closes #3137
Closes #8911
Closes ipfs/ipfs-desktop#3147
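For readers unfamiliar with where those three timeouts live in Go's standard library, here is a minimal sketch of a client configured with them. The `newMigrationClient` constructor is a hypothetical name; only the 15s/15s/30s values come from the description above, and the real fetcher may set additional transport options.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"time"
)

// newMigrationClient builds an http.Client whose transport bounds the TCP
// dial, the TLS handshake, and the wait for response headers.
// Hypothetical constructor for illustration.
func newMigrationClient(dial, tlsHandshake, responseHeader time.Duration) *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			Proxy:                 http.ProxyFromEnvironment,
			DialContext:           (&net.Dialer{Timeout: dial}).DialContext,
			TLSHandshakeTimeout:   tlsHandshake,
			ResponseHeaderTimeout: responseHeader,
		},
	}
}

// demoTimeoutSeconds builds a client with the values quoted in the PR text
// and reads the TLS and response-header timeouts back in whole seconds.
func demoTimeoutSeconds() (tlsSec, headerSec int) {
	c := newMigrationClient(15*time.Second, 15*time.Second, 30*time.Second)
	t := c.Transport.(*http.Transport)
	return int(t.TLSHandshakeTimeout / time.Second),
		int(t.ResponseHeaderTimeout / time.Second)
}

func main() {
	fmt.Println(demoTimeoutSeconds())
}
```

None of these settings limits how long the response body may take to stream, so a large download is not cut off once the headers have arrived.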
Surfaced by ipfs/service-worker-gateway#1067, where operators behind a default-deny firewall hit unreachable nodes from browser peers because UDP/4001 (QUIC, WebTransport, WebRTC-Direct) was not opened alongside TCP/4001.
- new docs/production/firewall.md: inspect ufw rules, open 4001/tcp and 4001/udp, optional Kubo application profile, custom-port and rule-removal notes
- daemon health (ipfs diag healthy) split from reachability (ipfs swarm addrs autonat), with Swarm.DisableNatPortMap and Swarm.EnableHolePunching pointers for nodes that stay Private
- link the walkthrough from Addresses.Swarm and the Security section in docs/config.md, and from the Production index in docs/README.md
No description provided.