Harden AMD SEV-SNP KDS collateral fetch (async client, timeouts, caching) #746

Description

@h4x3rotab

Follow-up from #713. This is not a trust-model gap: verification is fail-closed (KDS throttling denies release, never forges one). It is an availability foot-gun, though, especially now that --platform auto can land on SNP on AMD hosts.

What's wrong

sev-snp-qvl/src/lib.rs fetches AMD KDS collateral (cert chain + VCEK) with:

  • reqwest::blocking::Client::new() per request (lib.rs:374, lib.rs:395) — a fresh client every call, from inside an async verification path.
  • no request timeout — a hung or throttling KDS (HTTP 429 is documented on lab hosts) stalls verification with no bound.
  • no caching — every verification re-fetches the same per-product cert chain and per-(chip_id, TCB) VCEK.

What to do

  • use an async HTTP client (or run the blocking fetch on a dedicated pool), reusing one client.
  • set explicit connect + request timeouts.
  • cache collateral by (product, chip_id, reported_tcb); cert chains are per-product and long-lived, VCEKs are stable per (chip, TCB).
  • keep collateral validation fail-closed; the pinned ARK (builtin_ark()) stays the trust root regardless of what KDS returns.
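The caching bullet can be sketched with the standard library alone. This is a minimal illustration, not the crate's code: `VcekKey`, `VcekCache`, and `get_or_fetch` are hypothetical names, the field types are assumptions, and the `fetch` closure stands in for whatever actually talks to KDS. Failed fetches are never stored, so the path stays fail-closed.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical cache key: (product, chip_id, reported_tcb), as proposed in the list above.
/// The field types are assumptions made for this sketch.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct VcekKey {
    pub product: String,
    pub chip_id: Vec<u8>,
    pub reported_tcb: u64,
}

/// Hypothetical in-memory cache of VCEK bytes, shareable across verifications.
#[derive(Default)]
pub struct VcekCache {
    entries: Mutex<HashMap<VcekKey, Vec<u8>>>,
}

impl VcekCache {
    /// Returns the cached bytes for `key`, or calls `fetch` and stores the result on success.
    /// Errors are propagated and never cached, so a KDS failure still denies release.
    pub fn get_or_fetch<F, E>(&self, key: &VcekKey, fetch: F) -> Result<Vec<u8>, E>
    where
        F: FnOnce(&VcekKey) -> Result<Vec<u8>, E>,
    {
        // The lock guard is a temporary and is dropped at the end of this statement.
        let cached = self.entries.lock().unwrap().get(key).cloned();
        if let Some(hit) = cached {
            return Ok(hit);
        }
        // The lock is not held while fetching, so a slow KDS does not block other keys.
        let bytes = fetch(key)?;
        self.entries
            .lock()
            .unwrap()
            .insert(key.clone(), bytes.clone());
        Ok(bytes)
    }
}
```

Two concurrent misses on the same key may both fetch; that is harmless here because the value is stable per key. Whatever comes out of the cache would still go through the usual chain validation against the pinned ARK, so the cache saves a round trip and does not add a trust input.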

The DSTACK_AMD_KDS_PROXY_URL / core.sev_snp.amd_kds_proxy_url mirror path already exists for throttled labs; this issue is about making the default path robust, not about the proxy.
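To illustrate what a bound on the default path could look like, here is a std-only sketch that runs a blocking fetch on its own thread and gives up after a deadline. `fetch_with_deadline` and `FetchError` are hypothetical names, and the closure stands in for the real KDS request. This is not a substitute for connect and request timeouts on the HTTP client itself, which the issue asks for.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum FetchError {
    /// The fetch did not finish before the deadline.
    TimedOut,
    /// The fetch reported an error (for example HTTP 429) or the worker exited.
    Failed(String),
}

/// Runs a blocking `fetch` on a dedicated thread and waits at most `deadline` for it.
pub fn fetch_with_deadline<F>(fetch: F, deadline: Duration) -> Result<Vec<u8>, FetchError>
where
    F: FnOnce() -> Result<Vec<u8>, String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // After a timeout the receiver is gone; the send error is ignored on purpose.
        let _ = tx.send(fetch());
    });
    match rx.recv_timeout(deadline) {
        Ok(Ok(bytes)) => Ok(bytes),
        Ok(Err(msg)) => Err(FetchError::Failed(msg)),
        Err(mpsc::RecvTimeoutError::Timeout) => Err(FetchError::TimedOut),
        // The worker dropped the sender without sending, e.g. because it panicked.
        Err(mpsc::RecvTimeoutError::Disconnected) => {
            Err(FetchError::Failed("worker exited without a result".to_string()))
        }
    }
}
```

Every outcome other than a successful fetch is an error, which keeps verification fail-closed. The worker thread is not cancelled on timeout and keeps running until the underlying call returns. `recv_timeout` also blocks the calling thread, so in an async verification path the wait itself would belong on the runtime's blocking pool or be replaced by an async client with its own timeouts.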

Metadata

    Labels

    enhancement, security, security: hardening