Add ModelCache v1alpha1 API #80

Closed
dennis-upbound wants to merge 1 commit into dennis/cluster-storage from dennis/modelcache-api

@dennis-upbound dennis-upbound commented May 20, 2026

Collaborator

Follow-up to #82. Builds on the cluster-level spec.storage.rwxCache shape.

Adds the ML-team-facing opt-in for cross-deployment sharing, independent cache lifecycle, and proactive pre-staging on top of the invisible per-replica caching Modelplane already provides for multi-node deployments.

API surface + docs + examples only. Composition function follows in a separate MR.

What this adds

  • apis/modelcaches/definition.yaml: ModelCache XRD. Artifact source + mount path + optional size override + cluster selector. Storage class is always inherited from the target cluster's rwxCache; ML teams don't pick it.
  • apis/modeldeployments/definition.yaml: adds spec.modelCacheRef (singular, matching the existing inferenceClusterRef pattern).
  • apis/modelreplicas/definition.yaml: same spec.modelCacheRef, inherited from the parent ModelDeployment.
  • docs/concepts.md: a "## ModelCache" section positioned as the optional shared/lifecycled overlay on top of the cluster-level invisible caching; mermaid diagram updated.
  • examples/cache/model-cache-qwen.yaml: Qwen 0.5B cache example.
  • examples/deployment/model-deployment-cached.yaml: deployment referencing the cache via spec.modelCacheRef.
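
To make the shape of the new resource concrete, here is a rough sketch of what a ModelCache manifest could look like. Only the kind, the v1alpha1 version, and the four concepts (artifact source, mount path, optional size override, cluster selector) come from this PR. The API group, every field name under spec, and all values are assumptions for illustration and may not match apis/modelcaches/definition.yaml.

```yaml
# Hypothetical sketch only: the API group and the field names under spec
# are assumed for illustration, not copied from the XRD.
apiVersion: modelplane.example.org/v1alpha1   # placeholder group
kind: ModelCache
metadata:
  name: qwen-cache                 # illustrative name
spec:
  source:                          # artifact source (field name assumed)
    uri: "<artifact-uri>"          # placeholder; the real source is not shown here
  mountPath: /models               # mount path (field name and value assumed)
  size: 10Gi                       # optional size override (assumed; omit to use the default)
  clusterSelector:                 # selects target clusters (shape assumed)
    matchLabels:
      example.org/pool: gpu        # illustrative label
  # No storage class field: it is inherited from the target cluster's rwxCache.
```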

How this relates to #82

| Scenario | #82 (this PR's base) | This PR adds |
| --- | --- | --- |
| Multi-node, no cache reference | Auto-provision per-replica PVC from cluster rwxCache | (no change) |
| Multi-node + modelCacheRef set | n/a | Mount shared cache PVC across replicas; lifecycle decoupled from deployment |
| Single-node + modelCacheRef set | n/a | Mount shared cache PVC; cold-start optimization |
| Single-node, no cache reference | Ephemeral fetch in engine container | (no change) |
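
For the two "modelCacheRef set" rows, the opt-in on the deployment side might look roughly like the sketch below. spec.modelCacheRef and spec.inferenceClusterRef are the field names used in this PR. The API group, the name sub-field on each reference, and the resource names are assumptions, and the sketch presumes a ModelCache called qwen-cache already exists.

```yaml
# Hypothetical sketch only: API group, reference sub-fields, and names
# are assumed for illustration.
apiVersion: modelplane.example.org/v1alpha1   # placeholder group and version
kind: ModelDeployment
metadata:
  name: qwen-cached                # illustrative name
spec:
  inferenceClusterRef:
    name: example-cluster          # assumed sub-field and value
  modelCacheRef:                   # singular reference to one ModelCache
    name: qwen-cache               # assumed sub-field; must name an existing ModelCache
```

Dropping modelCacheRef from this manifest corresponds to the "no cache reference" rows, where the behavior from #82 applies unchanged.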

ML teams who don't care about sharing or pre-staging never see this surface — the cluster default still handles multi-node invisibly.

What this does NOT change

  • examples/deployment/model-deployment.yaml (the canonical single-node example) — untouched.
  • docs/getting-started.md — untouched. ModelCache stays an opt-in feature; the basic flow doesn't use it.
  • No composition function logic.

dennis-upbound force-pushed the dennis/modelcache-api branch 5 times, most recently from e8c73b0 to c76fb0c (May 20, 2026 23:49)
dennis-upbound changed the base branch from main to dennis/cluster-storage (May 20, 2026 23:49)
dennis-upbound force-pushed the dennis/cluster-storage branch from 58b3a20 to 46f0f17 (May 20, 2026 23:51)
dennis-upbound force-pushed the dennis/modelcache-api branch from c76fb0c to 5a177a4 (May 20, 2026 23:52)
dennis-upbound force-pushed the dennis/cluster-storage branch from 46f0f17 to ab8dad1 (May 21, 2026 05:15)
… caching)

Builds on the cluster-level rwxCache shape: ModelCache is the
ML-team-facing opt-in for cross-deployment sharing, independent
lifecycle, and proactive pre-staging on top of the invisible
per-replica caching that Modelplane already provides for multi-node
deployments.

API surface only — composition function follows in a separate MR.

- apis/modelcaches/definition.yaml — XRD. Artifact source +
  mount path + optional size override + clusterSelector. Storage
  class is always inherited from the target cluster's rwxCache;
  ML teams don't pick it.
- apis/modeldeployments/definition.yaml — spec.modelCacheRef
  (singular, matching the existing inferenceClusterRef pattern)
- apis/modelreplicas/definition.yaml — spec.modelCacheRef,
  inherited verbatim from the parent ModelDeployment
- docs/concepts.md — ModelCache section positioned as the
  optional shared/lifecycled overlay on the cluster-level
  invisible caching
- examples/cache/model-cache-qwen.yaml — Qwen 0.5B cache example
- examples/deployment/model-deployment-cached.yaml — deployment
  referencing the cache via spec.modelCacheRef
dennis-upbound force-pushed the dennis/modelcache-api branch from 5a177a4 to 573b410 (May 21, 2026 05:16)
@dennis-upbound

Collaborator Author

Closing in favor of the consolidated cache design; this PR is superseded by the new branch that combines the final API and the design doc.

dennis-upbound deleted the dennis/modelcache-api branch (June 19, 2026 17:26)