Add InferenceCluster.spec.storage.rwxCache + caching design doc by dennis-upbound · Pull Request #82 · modelplaneai/modelplane

dennis-upbound · 2026-05-20T22:34:00Z

Declares ReadWriteMany storage capability per cluster on InferenceCluster, so Modelplane's composition function can auto-provision a managed cache PVC for multi-node deployments without exposing storage configuration to ML teams.

API surface + design doc only. Composition logic follows in a subsequent PR.

What this changes

- apis/inferenceclusters/definition.yaml: adds spec.storage.rwxCache.{storageClassName, defaultSizeGiB}. Platform-team-owned.
- examples/platform/inference-cluster-gke.yaml: example storage block for a Modelplane-provisioned GKE cluster.
- examples/platform/inference-cluster-existing.yaml: example storage block for a BYO cluster.
- design/caching/README.md: 1-page design doc covering what's now, what's future (CacheClass, ModelCache, cacheRef), the rationale (platform/ML separation), and why the v0.1 hardcoded choices are OK.

How it behaves

| Topology | Behavior |
| --- | --- |
| Single-node | Ephemeral fetch in engine container. |
| Multi-node | Auto-provision PVC from cluster `rwxCache`. Fail-fast if not declared. |

ML teams have no caching surface in v0.1. Single-node cold-start optimization and BYO storage are deferred to the future ModelCache / cacheRef shape.
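
The topology rule above can be sketched in a few lines. This is only an illustration of the described behavior, not the composition function itself (which this PR explicitly does not include). The `RwxCache` class, the `plan_cache` function, the `node_count` parameter, and the `<name>-cache` PVC naming are all hypothetical; only the `storageClassName` and `defaultSizeGiB` fields and the single-node/multi-node behavior come from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RwxCache:
    """Hypothetical mirror of spec.storage.rwxCache on InferenceCluster."""
    storage_class_name: str  # spec.storage.rwxCache.storageClassName
    default_size_gib: int    # spec.storage.rwxCache.defaultSizeGiB


def plan_cache(deployment_name: str, node_count: int,
               rwx_cache: Optional[RwxCache]) -> Optional[dict]:
    """Return a PVC manifest for multi-node deployments, or None for single-node."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    if node_count == 1:
        # Single-node: ephemeral fetch in the engine container, so no PVC.
        return None
    if rwx_cache is None:
        # Multi-node without a declared rwxCache: fail fast.
        raise ValueError(
            "multi-node deployment requires spec.storage.rwxCache on the cluster"
        )
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        # Assumption: the naming scheme is invented for this sketch.
        "metadata": {"name": f"{deployment_name}-cache"},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "storageClassName": rwx_cache.storage_class_name,
            "resources": {
                "requests": {"storage": f"{rwx_cache.default_size_gib}Gi"}
            },
        },
    }
```

Calling `plan_cache("demo", 1, None)` returns `None`, while a multi-node call with no `rwx_cache` raises `ValueError`, matching the fail-fast row in the table.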

What this does NOT change

- No ModelDeployment / ModelReplica schema changes. ML teams see exactly the same surface they see today.
- No new CRDs. CacheClass and ModelCache are deferred to future work.
- No composition function logic. This PR is API + design only.

Declares ReadWriteMany storage capability per cluster so Modelplane's composition function can auto-provision a managed cache PVC for multi-node deployments without exposing storage configuration to ML teams. API surface only; composition logic follows in a subsequent PR.

- apis/inferenceclusters/definition.yaml: spec.storage.rwxCache with storageClassName and defaultSizeGiB
- examples/platform/inference-cluster-gke.yaml: example storage block for a Modelplane-provisioned GKE cluster
- examples/platform/inference-cluster-existing.yaml: example storage block for a BYO cluster where the admin provisions the StorageClass
- design/caching/README.md: design proposal covering what's now, what's future, the rationale (separation of platform vs ML team concerns), and why the v0.1 hardcoded choices are OK

dennis-upbound · 2026-05-21T15:37:31Z

Closing in favor of consolidated cache design — superseded by the new branch combining the final API + design doc.

dennis-upbound force-pushed the dennis/cluster-storage branch 4 times, most recently from cb92ca7 to 58b3a20 Compare May 20, 2026 23:43

dennis-upbound mentioned this pull request May 20, 2026

Add ModelCache v1alpha1 API #80

Closed

dennis-upbound force-pushed the dennis/cluster-storage branch from 58b3a20 to 46f0f17 Compare May 20, 2026 23:51

dennis-upbound force-pushed the dennis/cluster-storage branch from 46f0f17 to ab8dad1 Compare May 21, 2026 05:15

dennis-upbound closed this May 21, 2026

dennis-upbound deleted the dennis/cluster-storage branch June 19, 2026 17:26

dennis-upbound wants to merge 1 commit into main from dennis/cluster-storage
