Add InferenceCluster.spec.storage.rwxCache + caching design doc #82

Closed
dennis-upbound wants to merge 1 commit into main from dennis/cluster-storage

dennis-upbound commented May 20, 2026

Collaborator

Declares ReadWriteMany storage capability per cluster on InferenceCluster, so Modelplane's composition function can auto-provision a managed cache PVC for multi-node deployments without exposing storage configuration to ML teams.

API surface + design doc only. Composition logic follows in a subsequent PR.

What this changes

  • apis/inferenceclusters/definition.yaml — adds spec.storage.rwxCache.{storageClassName, defaultSizeGiB}. Platform-team-owned.
  • examples/platform/inference-cluster-gke.yaml — example storage block for a Modelplane-provisioned GKE cluster.
  • examples/platform/inference-cluster-existing.yaml — example storage block for a BYO cluster.
  • design/caching/README.md — 1-page design doc: what's now, what's future (CacheClass, ModelCache, cacheRef), rationale (platform/ML separation), why the v0.1 hardcoded choices are OK.
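
As a rough illustration of the new block, the sketch below writes an `InferenceCluster` spec fragment as a Python dict and checks it. Only the field names `storage.rwxCache.storageClassName` and `storage.rwxCache.defaultSizeGiB` come from this PR. The values, the `validate_rwx_cache` helper, and the assumptions that the block is optional, that both fields are required when it is present, and that `defaultSizeGiB` is a positive integer are all hypothetical and not confirmed by the schema in `definition.yaml`.

```python
# Hypothetical InferenceCluster spec fragment. Only the two rwxCache field
# names are taken from the PR description; the values are illustrative.
cluster_spec = {
    "storage": {
        "rwxCache": {
            "storageClassName": "example-rwx",  # hypothetical StorageClass name
            "defaultSizeGiB": 100,              # assumed to be an integer GiB count
        }
    }
}


def validate_rwx_cache(spec: dict) -> list:
    """Return a list of problems with spec.storage.rwxCache (empty if valid or absent)."""
    rwx = spec.get("storage", {}).get("rwxCache")
    if rwx is None:
        # Assumption: the block is optional, so a cluster may not declare RWX storage.
        return []
    problems = []
    name = rwx.get("storageClassName")
    if not isinstance(name, str) or not name:
        problems.append("storageClassName must be a non-empty string")
    size = rwx.get("defaultSizeGiB")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if not isinstance(size, int) or isinstance(size, bool) or size <= 0:
        problems.append("defaultSizeGiB must be a positive integer")
    return problems


print(validate_rwx_cache(cluster_spec))
```

Because the block is owned by the platform team, a check like this would run against the cluster resource rather than anything an ML team authors.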

How it behaves

| Topology | Behavior |
|---|---|
| Single-node | Ephemeral fetch in engine container. |
| Multi-node | Auto-provision PVC from cluster rwxCache. Fail-fast if not declared. |
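
The two rows above could be sketched roughly as follows. This is not the composition function, which is explicitly out of scope for this PR. The `plan_cache` function, the `-model-cache` name suffix, and the use of a node count to distinguish topologies are hypothetical. The PVC fields (`accessModes`, `storageClassName`, `resources.requests.storage`) are standard Kubernetes `PersistentVolumeClaim` fields.

```python
from typing import Optional


def plan_cache(node_count: int, cluster_spec: dict, deployment_name: str) -> Optional[dict]:
    """Return a PVC manifest for a multi-node deployment, or None for single-node."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    if node_count == 1:
        # Single-node: the engine container fetches the model ephemerally,
        # so no managed cache PVC is planned.
        return None
    rwx = cluster_spec.get("storage", {}).get("rwxCache")
    if not rwx:
        # Multi-node: fail fast when the cluster has not declared RWX storage.
        raise ValueError(
            "multi-node deployment requires spec.storage.rwxCache on the InferenceCluster"
        )
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": f"{deployment_name}-model-cache"},  # hypothetical naming
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "storageClassName": rwx["storageClassName"],
            "resources": {"requests": {"storage": f"{rwx['defaultSizeGiB']}Gi"}},
        },
    }


example_cluster = {
    "storage": {"rwxCache": {"storageClassName": "example-rwx", "defaultSizeGiB": 100}}
}
pvc = plan_cache(2, example_cluster, "demo")
print(pvc["spec"]["resources"]["requests"]["storage"])
```

Raising on a missing `rwxCache` block mirrors the fail-fast row: the error surfaces when the deployment is planned instead of leaving a PVC pending against a storage class that does not exist.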

ML teams have no caching surface in v0.1. Single-node cold-start optimization and BYO storage are deferred to the future ModelCache / cacheRef shape.

What this does NOT change

  • No ModelDeployment / ModelReplica schema changes. ML teams see exactly the same surface they see today.
  • No new CRDs. CacheClass and ModelCache are deferred to future work.
  • No composition function logic. This PR is API + design only.

dennis-upbound force-pushed the dennis/cluster-storage branch 4 times, most recently from cb92ca7 to 58b3a20 on May 20, 2026 23:43
dennis-upbound force-pushed the dennis/cluster-storage branch from 58b3a20 to 46f0f17 on May 20, 2026 23:51
Declares ReadWriteMany storage capability per cluster so Modelplane's
composition function can auto-provision a managed cache PVC for
multi-node deployments without exposing storage configuration to ML
teams.

API surface only — composition logic follows in a subsequent MR.

- apis/inferenceclusters/definition.yaml — spec.storage.rwxCache
  with storageClassName and defaultSizeGiB
- examples/platform/inference-cluster-gke.yaml — example storage block
  for a Modelplane-provisioned GKE cluster
- examples/platform/inference-cluster-existing.yaml — example storage
  block for a BYO cluster where the admin provisions the SC
- design/caching/README.md — design proposal: what's now, what's
  future, rationale (separation of platform vs ML team concerns), and
  why the v0.1 hardcoded choices are OK
dennis-upbound force-pushed the dennis/cluster-storage branch from 46f0f17 to ab8dad1 on May 21, 2026 05:15
@dennis-upbound

Collaborator Author

Closing in favor of consolidated cache design — superseded by the new branch combining the final API + design doc.

@dennis-upbound dennis-upbound deleted the dennis/cluster-storage branch June 19, 2026 17:26