Add `InferenceCluster.spec.storage.rwxCache` + caching design doc #82
Closed
dennis-upbound wants to merge 1 commit into
cb92ca7 to 58b3a20
58b3a20 to 46f0f17
Declares ReadWriteMany storage capability per cluster so Modelplane's composition function can auto-provision a managed cache PVC for multi-node deployments without exposing storage configuration to ML teams. API surface only; composition logic follows in a subsequent MR.

- `apis/inferenceclusters/definition.yaml`: `spec.storage.rwxCache` with `storageClassName` and `defaultSizeGiB`
- `examples/platform/inference-cluster-gke.yaml`: example storage block for a Modelplane-provisioned GKE cluster
- `examples/platform/inference-cluster-existing.yaml`: example storage block for a BYO cluster where the admin provisions the StorageClass
- `design/caching/README.md`: design proposal covering what's now, what's future, the rationale (separation of platform vs. ML team concerns), and why the v0.1 hardcoded choices are OK
46f0f17 to ab8dad1
Author comment: Closing in favor of the consolidated cache design; superseded by the new branch that combines the final API and design doc.
Declares ReadWriteMany storage capability per cluster on InferenceCluster, so Modelplane's composition function can auto-provision a managed cache PVC for multi-node deployments without exposing storage configuration to ML teams.
API surface + design doc only. Composition logic follows in a subsequent MR.
What this changes
- `apis/inferenceclusters/definition.yaml`: adds `spec.storage.rwxCache.{storageClassName, defaultSizeGiB}`. Platform-team-owned.
- `examples/platform/inference-cluster-gke.yaml`: example storage block for a Modelplane-provisioned GKE cluster.
- `examples/platform/inference-cluster-existing.yaml`: example storage block for a BYO cluster.
- `design/caching/README.md`: 1-page design doc covering what's now, what's future (`CacheClass`, `ModelCache`, `cacheRef`), the rationale (platform/ML separation), and why the v0.1 hardcoded choices are OK.

How it behaves
- … `rwxCache`. Fail-fast if not declared.
- ML teams have no caching surface in v0.1. Single-node cold-start optimization and BYO storage are deferred to the future `ModelCache`/`cacheRef` shape.

What this does NOT change
- No `ModelDeployment`/`ModelReplica` schema changes. ML teams see exactly the same surface they see today.
- `CacheClass` and `ModelCache` are deferred to future work.