Add ModelCache v1alpha1 API #80
Closed
dennis-upbound wants to merge 1 commit into
… caching)

Builds on the cluster-level rwxCache shape: ModelCache is the ML-team-facing opt-in for cross-deployment sharing, independent lifecycle, and proactive pre-staging on top of the invisible per-replica caching that Modelplane already provides for multi-node deployments.

API surface only; the composition function follows in a separate MR.

- apis/modelcaches/definition.yaml: XRD. Artifact source + mount path + optional size override + clusterSelector. Storage class is always inherited from the target cluster's rwxCache; ML teams don't pick it.
- apis/modeldeployments/definition.yaml: spec.modelCacheRef (singular, matching the existing inferenceClusterRef pattern)
- apis/modelreplicas/definition.yaml: spec.modelCacheRef, inherited verbatim from the parent ModelDeployment
- docs/concepts.md: ModelCache section positioned as the optional shared/lifecycled overlay on the cluster-level invisible caching
- examples/cache/model-cache-qwen.yaml: Qwen 0.5B cache example
- examples/deployment/model-deployment-cached.yaml: deployment referencing the cache via spec.modelCacheRef
Closing in favor of the consolidated cache design; superseded by the new branch combining the final API and design doc.
Follow-up to #82. Builds on the cluster-level `spec.storage.rwxCache` shape.
Adds the ML-team-facing opt-in for cross-deployment sharing, independent cache lifecycle, and proactive pre-staging on top of the invisible per-replica caching Modelplane already provides for multi-node deployments.
API surface + docs + examples only. Composition function follows in a separate MR.
What this adds
- `apis/modelcaches/definition.yaml`: `ModelCache` XRD. Artifact source + mount path + optional size override + cluster selector. Storage class is always inherited from the target cluster's `rwxCache`; ML teams don't pick it.
- `apis/modeldeployments/definition.yaml`: `spec.modelCacheRef` (singular, matching the existing `inferenceClusterRef` pattern).
- `apis/modelreplicas/definition.yaml`: same `spec.modelCacheRef`, inherited from the parent ModelDeployment.
- `docs/concepts.md`: `## ModelCache` section positioned as the optional shared/lifecycled overlay on top of the cluster-level invisible caching; mermaid diagram updated.
- `examples/cache/model-cache-qwen.yaml`: Qwen 0.5B cache example.
- `examples/deployment/model-deployment-cached.yaml`: deployment referencing the cache via `spec.modelCacheRef`.

How this relates to #82
[Table: `rwxCache`, `modelCacheRef` set]

ML teams who don't care about sharing or pre-staging never see this surface; the cluster default still handles multi-node deployments invisibly.
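
For readers who want a concrete picture of the opt-in, the following is a rough sketch of how a cache and a deployment referencing it might fit together. Only `ModelCache`, `v1alpha1`, `clusterSelector`, and `spec.modelCacheRef` come from this PR's description; the API group, the `source`, `mountPath`, and `size` field names, the shape of `clusterSelector` and `modelCacheRef`, the `ModelDeployment` version, and all resource names and values are assumptions made for illustration. The actual schema lives in the XRDs under `apis/` and may differ.

```yaml
# Illustrative sketch only, not the schema from this PR.
# Anything marked "hypothetical" is an assumption, not taken from the XRD.
---
apiVersion: modelplane.example.org/v1alpha1   # hypothetical API group
kind: ModelCache
metadata:
  name: qwen-cache                  # hypothetical name
spec:
  source: "<artifact-uri>"          # hypothetical field: artifact source
  mountPath: /models                # hypothetical field: mount path
  size: 10Gi                        # hypothetical field: optional size override
  clusterSelector:                  # field named in the PR; shape is assumed
    matchLabels:
      tier: gpu                     # hypothetical label
  # No storage class here: per the PR, it is inherited from the
  # target cluster's rwxCache and is not chosen by ML teams.
---
apiVersion: modelplane.example.org/v1alpha1   # hypothetical API group and version
kind: ModelDeployment
metadata:
  name: qwen-cached                 # hypothetical name
spec:
  modelCacheRef:                    # singular reference, per the PR
    name: qwen-cache                # assumed name-based reference shape
```

A deployment that omits `spec.modelCacheRef` would, per the description above, keep relying on the cluster-level default.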
What this does NOT change
- `examples/deployment/model-deployment.yaml` (the canonical single-node example): untouched.
- `docs/getting-started.md`: untouched. ModelCache stays an opt-in feature; the basic flow doesn't use it.