Implement the updated API shape by negz · Pull Request #75 · modelplaneai/modelplane

negz · 2026-05-13T00:19:09Z

This PR implements most of the API shape we aligned on in #64.

The exceptions are:

- No CEL (no spec.nodeSelector)
- No DRA support - we don't set pod requirements at all, except the number of GPUs needed
- ModelEndpoint doesn't support specifying addresses as FQDNs - only IPs
- No prefill/decode (no spec.prefill)
- We're still built on KServe only

The team decided to focus on the kubectl / YAML workflow for the initial demo. The web UI (Go proxy + React SPA) is not needed right now and adds build complexity (Go, Node.js, npm, Vite, container image). This commit removes ui/ and all Nix infrastructure that supported it: the frontend and proxy build derivations, the container image builder, the Go and frontend CI checks, the dev-proxy / dev-frontend / load-image apps, and Go / Node.js from the dev shell. The lint app now runs ruff on Python composition functions instead of golangci-lint on Go. Signed-off-by: Nic Cope <nicc@rk0n.org>

… ModelReplica

The PR #64 design replaces InferenceEnvironment with InferenceCluster and ModelPlacement with ModelReplica. The new vocabulary lines up with how the rest of the design is shifting: ModelDeployment fans out to ModelReplicas (one per cluster), the way a Kubernetes Deployment fans out to Pods. "Environment" was always overloaded - it meant both "GPU cluster" and "organizational stage" - so dropping it in favour of "cluster" tightens the model.

This commit applies both renames mechanically:

- apis/inferenceenvironments -> apis/inferenceclusters (kind, plural, short name ic)
- apis/modelplacements -> apis/modelreplicas (kind, plural, short name mr)
- functions/compose-inference-env -> compose-inference-cluster
- functions/compose-model-placement -> compose-model-replica
- tests/test-inference-env{,-existing} -> tests/test-inference-cluster{,-existing}
- tests/test-model-placement{,-autoscaling,-multinode} -> tests/test-model-replica{,-autoscaling,-multinode}
- tests/test-model-deployment-incompatible-env -> tests/test-model-deployment-incompatible-cluster
- examples/platform/inference-environment-*.yaml -> inference-cluster-*.yaml

Field renames that follow from the kind renames:

- ModelReplica.spec.inferenceEnvironmentRef -> inferenceClusterRef
- ModelDeployment.spec.environmentSelector -> clusterSelector
- ModelDeployment.spec.environments -> clusters
- ModelDeployment.status.placements -> replicas
- ModelDeployment printer column ENVS -> CLUSTERS
- ModelReplica printer column ENVIRONMENT -> CLUSTER

Label renames:

- modelplane.ai/environment -> modelplane.ai/cluster
- modelplane.ai/placement -> modelplane.ai/replica

Condition type / reason renames track the same vocabulary (e.g. PlacementsScheduled -> ReplicasScheduled, NoEnvironments -> NoClusters). The composed resource keys in compose-model-deployment also move from "placement-<name>" to "replica-<name>".

The ClusterModel/Model serving[].environmentSelector field is left alone here - those resources are being removed entirely in the next commit.

Signed-off-by: Nic Cope <nicc@rk0n.org>

The catalog split between ClusterModel/Model and ModelDeployment didn't hold up in practice. Engine args are inherently model-specific (the model name lives in --model=...), different quantization variants reference different weight checkpoints, and the "platform team curates a catalog" responsibility is real ongoing engineering work that most organizations don't have a team for.

This commit folds engine and topology config inline on ModelDeployment, following the PR #64 design. ClusterModel and Model are removed entirely along with their composition function (compose-model). Source / huggingFace blocks are gone too - engines fetch their own weights via their native --model=<repo> argument. Auth for private repos goes through standard PodSpec env (HF_TOKEN, NGC_API_KEY).

The new ModelDeployment spec:

- spec.replicas: How many ModelReplicas to fan out to.
- spec.clusterSelector: Label selector against InferenceCluster.
- spec.workers: Compute shape of one worker:
  - .count: (default 1)
  - .topology.strategy: Tensor | TensorPipeline
  - .topology.tensor: GPUs per node
  - .topology.pipeline: Nodes per worker (TensorPipeline only)
  - .resources.cpu: Required, no default
  - .resources.memory: Required, no default
- spec.engine: Engine config:
  - .image: Container image
  - .args: Engine args (opaque)
  - .env: PodSpec-shaped env vars (HF_TOKEN etc.)
  - .imagePullSecrets: For NGC and similar registries.

ModelReplica mirrors this shape minus spec.replicas and plus spec.inferenceClusterRef. DataExpert topology, disaggregated prefill/decode (spec.prefill), and real CEL evaluation on nodeSelector are deferred.

The scheduler reads the topology shape directly (no more VRAM math): a pool fits when its countPerNode >= topology.tensor and its nodes >= topology.pipeline.

Autoscaling drops out. The XRD declares the standard /scale subresource so kubectl scale works, but the KEDA ScaledObject and Prometheus query plumbing are removed. KEDA-via-scale-subresource opt-in lands later.

The status.model.name field is gone - the model identity now lives in opaque engine args, and best-effort parsing would be brittle.

Naming on the remote cluster shifts from "model name sanitized to a DNS label" to just the ModelDeployment name. Each remote cluster gets one LLMInferenceService per deployment with this name, so the control plane HTTPRoute can rewrite to a uniform path on every backend.

The compose-model-deployment scheduler still prefers clusters that already have a replica for this deployment (stability). Capacity accounting subtracts GPUs consumed by other deployments' replicas based on each replica's own workers.topology.

13 composition tests pass.

Signed-off-by: Nic Cope <nicc@rk0n.org>
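
The pool-fit rule stated in the commit message above can be sketched in a few lines. This is only an illustration of the comparison, not the scheduler's code: the `GpuPool` and `Topology` classes and the `pool_fits` name are hypothetical stand-ins for one status.capacity.gpuPools[] entry and for spec.workers.topology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuPool:
    """Hypothetical view of one GPU pool (names are assumptions)."""

    count_per_node: int  # GPUs on each node in the pool
    nodes: int  # number of nodes in the pool


@dataclass(frozen=True)
class Topology:
    """Hypothetical view of spec.workers.topology."""

    tensor: int = 1  # GPUs per node
    pipeline: int = 1  # nodes per worker


def pool_fits(pool: GpuPool, topology: Topology) -> bool:
    """A pool fits when it has enough GPUs per node and enough nodes."""
    return (
        pool.count_per_node >= topology.tensor
        and pool.nodes >= topology.pipeline
    )
```

The sketch leaves out the capacity accounting for GPUs already consumed by other deployments' replicas, which the commit message describes separately.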

InferenceClass is the bridge between hardware capabilities and provisioning recipes. Platform teams author InferenceClass resources describing the shape of a GPU node pool (resources block: GPUs per node, per-GPU memory) and optionally how to provision one on a cloud (provisioning block: machine type, accelerator, disk size). Each InferenceCluster.nodePools[] entry references a class by name and declares only cluster-specific counts (nodeCount, minNodeCount, maxNodeCount, zones).

This replaces per-pool inline hardware fields and converges the GKE and BYO (Existing) cluster shapes. The same node pool schema works for both: classes with provisioning describe pools Modelplane creates, classes without provisioning describe pools that already exist on a BYO cluster.

The system node pool that hosts control-plane components (Envoy Gateway, KEDA, etc.) is no longer in the user-facing API. The composition function injects it automatically for GKE clusters (e2-standard-4, 1-2 nodes). Users only declare GPU pools.

PR #64's design used DRA-shaped attributes and capacity on the class specifically so that ModelDeployment.spec.nodeSelector CEL could evaluate against them. With nodeSelector dropped from this branch and pod-shape moved to workers.resources, the DRA shape adds verbosity without a consumer. spec.resources.gpu carries the count and per-GPU memory the scheduler and composition function actually use. The nvidia.com/gpu device plugin name remains an internal detail of the composition function rather than a user-facing key.

The scheduler is untouched: compose-inference-cluster still populates status.capacity.gpuPools[] in the same shape, just sourced from the referenced classes instead of inline pool config.

InferenceClass itself has no composed children. compose-inference-class just marks the XR Ready.

lib/resource.py now serialises with by_alias=True so the generated class_ alias field renders as "class" in YAML.

14 composition tests pass.

Signed-off-by: Nic Cope <nicc@rk0n.org>

PR #64 splits routing apart from deployment so that fan-out (replicas on clusters) and exposure (where requests land) can evolve independently. ModelEndpoint is a reachable inference endpoint; ModelService selects endpoints by label and composes the Gateway-API HTTPRoute that exposes them.

This commit introduces both kinds, moves the Envoy Backend composition out of ModelReplica and into ModelEndpoint, and moves the HTTPRoute composition out of ModelDeployment and into ModelService. The pattern mirrors Kubernetes Deployment + Service: applying a ModelDeployment alone gets you running replicas; you author a ModelService to make them reachable.

ModelEndpoint (namespaced, short me): carries the informational URL, the api protocol, and the rewritePath that ModelService consumes when composing the URLRewrite filter. compose-model-endpoint parses spec.url, composes an Envoy Backend on the control plane, and surfaces the Backend's name in status.routing.backendName.

ModelService (namespaced, short ms): carries spec.endpoints, each a label selector. compose-model-service fetches the InferenceGateway and all matching ModelEndpoints, then composes an HTTPRoute that matches the service's namespace/name path prefix and rewrites to the first matched endpoint's rewritePath, with all matched endpoints as backendRefs (equal weighting; weight as a field is deferred). The service's public address surfaces on status.address.

ModelDeployment changes: stops composing the HTTPRoute, composes one ModelEndpoint per matched cluster (labeled modelplane.ai/deployment: <name>, with rewritePath pointing at the remote LLMInferenceService path), and drops status.endpoint.url. The URL surface lives on ModelService now.

ModelReplica changes: stops composing the Envoy Backend (that moves to ModelEndpoint) and drops both status.endpoint.url and status.routing.backendName. The replica becomes purely about composing the LLMInferenceService on the remote cluster.

External / SaaS endpoint support (fqdn-style Backends) is deferred. spec.url is expected to be an http://<ip>:<port>/... shape today; the schema doesn't enforce that yet.

16 composition tests pass.

Signed-off-by: Nic Cope <nicc@rk0n.org>

The Pydantic code generator turns a single-value enum (enum: [OpenAI]) into a Literal with the sole value as its default. The SDK's resource.update uses exclude_defaults=True, which silently drops the field from the serialized resource. The CRD then rejects it because api is required. Nothing reads spec.api today. Drop it rather than work around the generator/SDK interaction. We can reintroduce it later once we sort out exclude_defaults vs exclude_unset with the SDK. Signed-off-by: Nic Cope <nicc@rk0n.org>

KServe v0.16's LLMInferenceService requires a non-empty model.uri. This was previously set explicitly but was lost during the API reshape. Restore it by extracting the model name from the --model= engine arg. This is an interim fix — the plan is to stop using KServe altogether. Signed-off-by: Nic Cope <nicc@rk0n.org>

KServe's LLMInferenceService handles model fetching via model.uri and invokes vLLM with the local model path. Passing --model= as a container arg conflicts — vLLM v0.7.3 rejects it when invoked via `vllm serve`. Extract the model name from --model= to populate model.uri, then strip it from the args passed to the container. Signed-off-by: Nic Cope <nicc@rk0n.org>
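
A minimal sketch of the extract-then-strip step described above, assuming the engine args are a flat list of strings and that only the `--model=<value>` form needs handling. The function name `split_model_arg` is hypothetical. How the extracted value is turned into model.uri is not shown in the commit message, so the sketch returns the raw value only.

```python
from __future__ import annotations

MODEL_PREFIX = "--model="


def split_model_arg(args: list[str]) -> tuple[str | None, list[str]]:
    """Return (model, remaining_args) with any --model=<value> arg removed.

    Only the "--model=<value>" spelling is recognised. The two-token
    "--model <value>" form is deliberately not handled in this sketch.
    """
    model = None
    remaining = []
    for arg in args:
        if arg.startswith(MODEL_PREFIX):
            model = arg[len(MODEL_PREFIX):]
        else:
            remaining.append(arg)
    return model, remaining
```

For example, `split_model_arg(["--model=example/tiny-model", "--max-model-len=2048"])` yields the model value `example/tiny-model` and leaves only `--max-model-len=2048` for the container (the model name here is made up for illustration).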

ModelDeployment had an ENDPOINT column on status.endpoint.url that's no longer written — the URL surface moved to ModelService when the routing layer was split. Drop the column and the status.endpoint schema block. ModelReplica had the same dead ENDPOINT column plus status.endpoint and status.routing schema blocks. The replica function no longer writes either. Replace the column with STRATEGY (Tensor or TensorPipeline) and drop the unused schema. Add a SOURCE column to InferenceCluster so kubectl get ic shows whether the cluster was provisioned (GKE) or BYO (Existing). Signed-off-by: Nic Cope <nicc@rk0n.org>

Every XR kind showed READY, SYNCED, AGE, and COMPOSITION as the last four columns in kubectl get, plus a duplicate READY (and sometimes AGE) earlier in the row. Crossplane v2 appends those built-in columns to every XR automatically, so the hand-defined ones were always duplicates. This commit removes the duplicate READY and AGE columns from all nine XRDs (six public, three internal). It keeps the columns that aren't built-in: SOURCE and GATEWAY on InferenceCluster, BACKEND and ADDRESS on InferenceGateway, REPLICAS on ModelDeployment, URL on ModelEndpoint, CLUSTER and STRATEGY on ModelReplica, ADDRESS on ModelService, PROJECT and REGION on GKECluster, KSERVE and GATEWAY on KServeBackend. InferenceClass had no non-duplicate columns, so its additionalPrinterColumns block is removed entirely. Signed-off-by: Nic Cope <nicc@rk0n.org>

ModelDeployment composes one ModelReplica and one ModelEndpoint per target InferenceCluster. The MR and ME both carry a modelplane.ai/deployment label so ModelService can select all endpoints for a deployment. They don't carry any label identifying which InferenceCluster they belong to, so a ModelService can't narrow its routing to a specific subset of clusters - the selector either matches every endpoint of the deployment or none. This commit adds a modelplane.ai/cluster label to MR and ME carrying the target cluster's name. A ModelService can now select on both deployment and cluster, e.g. to drop replicas on clusters that aren't serving traffic. Signed-off-by: Nic Cope <nicc@rk0n.org>
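
To illustrate what the extra label enables, the sketch below filters endpoints by equality on both labels. It assumes a selector can be modelled as a plain dict of required label values; the real spec.endpoints selector shape is not shown in this commit message, and the `matches` and `select` helpers and the endpoint names are hypothetical.

```python
from __future__ import annotations


def matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """True when every key/value pair in the selector is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select(endpoints: list[dict], selector: dict[str, str]) -> list[dict]:
    """Return the endpoints whose labels satisfy the selector."""
    return [e for e in endpoints if matches(selector, e.get("labels", {}))]


# Two endpoints composed for one deployment, one per cluster (example data).
ENDPOINTS = [
    {
        "name": "qwen-demo-us-east",
        "labels": {
            "modelplane.ai/deployment": "qwen-demo",
            "modelplane.ai/cluster": "us-east",
        },
    },
    {
        "name": "qwen-demo-eu-west",
        "labels": {
            "modelplane.ai/deployment": "qwen-demo",
            "modelplane.ai/cluster": "eu-west",
        },
    },
]
```

Selecting on the deployment label alone returns both endpoints; adding `modelplane.ai/cluster: us-east` to the selector narrows the result to one.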

The script was using `kubectl run -i --rm` to invoke curl in an ephemeral pod. That mode attaches to the pod's stdout after creation; if the container exits before the attach binds, the curl response is lost. The script printed an empty body roughly half the time, which is unsuitable for a live demo. This commit reworks the script to create the pod with `kubectl run`, wait for it to reach Succeeded, then read its logs. Logs survive after container exit, so there's no race. A trap cleans up the pod on any exit path. The script also now reads the ModelService address from status.address rather than hard-coding the gateway IP and route path, and prints what it's testing before issuing the request. Signed-off-by: Nic Cope <nicc@rk0n.org>

This was left behind while refactoring to the new API design. Signed-off-by: Nic Cope <nicc@rk0n.org>

The fleet scheduler matched clusters based solely on GPU capacity, ignoring whether the cluster was actually ready. A cluster that was still provisioning or hadn't established its gateway could be selected, causing the deployment function to compose ModelEndpoints with placeholder URLs that produced invalid Envoy Backends. This commit adds a readiness gate to the scheduler: clusters must have a Ready=True condition and a gateway address to be schedulable. Since every matched cluster now has a valid gateway address, the endpoint composition no longer needs a fallback path. The redundant gateway_address field on Candidate is removed — the address is read from clusters_by_name when needed. The stale InferenceGateway fixtures are also removed from all deployment tests. The function no longer requires the InferenceGateway as of the previous commit. Signed-off-by: Nic Cope <nicc@rk0n.org>
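
The readiness gate can be pictured roughly as follows. This is a sketch over plain dicts, not the function's actual code: the `cluster_ready` name is a stand-in, and reading the gateway address from `status.gateway.address` is an assumption, since the commit message does not say where that field lives.

```python
def cluster_ready(cluster: dict) -> bool:
    """Schedulable only with a Ready=True condition and a gateway address."""
    status = cluster.get("status") or {}

    # Standard Kubernetes-style condition list: [{"type": ..., "status": ...}]
    has_ready_condition = any(
        cond.get("type") == "Ready" and cond.get("status") == "True"
        for cond in status.get("conditions") or []
    )

    # Assumed location of the gateway address (hypothetical field path).
    address = (status.get("gateway") or {}).get("address")

    return has_ready_condition and bool(address)
```

A cluster that is Ready but has no gateway address yet is rejected, which is the case that previously led to placeholder URLs.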

The ModelDeployment and ModelReplica APIs had a flat spec.engine block and a strategy discriminator on spec.workers.topology that diverged from the design doc. The engine configuration was separate from the worker template, topology required an explicit Tensor/TensorPipeline strategy enum, and CPU/memory resources lived in a top-level workers.resources block that was pre-DRA scaffolding.

This commit restructures both XRDs to match the design:

- spec.engine moves into spec.workers.template, a curated subset of PodTemplateSpec. The template has metadata (labels, annotations for service mesh injection etc.) and spec (containers, imagePullSecrets). The container named "engine" is the inference engine; additional containers pass through as sidecars. A CEL validation rule on the containers array enforces exactly one container named "engine".
- spec.workers.topology.strategy is removed. Multi-node serving is now derived from pipeline > 1 (default 1). The topology axes compose multiplicatively without a discriminator.
- spec.workers.resources is removed. It was only used to set CPU and memory limits on the engine container. DRA will handle device binding and resource requirements in a future version. Until then, pods are created with only nvidia.com/gpu in resource limits.

Signed-off-by: Nic Cope <nicc@rk0n.org>
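
The rules described in this commit message can be mirrored in plain Python as a rough sketch: exactly one container named "engine", multi-node derived from pipeline > 1, and total GPUs per worker as the product of the two axes. The helper names are hypothetical and the inputs are plain dicts; the real enforcement of the container rule is a CEL validation on the XRD, which this does not reproduce.

```python
from __future__ import annotations


def find_engine_container(containers: list[dict]) -> dict:
    """Return the single container named "engine", or raise ValueError."""
    engines = [c for c in containers if c.get("name") == "engine"]
    if len(engines) != 1:
        raise ValueError(
            f'expected exactly one container named "engine", found {len(engines)}'
        )
    return engines[0]


def is_multi_node(topology: dict) -> bool:
    """Multi-node serving is implied by pipeline > 1 (default 1)."""
    return topology.get("pipeline", 1) > 1


def gpus_per_worker(topology: dict) -> int:
    """The axes compose multiplicatively: tensor GPUs/node x pipeline nodes."""
    return topology.get("tensor", 1) * topology.get("pipeline", 1)
```

With `{"tensor": 8, "pipeline": 2}` a worker spans two nodes and uses 16 GPUs in total; with `pipeline` omitted it stays single-node.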

ModelService composed a single rule-level URLRewrite filter using the first matched endpoint's rewritePath. When a service selected endpoints with different rewrite targets (e.g. composed replicas rewriting to /default/qwen-demo/ alongside a manual SaaS endpoint rewriting to /v1), all traffic was rewritten to the first path regardless of which backend handled the request. This silently broke routing for any endpoint whose rewritePath differed. Gateway API's HTTPBackendRef supports per-backendRef filters, including URLRewrite (Extended support, confirmed in Envoy Gateway's route processing). This commit moves the URLRewrite filter from the rule level to each individual backendRef, derived from that endpoint's spec.rewritePath. Endpoints with different rewrite targets now coexist correctly in the same ModelService. The EndpointsResolved condition message now also reports how many matched endpoints are still waiting for their Backend to be composed, rather than silently excluding them from the HTTPRoute. Signed-off-by: Nic Cope <nicc@rk0n.org>
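
As a sketch of the per-backendRef shape, the following builds one HTTPRoute rule as a plain dict with a URLRewrite filter attached to each backendRef instead of to the rule. The helper names and the endpoint dict keys (`backendName`, `rewritePath`) are hypothetical, and the use of `ReplacePrefixMatch` is an assumption: the commit message does not say which path modifier the function emits.

```python
from __future__ import annotations


def backend_ref(backend_name: str, rewrite_path: str) -> dict:
    """One backendRef to an Envoy Gateway Backend with its own URLRewrite."""
    return {
        "group": "gateway.envoyproxy.io",
        "kind": "Backend",
        "name": backend_name,
        "filters": [
            {
                "type": "URLRewrite",
                "urlRewrite": {
                    "path": {
                        # Assumed modifier; ReplaceFullPath is the alternative.
                        "type": "ReplacePrefixMatch",
                        "replacePrefixMatch": rewrite_path,
                    }
                },
            }
        ],
    }


def route_rule(path_prefix: str, endpoints: list[dict]) -> dict:
    """One HTTPRoute rule whose rewrite is carried per backend, not per rule."""
    return {
        "matches": [{"path": {"type": "PathPrefix", "value": path_prefix}}],
        "backendRefs": [
            backend_ref(e["backendName"], e["rewritePath"]) for e in endpoints
        ],
    }
```

With this shape, an endpoint rewriting to /default/qwen-demo/ and another rewriting to /v1 can sit behind the same rule without one path overriding the other.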

The user-facing docs described the previous iteration of the API: ClusterModel/Model catalog entries, InferenceEnvironments, serving profiles, and concurrency-based per-cluster autoscaling. None of those resources or behaviors exist in the current implementation. The README hero snippet used fields (modelRef, clusters) that no XRD defines, and getting-started had a "Register a model" step pointing at a deleted example file.

This commit rewrites the three documents to match what Modelplane actually does:

- README's snippet uses the current workers.topology and worker template shape. Prose describes the InferenceCluster / InferenceClass split and the ModelDeployment -> ModelReplica -> ModelEndpoint -> ModelService flow.
- concepts.md is rewritten around the seven resources that exist today, with a diagram showing how ModelService routes across the endpoints composed per replica.
- getting-started.md drops the broken "Register a model" step, adds an InferenceClass step before the cluster, and uses ModelService for routing in the final curl and status checks.

Signed-off-by: Nic Cope <nicc@rk0n.org>

Several small issues accumulated during the API reshape.

compose-model-replica derived the parent deployment name by stripping the cluster suffix from its own name - a string-parsing contract across function boundaries that would silently break if the naming scheme changed (e.g. truncation on long names). The deployment name is already on the modelplane.ai/deployment label that compose-model-deployment sets on every replica. The function now reads it from there. Test XR YAMLs gain the label to match reality.

compose-model-deployment carried a clusters_by_name dict solely so compose_endpoints could look up gateway addresses by cluster name. The scheduler already had the address in hand (it gates on it via _cluster_ready) but didn't surface it on Candidate. Candidate now carries gateway_address, eliminating clusters_by_name entirely.

Other cleanup:

- CONDITION_REASON_MODEL_STARTING was duplicated across compose-model-deployment and compose-model-replica. Hoisted to lib/conditions.py.
- compose-model-replica called _engine_container() twice (once to compose, once for the event message). Cached in self.engine.
- compose-inference-cluster used hasattr() to guard an Optional Pydantic field that is None when absent. Removed.
- compose-inference-cluster round-tripped backend secrets through dicts then back to typed kssv1alpha1.Secret objects. Callers now construct the typed objects directly.
- Hardcoded "http://" in URL construction replaced with a GATEWAY_SCHEME constant in lib/metadata.py.

Signed-off-by: Nic Cope <nicc@rk0n.org>

The Python code generator aliases `class` to `class_` with Field(alias='class') because `class` is a Python keyword. The function-sdk-python's resource.update() calls model_dump() without by_alias=True, so any Pydantic model with an aliased field silently serializes under the Python name (class_) instead of the JSON name (class). The CRD expects `class`, so the composed resource would be rejected.

This codebase worked around the problem by adding by_alias=True to the three model_dump() calls in lib/resource.py. But that only covers the local helpers - every direct resource.update(rsp.desired.resources[k], model) call goes through the SDK path, which doesn't pass by_alias.

The field doesn't bite us today because compose-inference-cluster only reads nodePools[].class from the input XR and never re-emits it. But it's a landmine: the moment anyone composes an InferenceCluster or adds another Python-keyword field to a composed type, it breaks silently.

This commit renames the field to className, following the Kubernetes convention (storageClassName, ingressClassName, runtimeClassName). With no aliased fields in our schemas, by_alias=True is removed from lib/resource.py - the SDK's resource.update() works correctly without it.

Signed-off-by: Nic Cope <nicc@rk0n.org>

The compose-inference-class function only sets Ready and an Accepted condition — it doesn't compose any resources or write meaningful status. The test only verified the XR spec round-tripped unchanged, which is low value. InferenceClass XRD/composition wiring is still covered transitively by test-inference-cluster, which loads an InferenceClass as a required resource fixture. Signed-off-by: Nic Cope <nicc@rk0n.org>

Composed resource names are built by concatenating user-supplied components (e.g. deployment name + cluster name, or XR name + fixed suffix). When the result exceeds the 63-character DNS label limit, the previous code silently truncated with [:63]. Two distinct inputs that share a long prefix would truncate to the same name, silently colliding. This commit introduces dns_name() in lib/naming.py. Every composed name now carries a 5-character SHA-256 hash suffix, regardless of length. Short names get the suffix too, so all composed names are visually consistent and the naming scheme is uniform. When the name would exceed 63 characters, the prefix is truncated to make room for the hash. Every name-construction site in the codebase now goes through dns_name(). This covers lib/naming.py, compose-inference-cluster, compose-gke-cluster, and compose-kserve-backend. Signed-off-by: Nic Cope <nicc@rk0n.org>
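
One way to read the scheme described above is sketched below. It is not the code in lib/naming.py: the variadic signature, the "-" separator, and hashing the full untruncated name are all assumptions made for illustration.

```python
import hashlib

MAX_DNS_LABEL = 63  # DNS label length limit
HASH_LEN = 5  # characters of the SHA-256 hex digest kept as a suffix


def dns_name(*parts: str) -> str:
    """Join parts and append a short hash, truncating the prefix if needed.

    The hash is computed over the full, untruncated name, so two long inputs
    that share a prefix still end up with different suffixes (barring a
    collision in the 5-character hash).
    """
    full = "-".join(parts)
    digest = hashlib.sha256(full.encode("utf-8")).hexdigest()[:HASH_LEN]

    # Leave room for "-" plus the hash, and avoid a doubled hyphen.
    prefix = full[: MAX_DNS_LABEL - HASH_LEN - 1].rstrip("-")
    return f"{prefix}-{digest}"
```

Short names receive the suffix as well, so `dns_name("qwen", "us")` is six characters longer than `qwen-us`, while any over-long input comes back at no more than 63 characters.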

The paragraph explaining that the function does not compose the HTTPRoute reads like an explanation of how things changed rather than what the function does. The first paragraph already covers the function's purpose. Signed-off-by: Nic Cope <nicc@rk0n.org>

Per Google Python style, import modules rather than individual functions. Signed-off-by: Nic Cope <nicc@rk0n.org>

…gke-cluster The system node pool (e2-standard-4 for Envoy Gateway, KEDA, etc.) is a GKE provisioning detail. compose-inference-cluster was prepending it to the GKECluster XR's nodePools before composing the XR, which leaked the implementation detail into the intermediate API surface. This commit moves the system pool constants and injection into compose-gke-cluster, where they belong alongside the other GKE node pool provisioning logic. compose-inference-cluster now passes only GPU pools to the GKECluster XR. compose-gke-cluster injects the system pool when creating the actual GCP node pool resources. Signed-off-by: Nic Cope <nicc@rk0n.org>

When an InferenceCluster with source=GKE had a node pool referencing an InferenceClass without a GKE provisioning block, the function silently skipped the pool. A misconfigured node pool is a user-fixable error, and silent partial provisioning is confusing — the cluster appears ready but is missing GPU capacity. This commit replaces the silent skip with a ClusterReady=False condition and a warning. The function returns early, gating all downstream composition until the user fixes the InferenceClass reference. Signed-off-by: Nic Cope <nicc@rk0n.org>

dennis-upbound

Awesome! Looks great! A few minor comments/questions

Composes a PVC + a one-shot hydration Job per matched InferenceCluster.

v0.1 scope: Weights kind, PVC backend, HuggingFace + S3 sources, replication = AllMatchingClusters. ContentAddressed / Custom backends, Tokenizer / Bytes / Adapter / Engine kinds, BYO ExistingPVC, and per-cluster selector overrides are deferred.

Out of scope here: ModelDeployment integration. The mount-injection that attaches a cache's PVC to a model serving pod lives in compose-model-replica and is deferred until the new ModelDeployment shape (PR #75) stabilizes.

Adds:

- apis/modelcaches/{definition,composition}.yaml
- functions/compose-model-cache/main.py
- examples/cache/model-cache-basic.yaml

Design: #76.

PR #75 deleted ui/ entirely. An earlier commit on this branch swept ui/frontend/node_modules into the index via git add -A, so the rebase faithfully re-added ~10k files (~2.5M lines, ~181M) on top of the deletion. Drop them.

The Scaling section claimed the ModelDeployment XRD declares a Kubernetes scale subresource. It does not — Crossplane XRDs do not support the scale subresource until v2.3, which has not shipped yet. Signed-off-by: Nic Cope <nicc@rk0n.org>

The description called spec.url "informational" and said ModelService does not read it. The URL is used to configure routing to the endpoint. Signed-off-by: Nic Cope <nicc@rk0n.org>

The leader template included pod metadata (labels, annotations) from workers.template.metadata, but the multi-node worker template used the raw pod_spec without it. If a user set template metadata for service mesh injection or similar, leader pods got the metadata but worker pods did not. This was not a regression from main — the old compose-model-placement had no pod metadata support at all. But the new code introduced template metadata and then applied it inconsistently. Signed-off-by: Nic Cope <nicc@rk0n.org>

Three bugs in compose-model-replica, all related to how the LLMInferenceService manifest is built.

First, spec.replicas was hardcoded to 1. The XRD has workers.count (default 1) meaning "number of workers per replica", and the scheduler correctly accounts for it when reserving GPU capacity, but the composition function ignored it. spec.replicas is now set from workers.count.

Second, the multi-node LLMIS shape did not match KServe's v1alpha1 API. The function emitted worker as {size, template} but KServe expects worker to be a PodSpec directly - the LWS group size is derived from parallelism.pipeline, not a separate field. The function also set parallelism.tensor to tensor × pipeline (total GPUs) instead of the actual tensor parallelism per node, and never set parallelism.pipeline at all.

Third, pod metadata (labels, annotations) from the worker template was placed inside the PodSpec at template.metadata. KServe's WorkloadSpec.template is a PodSpec, which has no metadata field. The KServe-blessed location for pod labels and annotations is at the WorkloadSpec level (siblings of template), where KServe applies them to both leader and worker pods.

The multi-node test now sets workers.count: 2 and asserts the correct LLMIS shape: replicas=2, parallelism with both tensor and pipeline axes, and worker as a bare PodSpec.

Signed-off-by: Nic Cope <nicc@rk0n.org>
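
The corrected shape can be pictured with a small dict-building sketch. Key placement here follows the commit message only and has not been checked against KServe's schema; the `llmis_workload` helper and its inputs are hypothetical, and the sketch covers just the fields the three fixes touch.

```python
def llmis_workload(workers: dict, pod_spec: dict) -> dict:
    """Build the workload portion of an LLMInferenceService spec (sketch)."""
    topology = workers.get("topology") or {}
    tensor = topology.get("tensor", 1)
    pipeline = topology.get("pipeline", 1)
    metadata = (workers.get("template") or {}).get("metadata") or {}

    workload = {
        # Fix 1: replicas comes from workers.count rather than a constant.
        "replicas": workers.get("count", 1),
        # Fix 2: each axis is set on its own, not as their product.
        "parallelism": {"tensor": tensor, "pipeline": pipeline},
        # template is a bare PodSpec with no metadata inside it.
        "template": pod_spec,
    }

    # Fix 3: labels and annotations sit beside template, not within it.
    for key in ("labels", "annotations"):
        if metadata.get(key):
            workload[key] = metadata[key]

    # Multi-node only: worker is a bare PodSpec too, not {size, template}.
    if pipeline > 1:
        workload["worker"] = pod_spec

    return workload
```

For `workers.count: 2` with `tensor: 8` and `pipeline: 2`, this yields replicas=2, both parallelism axes, and a `worker` entry equal to the PodSpec, matching what the multi-node test is described as asserting.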

negz force-pushed the demonstration branch 3 times, most recently from 024f1b5 to aab0da2 on May 13, 2026 07:07

This was referenced May 13, 2026

ModelCache v0.1 — PVC backend, multi-node #66

Closed

WIP: ModelCache design doc + examples #76

Closed

v0.1 ModelCache + multi-node LWS unblock #78

Closed

dennis-upbound pushed a commit that referenced this pull request May 14, 2026

Add TODO for ModelDeployment integration after #75 lands

93644ea

negz added 14 commits May 18, 2026 11:57

Tiny Qwen running E2E on the new API shape

7ef479a

Signed-off-by: Nic Cope <nicc@rk0n.org>

Run shfmt on test script

f376d76

Signed-off-by: Nic Cope <nicc@rk0n.org>

negz force-pushed the demonstration branch from 4d90225 to 7801b9b on May 18, 2026 18:57

negz added 6 commits May 18, 2026 12:41

Remove dead code

2aaee08

This was left behind while refactoring to the new API design. Signed-off-by: Nic Cope <nicc@rk0n.org>

Note that --model parsing is a hack

fb6a3e9

Signed-off-by: Nic Cope <nicc@rk0n.org>

negz changed the title ~~WIP: Implement the updated API shape~~ Implement the updated API shape May 19, 2026

negz marked this pull request as ready for review May 19, 2026 00:26

Copilot AI review requested due to automatic review settings May 19, 2026 00:26

negz added 4 commits May 19, 2026 10:23

Active voice / grammar pass on the docs

c50d799

Signed-off-by: Nic Cope <nicc@rk0n.org>

Minor corrections to getting started guide

2839ff7

Signed-off-by: Nic Cope <nicc@rk0n.org>

Minor correctness and formatting improvements

b92c856

Signed-off-by: Nic Cope <nicc@rk0n.org>

negz commented May 19, 2026

Comment thread lib/naming.py Outdated

negz commented May 20, 2026

Comment thread functions/compose-inference-cluster/main.py Outdated

negz commented May 20, 2026

Comment thread functions/compose-inference-cluster/main.py Outdated

negz commented May 20, 2026

Comment thread functions/compose-model-deployment/main.py Outdated

negz commented May 20, 2026

Comment thread functions/compose-model-endpoint/main.py Outdated

negz added 6 commits May 19, 2026 20:36

Import urllib.parse as a module, not urlparse directly

871dcd1

Per Google Python style, import modules rather than individual functions. Signed-off-by: Nic Cope <nicc@rk0n.org>

dennis-upbound approved these changes May 20, 2026

dennis-upbound pushed a commit that referenced this pull request May 20, 2026

Add TODO for ModelDeployment integration after #75 lands

e4555c6

dennis-upbound mentioned this pull request May 20, 2026

Add ModelCache v1alpha1 API #80

Closed

negz added 4 commits May 20, 2026 12:33

Remove scale subresource claim from docs

b7daf42

The Scaling section claimed the ModelDeployment XRD declares a Kubernetes scale subresource. It does not — Crossplane XRDs do not support the scale subresource until v2.3, which has not shipped yet. Signed-off-by: Nic Cope <nicc@rk0n.org>

Fix ModelEndpoint url field description

4c8e7f0

The description called spec.url "informational" and said ModelService does not read it. The URL is used to configure routing to the endpoint. Signed-off-by: Nic Cope <nicc@rk0n.org>

negz merged commit cfc0fad into main May 20, 2026
3 checks passed

negz mentioned this pull request May 26, 2026

Add ModelEndpoint to decouple routing from deployment #60

Closed

negz deleted the demonstration branch June 16, 2026 16:56

negz merged 35 commits into main from demonstration
