Fix the GAIE inferencepool install in the serving stack #159
Merged
The serving stack installs the Gateway API Inference Extension inferencepool Helm chart from oci://ghcr.io/kubernetes-sigs/gateway-api-inference-extension/charts. That ghcr path denies anonymous pulls: its token endpoint returns 403, so provider-helm can't fetch the chart and the Release never installs. Because the Release is always composed, this pins the ServingStack's BackendReady and the InferenceCluster's Ready to False, and the scheduler won't place a model on a cluster that isn't Ready. GAIE publishes the chart for public consumption on registry.k8s.io, which serves it anonymously. This points the repo there. The chart name and version are unchanged. Fixes #157. Signed-off-by: Nic Cope <nicc@rk0n.org>
dennis-upbound
approved these changes
Jun 16, 2026
dennis-upbound
left a comment
Collaborator
thanks for doing this. hope this works
negz
commented
Jun 16, 2026
88c6219 to
fb7c50f
Pull request overview
This PR fixes the ServingStack’s Gateway API Inference Extension (GAIE) installation so newly provisioned InferenceClusters can converge to Ready and schedule models by installing GAIE CRDs directly (instead of via the inferencepool Helm chart) and ensuring AI Gateway releases are included in readiness aggregation.
Changes:
- Vendor GAIE CRDs from the upstream release manifest and compose them as provider-kubernetes Objects on the workload cluster.
- Update serving-stack readiness aggregation to include `ai-gateway-crds` and `ai-gateway`.
- Adjust compose-serving-stack unit tests to validate the new GAIE CRD composition shape and readiness behavior.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| functions/compose-serving-stack/function/fn.py | Loads vendored GAIE CRDs, composes them as provider-kubernetes Objects, and marks AI Gateway releases ready in readiness aggregation. |
| functions/compose-serving-stack/function/gaie_crds.yaml | Adds vendored GAIE CRDs (from upstream manifests.yaml) for installation onto workload clusters. |
| functions/compose-serving-stack/tests/test_fn.py | Updates tests to expect GAIE CRDs as individual composed Objects and to account for AI Gateway readiness propagation. |
compose-serving-stack installed the Gateway API Inference Extension inferencepool Helm chart to get the InferencePool CRD onto the workload cluster. That chart ships no CRDs: it renders a running InferencePool instance plus its endpoint picker, and requires inferencePool.modelServers.matchLabels, which the serving stack doesn't set. So the Release failed to render, pinning the ServingStack's BackendReady and the InferenceCluster's Ready to False, which blocks model scheduling. The CRDs are published in the upstream release's manifests.yaml, not the chart. This vendors them into the function and applies them as provider-kubernetes Objects on the remote cluster, marking each ready once its Object is Ready. It follows the pattern compose-inference-gateway already uses to install the Gateway API CRDs that the Traefik chart likewise doesn't ship. Fixes #158. Signed-off-by: Nic Cope <nicc@rk0n.org>
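As a rough illustration of the approach this commit describes, the sketch below wraps each vendored CRD in its own provider-kubernetes Object. It is not the code in fn.py: the function names, the `gaie-crd-` resource name prefix, the `workload-cluster` ProviderConfig name, and the example CRD name are all assumptions made for the example, and the Object API version shown may differ from the one the function actually uses.

```python
from typing import Any

# Hypothetical ProviderConfig name; the real function resolves its own.
PROVIDER_CONFIG = "workload-cluster"


def crd_to_object(crd: dict[str, Any], provider_config: str = PROVIDER_CONFIG) -> dict[str, Any]:
    """Wrap one CRD manifest in a provider-kubernetes Object (assumed API version)."""
    return {
        "apiVersion": "kubernetes.crossplane.io/v1alpha2",
        "kind": "Object",
        "spec": {
            # The CRD itself is the manifest applied on the remote cluster.
            "forProvider": {"manifest": crd},
            "providerConfigRef": {"name": provider_config},
        },
    }


def compose_crds(crds: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Return one composed Object per CRD, keyed by a stable resource name."""
    return {f"gaie-crd-{crd['metadata']['name']}": crd_to_object(crd) for crd in crds}


# Illustrative input; the vendored file would supply the real CRD documents.
crds = [
    {
        "apiVersion": "apiextensions.k8s.io/v1",
        "kind": "CustomResourceDefinition",
        "metadata": {"name": "inferencepools.inference.networking.k8s.io"},
    }
]
composed = compose_crds(crds)
```

Composing one Object per CRD, rather than one Object for the whole file, lets each be marked ready independently once its own Object reports Ready.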
compose-serving-stack composed the ai-gateway-crds and ai-gateway Helm releases but never marked them ready. mark_readiness only marks the resources in its condition_ready list, and these two weren't in it, so the ServingStack's readiness aggregation waited on them forever: the XR stayed Ready=False even with every release and object healthy on the cluster, and the InferenceCluster never reached Ready. This adds both to condition_ready, so they're marked ready once their Releases report Ready, like the rest of the serving stack. Signed-off-by: Nic Cope <nicc@rk0n.org>
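The failure mode in this commit can be modelled in a few lines. The sketch below is a simplified stand-in, not the real `mark_readiness`: its signature, the `xr_ready` helper, and the `gaie-crds` resource name are assumptions for illustration. It shows why a composed resource missing from the list keeps the aggregate at not-ready even when the resource itself is healthy.

```python
def mark_readiness(observed: dict[str, bool], condition_ready: list[str]) -> dict[str, bool]:
    """Mark a composed resource ready only if it is listed AND observed Ready."""
    return {
        name: (name in condition_ready and is_ready)
        for name, is_ready in observed.items()
    }


def xr_ready(marked: dict[str, bool]) -> bool:
    """The composite is ready only when every composed resource is marked ready."""
    return all(marked.values())


# Every resource is healthy on the cluster.
observed = {"gaie-crds": True, "ai-gateway-crds": True, "ai-gateway": True}

# Before the fix: the two AI Gateway releases are not in the list.
before = mark_readiness(observed, ["gaie-crds"])

# After the fix: both releases are listed alongside the rest.
after = mark_readiness(observed, ["gaie-crds", "ai-gateway-crds", "ai-gateway"])
```

In this model `xr_ready(before)` is false and `xr_ready(after)` is true, which matches the behaviour the commit message describes.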
fb7c50f to
05e8b4c
Fixes #157.
Fixes #158.
The serving stack's Gateway API Inference Extension (GAIE) install, added in #142, can't complete, so a freshly provisioned InferenceCluster never reaches `Ready` and no model can be scheduled onto it. Three problems, found while bringing up an EKS-backed cluster end to end.

First, the `inferencepool` chart was pulled from `oci://ghcr.io/kubernetes-sigs/gateway-api-inference-extension/charts`, which denies anonymous pulls: its token endpoint returns 403. provider-helm can't fetch the chart, so the Release never installs. GAIE publishes the chart for public consumption on `registry.k8s.io`, which serves it anonymously, so this points the repo there.

Second, with the registry corrected, the install still fails: the `inferencepool` chart ships no CRDs. It renders a running InferencePool instance plus its endpoint picker and requires `inferencePool.modelServers.matchLabels`, which the serving stack doesn't set, so the render errors out. The CRDs the serving stack actually wants live in the upstream release's `manifests.yaml`. This vendors them into the function and applies them as provider-kubernetes Objects on the remote cluster, the same way `compose-inference-gateway` installs the Gateway API CRDs that the Traefik chart likewise doesn't ship. The upstream manifests are a live-cluster export carrying a top-level `status` and `metadata.creationTimestamp`; those are stripped from the vendored copy (with a header note for re-vendoring) so the Object doesn't re-apply every reconcile and trip the composite's watch circuit breaker.

Third, the `ai-gateway-crds` and `ai-gateway` releases were composed but never marked ready, because they weren't in `mark_readiness`'s list. So even with every release and object healthy on the cluster, the ServingStack's readiness aggregation waited on them forever and the InferenceCluster never reached `Ready`. This adds them to the list.

The first two unmask each other in sequence (the registry fix reveals the chart-content problem); the third was hidden behind both. Each maps to its own commit.
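For anyone re-vendoring the CRDs, stripping the top-level `status` and `metadata.creationTimestamp` might look like the sketch below. This is a minimal illustration under assumptions, not the procedure the PR used: the function name and the sample manifest are hypothetical, and the PR describes the vendored file as already stripped rather than cleaned at runtime.

```python
import copy
from typing import Any


def strip_server_fields(manifest: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a manifest without server-populated fields."""
    cleaned = copy.deepcopy(manifest)  # leave the caller's manifest untouched
    cleaned.pop("status", None)
    cleaned.get("metadata", {}).pop("creationTimestamp", None)
    return cleaned


# Illustrative live-cluster export; values are made up for the example.
exported = {
    "apiVersion": "apiextensions.k8s.io/v1",
    "kind": "CustomResourceDefinition",
    "metadata": {"name": "example.crd", "creationTimestamp": None},
    "spec": {"group": "example"},
    "status": {"conditions": []},
}
vendored = strip_server_fields(exported)
```

The point of the cleanup is that the desired manifest then contains only fields the function actually intends to manage, so the applied state and the desired state can agree.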
Validated end to end on an EKS-backed InferenceCluster: with all three fixes the GAIE CRDs install and settle, the serving stack converges, and the InferenceCluster reaches `Ready=True`.

I have:
- `nix flake check` (or `./nix.sh flake check`) and made sure it passes.
- `git commit -s`.