This document provides instructions on how to run the end-to-end tests.
The end-to-end tests validate the functionality of the Gateway API Inference Extension as a whole. They run against a Kubernetes cluster and use the Ginkgo testing framework to verify that the extension behaves as expected.
Prerequisites:

- Go installed on your machine.
- Make installed to run the end-to-end test target.
- (Optional) When using the GPU-based vLLM deployment, a Hugging Face Hub token with access to the `meta-llama/Llama-3.1-8B-Instruct` model is required. After obtaining the token and being granted access to the model, set the `HF_TOKEN` environment variable: `export HF_TOKEN=<MY_HF_TOKEN>`
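Before continuing, a short bash sketch can confirm that these prerequisites are in place. The function `check_e2e_prereqs` is hypothetical and not part of the repository. It assumes that `HF_TOKEN` only matters when `E2E_MANIFEST_PATH` (one of the optional settings described in the steps) points at the GPU-based manifest.

```shell
# Hypothetical helper (not part of the repository); requires bash.
# Prints each missing prerequisite and returns 1 if any are missing, 0 otherwise.
check_e2e_prereqs() {
  local missing=0
  local tool
  for tool in go make; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  # Assumption: HF_TOKEN is only needed when the GPU-based vLLM manifest is selected.
  if [[ "${E2E_MANIFEST_PATH:-}" == *gpu-deployment.yaml && -z "${HF_TOKEN:-}" ]]; then
    echo "missing: HF_TOKEN (required for the GPU-based vLLM deployment)"
    missing=1
  fi
  return "$missing"
}
```

Running `check_e2e_prereqs && make test-e2e` would then start the tests only when every check passes.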
Follow these steps to run the end-to-end tests:
- Clone the Repository: Clone the `gateway-api-inference-extension` repository and change into it: `git clone https://github.com/kubernetes-sigs/gateway-api-inference-extension.git && cd gateway-api-inference-extension`
- Optional Settings:
  - Set the test namespace: By default, the e2e test creates resources in the `inf-ext-e2e` namespace. To use a different namespace, set the following environment variable: `export E2E_NS=<MY_NS>`
  - Set the model server manifest: By default, the e2e test uses the vLLM Simulator (`config/manifests/vllm/sim-deployment.yaml`) to simulate a backend model server. To use a different model server deployment, set `E2E_MANIFEST_PATH` to one of the following: `export E2E_MANIFEST_PATH=config/manifests/vllm/gpu-deployment.yaml` or `export E2E_MANIFEST_PATH=config/manifests/vllm/cpu-deployment.yaml`
  - Enable leader election tests: By default, the e2e test runs the EPP server as a single replica. To test high-availability (HA) mode with leader election (3 replicas), set the following environment variable: `export E2E_LEADER_ELECTION_ENABLED=true`
  - Pause before cleanup: To pause the test run before resources are cleaned up, set the `E2E_PAUSE_ON_EXIT` environment variable. This is useful for inspecting the state of the cluster after the tests have run.
    - To pause indefinitely, set it to `true`: `export E2E_PAUSE_ON_EXIT=true`
    - To pause for a specific duration, provide a duration string: `export E2E_PAUSE_ON_EXIT=10m`
- Run the Tests: Run the `test-e2e` target: `make test-e2e`
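The optional settings amount to a handful of environment variables with defaults. The bash sketch below restates those defaults so they are easy to check before a run. The function names are hypothetical and this is not how the Go test suite reads the variables. The duration pattern assumes Go-style duration strings, as the `10m` example suggests, and only approximates that format.

```shell
# Hypothetical sketch (requires bash): restates the defaults of the optional
# E2E_* settings. The Go test suite reads these variables itself; this is not its code.

e2e_namespace() { echo "${E2E_NS:-inf-ext-e2e}"; }

e2e_manifest() { echo "${E2E_MANIFEST_PATH:-config/manifests/vllm/sim-deployment.yaml}"; }

# EPP replica count: 3 when leader election is enabled, otherwise 1.
e2e_epp_replicas() {
  if [[ "${E2E_LEADER_ELECTION_ENABLED:-false}" == "true" ]]; then
    echo 3
  else
    echo 1
  fi
}

# Classifies E2E_PAUSE_ON_EXIT as "none" (unset or empty), "indefinite" (true),
# "duration" (Go-style duration such as 10m or 1h30m), or "invalid" (anything else;
# how the real suite treats such values is not covered here).
e2e_pause_mode() {
  local value="${E2E_PAUSE_ON_EXIT:-}"
  # Approximates Go's duration syntax; "µs" and a bare "0" are not accepted here.
  local duration_re='^([0-9]+(\.[0-9]+)?(ns|us|ms|s|m|h))+$'
  if [[ -z "$value" ]]; then
    echo none
  elif [[ "$value" == "true" ]]; then
    echo indefinite
  elif [[ "$value" =~ $duration_re ]]; then
    echo duration
  else
    echo invalid
  fi
}
```

Because `make` passes its environment to the commands it runs, the variables can also be set inline for a single run, for example `E2E_LEADER_ELECTION_ENABLED=true E2E_PAUSE_ON_EXIT=10m make test-e2e`.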
The test suite prints details for each step. Note that the `vllm-llama3-8b-instruct` model server deployment may take several minutes to report an `Available=True` status because of the time the model server needs to bootstrap.
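To follow that rollout from a second terminal, `kubectl wait` can block until the deployment reports `Available`. This is a sketch that assumes `kubectl` is configured for the test cluster and that the deployment lives in the test namespace (`inf-ext-e2e` unless `E2E_NS` is set). The helper name `wait_for_model_server` and its 15-minute default timeout are arbitrary choices, not part of the repository.

```shell
# Hypothetical helper (requires bash); assumes kubectl points at the e2e test cluster.
# Blocks until the model server deployment has condition Available=True,
# or fails once the timeout (first argument, default 15m) expires.
wait_for_model_server() {
  local ns="${E2E_NS:-inf-ext-e2e}"
  local timeout="${1:-15m}"
  kubectl wait --for=condition=Available --timeout="$timeout" \
    -n "$ns" deployment/vllm-llama3-8b-instruct
}
```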