This directory contains the Latency Predictor component with dual server architecture (training and prediction servers). Use the provided build-deploy.sh script to build and deploy container images to Google Cloud Platform.
- Docker (latest version)
- Google Cloud SDK (`gcloud`) configured and authenticated
- Required files in directory: `training_server.py`, `prediction_server.py`, `requirements.txt`, `Dockerfile-training`, `Dockerfile-prediction`, `dual-server-deployment.yaml`
Optional (for deployment and testing):
- kubectl configured for GKE cluster access
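As a rough sketch of what the script's `check` subcommand amounts to (the authoritative logic lives in `build-deploy.sh`), the required files can be verified from the component directory:

```bash
# Rough sketch of the required-files check; run from the component directory.
# The script's own `check` command is the authoritative version.
missing=0
for f in training_server.py prediction_server.py requirements.txt \
         Dockerfile-training Dockerfile-prediction dual-server-deployment.yaml; do
  if [ ! -f "$f" ]; then
    echo "missing: $f"
    missing=1
  fi
done
if [ "$missing" -eq 0 ]; then
  echo "all required files present"
fi
```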
Before running the script, update the configuration variables in build-deploy.sh:
```bash
# Edit these values in the script
PROJECT_ID="your-gcp-project-id"
REGION="your-gcp-region"
REPOSITORY="your-artifact-registry-repo"
TRAINING_IMAGE="latencypredictor-training-server"
PREDICTION_IMAGE="latencypredictor-prediction-server"
TAG="latest"
```

```bash
# Make script executable
chmod +x build-deploy.sh

# Build and push images to registry
./build-deploy.sh build
./build-deploy.sh push

# Run complete process (build, push, deploy, test)
# Note: This requires GKE cluster access
./build-deploy.sh all
```

```bash
# Check if all required files exist
./build-deploy.sh check

# Build Docker images only
./build-deploy.sh build

# Push images to Google Artifact Registry
./build-deploy.sh push

# Optional: Deploy to GKE cluster (requires cluster access)
./build-deploy.sh deploy

# Optional: Get service information and IPs
./build-deploy.sh info

# Optional: Test the deployed services
./build-deploy.sh test
```

`build`:
- Builds the training server image from `Dockerfile-training`
- Builds the prediction server image from `Dockerfile-prediction`
- Tags images for Google Artifact Registry
- Images created:
  - `latencypredictor-training-server:latest`
  - `latencypredictor-prediction-server:latest`

`push`:
- Configures Docker for Artifact Registry authentication
- Pushes both images to:
  - `us-docker.pkg.dev/PROJECT_ID/REPOSITORY/latencypredictor-training-server:latest`
  - `us-docker.pkg.dev/PROJECT_ID/REPOSITORY/latencypredictor-prediction-server:latest`

`deploy`:
- Applies Kubernetes manifests from `dual-server-deployment.yaml`
- Waits for deployments to be ready (5-minute timeout)
- Creates services:
  - `training-service-external` (LoadBalancer)
  - `prediction-service` (LoadBalancer)

`test`:
- Tests the health endpoint: `/healthz`
- Tests the prediction endpoint: `/predict` with sample data
- Sample prediction request:

```json
{
  "kv_cache_percentage": 0.3,
  "input_token_length": 100,
  "num_request_waiting": 2,
  "num_request_running": 1,
  "num_tokens_generated": 50
}
```
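To exercise the prediction endpoint by hand, you can write the sample payload above to a file and POST it with curl. `PREDICTION_IP` is a placeholder for the LoadBalancer IP reported by `./build-deploy.sh info`:

```bash
# Write the sample payload to a file, then POST it with curl.
# PREDICTION_IP is a placeholder; substitute the IP from `./build-deploy.sh info`.
cat > /tmp/sample_request.json <<'EOF'
{
  "kv_cache_percentage": 0.3,
  "input_token_length": 100,
  "num_request_waiting": 2,
  "num_request_running": 1,
  "num_tokens_generated": 50
}
EOF

# curl -X POST "http://PREDICTION_IP/predict" \
#      -H "Content-Type: application/json" \
#      -d @/tmp/sample_request.json
echo "payload written to /tmp/sample_request.json"
```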
- Configure GCP Authentication:

  ```bash
  gcloud auth login
  gcloud config set project YOUR_PROJECT_ID
  ```

- Configure kubectl for GKE (optional; only needed for deployment):

  ```bash
  gcloud container clusters get-credentials CLUSTER_NAME --zone ZONE
  ```

- Update Script Configuration:

  ```bash
  # Edit build-deploy.sh with your project details
  nano build-deploy.sh
  ```

- Build Images:

  ```bash
  ./build-deploy.sh build
  ./build-deploy.sh push
  ```

- Optional: Deploy and Test:

  ```bash
  ./build-deploy.sh deploy
  ./build-deploy.sh test

  # Or run everything at once
  ./build-deploy.sh all
  ```
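The steps above can be chained into one sequence. The sketch below only echoes each command (a dry run) so the order can be reviewed; swap the `echo` for real execution when ready:

```bash
# Dry-run sketch of the full setup flow; each command is echoed, not executed.
# Replace the body of run() with "$@" to actually run the commands.
run() { echo "+ $*"; }

run gcloud auth login
run gcloud config set project YOUR_PROJECT_ID
run gcloud container clusters get-credentials CLUSTER_NAME --zone ZONE
run ./build-deploy.sh build
run ./build-deploy.sh push
run ./build-deploy.sh deploy
run ./build-deploy.sh test
```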
```bash
# Make the script executable
chmod +x build-deploy.sh

# Configure Docker authentication for Artifact Registry
gcloud auth configure-docker us-docker.pkg.dev

# Verify GKE cluster access
kubectl cluster-info
kubectl get nodes

# Inspect deployed services and pods
./build-deploy.sh info
kubectl get services
kubectl get pods

# Training server logs
kubectl logs -l app=training-server

# Prediction server logs
kubectl logs -l app=prediction-server
```
- Make code changes to `training_server.py` or `prediction_server.py`
- Test locally (optional):

  ```bash
  python training_server.py
  python prediction_server.py
  ```

- Build and push images:

  ```bash
  ./build-deploy.sh build
  ./build-deploy.sh push
  ```

- Optional: Deploy and test:

  ```bash
  ./build-deploy.sh deploy
  ./build-deploy.sh test
  ```
After successful deployment:
- Training Service: external LoadBalancer IP (check with `./build-deploy.sh info`)
- Prediction Service: external LoadBalancer IP (check with `./build-deploy.sh info`)
- Health Check: `http://PREDICTION_IP/healthz`
- Prediction API: `http://PREDICTION_IP/predict` (POST)
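Since the LoadBalancer IP can take a minute to come up, a small readiness probe against `/healthz` is handy. This is a sketch; `PREDICTION_IP` is a placeholder you must fill in from `./build-deploy.sh info`:

```bash
# Sketch: probe the /healthz endpoint of the deployed prediction service.
# PREDICTION_IP is a placeholder for the LoadBalancer IP from `./build-deploy.sh info`.
PREDICTION_IP="${PREDICTION_IP:-PENDING}"

check_health() {
  # -f makes curl fail on HTTP errors; -sS stays quiet except for real errors
  curl -fsS "http://$PREDICTION_IP/healthz" >/dev/null 2>&1
}

if [ "$PREDICTION_IP" = "PENDING" ]; then
  echo "set PREDICTION_IP first (see ./build-deploy.sh info)"
elif check_health; then
  echo "prediction service healthy"
else
  echo "prediction service not reachable yet"
fi
```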
If you need to build manually:
```bash
# Build training server
docker build -f Dockerfile-training -t training-server .

# Build prediction server
docker build -f Dockerfile-prediction -t prediction-server .
```
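After a manual build, the images still need Artifact Registry names before they can be pushed. This is a sketch of the tag-and-push steps that `./build-deploy.sh push` automates; the placeholders mirror the script configuration, and the `docker` commands are left commented so you can review them first:

```bash
# Sketch of the manual tag-and-push steps that `./build-deploy.sh push` automates.
# PROJECT_ID and REPOSITORY are the same placeholders as in the script config.
PROJECT_ID="your-gcp-project-id"
REPOSITORY="your-artifact-registry-repo"
REGISTRY="us-docker.pkg.dev/$PROJECT_ID/$REPOSITORY"

# docker tag training-server   "$REGISTRY/latencypredictor-training-server:latest"
# docker tag prediction-server "$REGISTRY/latencypredictor-prediction-server:latest"
# docker push "$REGISTRY/latencypredictor-training-server:latest"
# docker push "$REGISTRY/latencypredictor-prediction-server:latest"
echo "images would be pushed under: $REGISTRY"
```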