This directory contains the Latency Predictor component with dual server architecture (training and prediction servers). Use the provided build-deploy.sh script to build and deploy container images to Google Cloud Platform.
- Docker (latest version)
- Google Cloud SDK (`gcloud`) configured and authenticated
- Required files in directory: `training_server.py`, `prediction_server.py`, `requirements.txt`, `Dockerfile-training`, `Dockerfile-prediction`, `dual-server-deployment.yaml`
Optional (for deployment and testing):
- kubectl configured for GKE cluster access
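As a rough sketch of what the script's `check` subcommand amounts to (the authoritative logic lives in `build-deploy.sh`), the required files can be verified from the component directory:

```bash
# Rough sketch of the required-files check; run from the component directory.
# The script's own `check` command is the authoritative version.
missing=0
for f in training_server.py prediction_server.py requirements.txt \
         Dockerfile-training Dockerfile-prediction dual-server-deployment.yaml; do
  if [ ! -f "$f" ]; then
    echo "missing: $f"
    missing=1
  fi
done
if [ "$missing" -eq 0 ]; then
  echo "all required files present"
fi
```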
Before running the script, update the configuration variables in build-deploy.sh:
```bash
# Edit these values in the script
PROJECT_ID="your-gcp-project-id"
REGION="your-gcp-region"
REPOSITORY="your-artifact-registry-repo"
TRAINING_IMAGE="latencypredictor-training-server"
PREDICTION_IMAGE="latencypredictor-prediction-server"
TAG="latest"
```

```bash
# Make script executable
chmod +x build-deploy.sh

# Build and push images to registry
./build-deploy.sh build
./build-deploy.sh push

# Run complete process (build, push, deploy, test)
# Note: This requires GKE cluster access
./build-deploy.sh all
```

```bash
# Check if all required files exist
./build-deploy.sh check

# Build Docker images only
./build-deploy.sh build

# Push images to Google Artifact Registry
./build-deploy.sh push

# Optional: Deploy to GKE cluster (requires cluster access)
./build-deploy.sh deploy

# Optional: Get service information and IPs
./build-deploy.sh info

# Optional: Test the deployed services
./build-deploy.sh test
```

`build`:
- Builds the training server image from `Dockerfile-training`
- Builds the prediction server image from `Dockerfile-prediction`
- Tags images for Google Artifact Registry
- Images created:
  - `latencypredictor-training-server:latest`
  - `latencypredictor-prediction-server:latest`

`push`:
- Configures Docker for Artifact Registry authentication
- Pushes both images to:
  - `us-docker.pkg.dev/PROJECT_ID/REPOSITORY/latencypredictor-training-server:latest`
  - `us-docker.pkg.dev/PROJECT_ID/REPOSITORY/latencypredictor-prediction-server:latest`

`deploy`:
- Applies Kubernetes manifests from `dual-server-deployment.yaml`
- Waits for deployments to be ready (5-minute timeout)
- Creates services:
  - `training-service-external` (LoadBalancer)
  - `prediction-service` (LoadBalancer)

`test`:
- Tests the health endpoint: `/healthz`
- Tests the prediction endpoint: `/predict` with sample data
- Sample prediction request:

```json
{
  "kv_cache_percentage": 0.3,
  "input_token_length": 100,
  "num_request_waiting": 2,
  "num_request_running": 1,
  "num_tokens_generated": 50
}
```
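To exercise the prediction endpoint by hand, you can write the sample payload above to a file and POST it with curl. `PREDICTION_IP` is a placeholder for the LoadBalancer IP reported by `./build-deploy.sh info`:

```bash
# Write the sample payload to a file, then POST it with curl.
# PREDICTION_IP is a placeholder; substitute the IP from `./build-deploy.sh info`.
cat > /tmp/sample_request.json <<'EOF'
{
  "kv_cache_percentage": 0.3,
  "input_token_length": 100,
  "num_request_waiting": 2,
  "num_request_running": 1,
  "num_tokens_generated": 50
}
EOF

# curl -X POST "http://PREDICTION_IP/predict" \
#      -H "Content-Type: application/json" \
#      -d @/tmp/sample_request.json
echo "payload written to /tmp/sample_request.json"
```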
- Configure GCP Authentication:

  ```bash
  gcloud auth login
  gcloud config set project YOUR_PROJECT_ID
  ```

- Configure kubectl for GKE (optional; only needed for deployment):

  ```bash
  gcloud container clusters get-credentials CLUSTER_NAME --zone ZONE
  ```

- Update Script Configuration:

  ```bash
  # Edit build-deploy.sh with your project details
  nano build-deploy.sh
  ```

- Build Images:

  ```bash
  ./build-deploy.sh build
  ./build-deploy.sh push
  ```

- Optional: Deploy and Test:

  ```bash
  ./build-deploy.sh deploy
  ./build-deploy.sh test

  # Or run everything at once
  ./build-deploy.sh all
  ```
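The steps above can be chained into one sequence. The sketch below only echoes each command (a dry run) so the order can be reviewed; swap the `echo` for real execution when ready:

```bash
# Dry-run sketch of the full setup flow; each command is echoed, not executed.
# Replace the body of run() with "$@" to actually run the commands.
run() { echo "+ $*"; }

run gcloud auth login
run gcloud config set project YOUR_PROJECT_ID
run gcloud container clusters get-credentials CLUSTER_NAME --zone ZONE
run ./build-deploy.sh build
run ./build-deploy.sh push
run ./build-deploy.sh deploy
run ./build-deploy.sh test
```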
```bash
# Make the script executable
chmod +x build-deploy.sh

# Configure Docker authentication for Artifact Registry
gcloud auth configure-docker us-docker.pkg.dev

# Verify GKE cluster access
kubectl cluster-info
kubectl get nodes

# Inspect deployed services and pods
./build-deploy.sh info
kubectl get services
kubectl get pods

# Training server logs
kubectl logs -l app=training-server

# Prediction server logs
kubectl logs -l app=prediction-server
```
- Make code changes to `training_server.py` or `prediction_server.py`
- Test locally (optional):

  ```bash
  python training_server.py
  python prediction_server.py
  ```

- Build and push images:

  ```bash
  ./build-deploy.sh build
  ./build-deploy.sh push
  ```

- Optional: Deploy and test:

  ```bash
  ./build-deploy.sh deploy
  ./build-deploy.sh test
  ```
After successful deployment:
- Training Service: external LoadBalancer IP (check with `./build-deploy.sh info`)
- Prediction Service: external LoadBalancer IP (check with `./build-deploy.sh info`)
- Health Check: `http://PREDICTION_IP/healthz`
- Prediction API: `http://PREDICTION_IP/predict` (POST)
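Since the LoadBalancer IP can take a minute to come up, a small readiness probe against `/healthz` is handy. This is a sketch; `PREDICTION_IP` is a placeholder you must fill in from `./build-deploy.sh info`:

```bash
# Sketch: probe the /healthz endpoint of the deployed prediction service.
# PREDICTION_IP is a placeholder for the LoadBalancer IP from `./build-deploy.sh info`.
PREDICTION_IP="${PREDICTION_IP:-PENDING}"

check_health() {
  # -f makes curl fail on HTTP errors; -sS stays quiet except for real errors
  curl -fsS "http://$PREDICTION_IP/healthz" >/dev/null 2>&1
}

if [ "$PREDICTION_IP" = "PENDING" ]; then
  echo "set PREDICTION_IP first (see ./build-deploy.sh info)"
elif check_health; then
  echo "prediction service healthy"
else
  echo "prediction service not reachable yet"
fi
```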
If you need to build manually:
```bash
# Build training server
docker build -f Dockerfile-training -t training-server .

# Build prediction server
docker build -f Dockerfile-prediction -t prediction-server .
```
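After a manual build, the images still need Artifact Registry names before they can be pushed. This is a sketch of the tag-and-push steps that `./build-deploy.sh push` automates; the placeholders mirror the script configuration, and the `docker` commands are left commented so you can review them first:

```bash
# Sketch of the manual tag-and-push steps that `./build-deploy.sh push` automates.
# PROJECT_ID and REPOSITORY are the same placeholders as in the script config.
PROJECT_ID="your-gcp-project-id"
REPOSITORY="your-artifact-registry-repo"
REGISTRY="us-docker.pkg.dev/$PROJECT_ID/$REPOSITORY"

# docker tag training-server   "$REGISTRY/latencypredictor-training-server:latest"
# docker tag prediction-server "$REGISTRY/latencypredictor-prediction-server:latest"
# docker push "$REGISTRY/latencypredictor-training-server:latest"
# docker push "$REGISTRY/latencypredictor-prediction-server:latest"
echo "images would be pushed under: $REGISTRY"
```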