Automated Site Reliability Engineering (SRE) Agent
Use your application code to identify deployment-ready Service Level Objectives (SLOs), delivered as Terraform, in minutes using Google Gemini.
The AI-Powered SLO Consultant is an intelligent workflow tool designed to bridge the gap between application code and reliability engineering. Instead of manually digging through code to understand dependencies, drawing architecture diagrams, and writing complex Terraform configurations, this agent does it for you.
It acts as a "Human-in-the-Loop" wizard, guiding you through 5 stages of reliability design:
- Repo Analysis: Understands your file structure.
- Journey Discovery: Identifies Critical User Journeys (CUJs) from code.
- Visualization: Draws MermaidJS sequence diagrams for every journey.
- SLO Design: Drafts detailed SLO specs (Availability, Latency, Error Budgets).
- Infrastructure as Code: Generates the actual Terraform to deploy these monitors to Google Cloud.
- 🤖 Goal-Driven Auto-Pilot: Don't want to click buttons? Just tell the chat "Generate Terraform for this repo," and the agent will intelligently execute the entire pipeline step-by-step, pausing for your confirmation at critical checkpoints.
- 🗣️ Guided Discovery: You can provide context before the analysis starts. Tell the agent, "Focus on the login and checkout flows," and it will prioritize those journeys during the discovery phase.
- ✨ Smart Refinement: A built-in chat assistant allows you to refine artifacts using natural language. Ask it to "Add a Redis cache to the checkout diagram" or "Make the latency target stricter," and it will draft the edits for you.
- ✅ Interactive Proposals: All AI-suggested changes (whether to diagrams, SLOs, or the Journey List) are presented as Proposals. You can review the "Before vs. After" difference in a clean UI before accepting or discarding the change.
- 🔍 Code-to-Journey Analysis: Automatically parses repositories (Java, Go, Python, etc.) to find API endpoints and critical paths, with the option to target specific branches or sub-folders within a repository for a more focused analysis.
- 📊 Auto-Diagramming: Generates MermaidJS Sequence Diagrams by tracing function calls and service dependencies in your code.
- 📝 Automated SLO Specs: Writes professional Markdown design docs defining SLIs, SLO targets, and error budget policies.
- 🏗️ Terraform Generation: Converts abstract SLO designs into valid, ready-to-apply Terraform code for Google Cloud Monitoring.
- ☁️ Cloud Sync: Automatically uploads all artifacts (Diagrams, Docs, Terraform files) to a Google Cloud Storage (GCS) bucket for persistence and auditing.
- ✨ Interactive UI: A clean Streamlit interface that allows you to review, edit, and refine the AI's output at every step.
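The proposal workflow mentioned above reduces to a simple before/after record plus an accept-or-discard decision. A minimal sketch of that idea (the class and function names are assumptions, not the app's real data model):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Proposal:
    """An AI-suggested change awaiting human review (illustrative model)."""
    artifact: str   # e.g. "diagram:checkout" or "slo:login" (hypothetical keys)
    before: str     # current artifact content
    after: str      # AI-drafted replacement

def resolve(proposal: Proposal, accepted: bool) -> str:
    """Return the content the artifact should hold after the review decision."""
    return proposal.after if accepted else proposal.before
```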
- Python 3.10+
- Google Cloud Project with Vertex AI API enabled.
- Google Cloud Storage Bucket (to store generated artifacts).
```shell
git clone <<repo url>>
cd slo-assistant
python -m venv venv
source venv/bin/activate
pip install .
```

Create a `.env` file in the root directory:

```shell
touch .env
```

Add the following configuration variables:
```
# GCP Project ID
GCP_PROJECT_ID=your-gcp-project-id

# Storage Configuration
GCP_BUCKET_NAME=your-storage-bucket-name
```
This project uses pytest for testing, black for code formatting, and pylint for linting.
To run the test suite and generate a coverage report:
```shell
make test
```

To ensure code quality and consistency, run `make format` (which uses black with a line length of 100) and `make lint` (which uses pylint):
```shell
# Format code
make format

# Run linter
make lint
```

The application uses a decoupled architecture with a FastAPI backend and a Streamlit frontend.
To start both servers simultaneously and handle port availability automatically, use the provided bash script:
```shell
./run.sh
```

The script will:
- Start the FastAPI backend server (default port 8080).
- Start the Streamlit UI (default port 8501).
- Automatically find the next available port if the default ports are in use.
The UI will open in your browser at the URL provided in the terminal output (e.g., http://localhost:8501).
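The port-fallback behaviour described above can be reproduced in a few lines. The sketch below probes ports by attempting a socket bind, which may differ from whatever mechanism `run.sh` actually uses:

```python
import socket

def find_free_port(start: int, limit: int = 100) -> int:
    """Return the first port >= start that accepts a bind on localhost."""
    for port in range(start, start + limit):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("127.0.0.1", port))
                return port  # bind succeeded, so the port is free
            except OSError:
                continue  # port in use; try the next one
    raise RuntimeError(f"no free port in [{start}, {start + limit})")
```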
- Configuration: Enter your Git Repository URL (e.g., https://github.com/GoogleCloudPlatform/microservices-demo).
- Start or Resume:
- New Session: Enter your Git URL and begin with Step 1. The app generates a unique Session ID automatically.
- Resume: Paste an existing Session ID into the sidebar to restore your progress. The app automatically re-clones the repo and loads your previous state (CUJs, Diagrams, SLOs) from the cloud.
- Step 1 - Analysis: Click Clone & Analyze to load the file structure.
- Step 2 - CUJs: Click Identify CUJs. Review the identified journeys. You can Edit the names or files if the AI missed something.
- Step 3 - Diagrams: Generate architecture diagrams. View them in the tabs to ensure the flow is correct.
- Step 4 - SLO Design: The agent will draft an SLO document. Use the Edit mode to tweak targets (e.g., change 99.9% to 99.99%).
- Step 5 - Terraform: Generate the final infrastructure code. Download the files or view them in your GCS bucket.
- Start: Enter your Git URL.
- Command: Type a goal into the chat sidebar, such as:
- "Identify the Critical User Journeys"
- "Generate the full Terraform configuration for this repo."
- Interact: The agent will perform the work and present Proposals.
- It will ask: "I've identified 5 journeys. Shall I proceed to diagrams?"
- You can reply: "Yes" or "Wait, rename 'Login' to 'Auth' first."
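At each checkpoint like the one above, the agent has to decide whether a free-text reply means "go ahead" or "change something first". The real agent delegates that judgment to Gemini; purely as an illustration, a crude keyword heuristic for the same decision might look like:

```python
def interpret_reply(reply: str) -> str:
    """Classify a checkpoint reply (illustrative heuristic, not the agent's logic).

    Returns "proceed" for a plain approval; anything else is treated as a
    revision request, so the agent pauses and applies the change first.
    """
    approvals = {"yes", "y", "ok", "okay", "proceed", "go ahead", "sounds good"}
    return "proceed" if reply.strip().lower().rstrip(".!") in approvals else "revise"
```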
- View vs. Edit: Toggle between a read-only preview and a raw text editor for every artifact.
- Proposal UI: A dedicated card view to Accept or Discard AI-generated changes safely.
- Cloud Console Links: Direct links to the generated artifacts in your Google Cloud Storage bucket are provided at every step for easy auditing.
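A Cloud Console link to a session's artifacts can be built directly from the bucket name and an object prefix. The sketch below assumes a per-session prefix layout, which is an assumption about how artifacts are organized in the bucket:

```python
from urllib.parse import quote

def console_link(bucket: str, prefix: str) -> str:
    """Build a Google Cloud Console browser URL for a GCS prefix."""
    return (
        "https://console.cloud.google.com/storage/browser/"
        f"{quote(bucket)}/{quote(prefix)}"
    )
```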