Commit fc2d25f

refactor: Update bootstrap_agent to use new expert architecture
This commit refactors the `bootstrap_agent` Ansible role to align with the new, unified `prima-expert.nomad` architecture. The `deploy_llama_cpp_model.yaml` task has been updated to:

- Render the `prima-expert.nomad` template instead of the old job file.
- Pass the correct variables for the default main expert.
- Wait for the correct service name (`prima-api-main`) in the health check.
- Remove obsolete log tailing logic.

This change ensures that the `bootstrap.sh` script correctly deploys the new, distributed expert service as the default LLM backend.
1 parent de5fd0c commit fc2d25f

File tree

6 files changed

+276
-187
lines changed


README.md

Lines changed: 38 additions & 120 deletions
@@ -57,55 +57,43 @@ If you are setting up a multi-node cluster, you will need to work with the Ansib
 ansible-playbook -i inventory.yaml playbook.yaml --ask-become-pass
 ```
 - **`--ask-become-pass`**: This flag is important. It will prompt you for your `sudo` password, which Ansible needs to perform administrative tasks.
 - **What this does:** This playbook not only configures the cluster services (Consul, Nomad, etc.) but also automatically bootstraps the primary control node into a fully autonomous AI agent by deploying the necessary AI services.
 
 ## 4. AI Service Deployment
-The system is designed to be self-bootstrapping. Once the main Ansible playbook has been run on the primary control node, the AI agent is deployed automatically.
-
-### 4.1. Automated Agent Deployment
-The `bootstrap_agent` role in the Ansible playbook handles the automatic deployment of the core AI services on the primary control node. This includes:
-- **`llama.cpp` RPC Service:** The primary LLM backend for the agent.
-- **`pipecat` Voice Agent:** The main application that orchestrates the agent's logic, memory, and tool use.
-
-You can monitor the status of these services by running `nomad job status` on the control node.
+The system is designed to be self-bootstrapping. The `bootstrap.sh` script (or the main `playbook.yaml`) handles the deployment of the core AI services on the primary control node. This includes a default instance of the `prima-expert` job and the `pipecat` voice agent.
 
+### 4.1. Starting and Stopping Services
 Use the provided script to submit the core AI jobs to Nomad:
 ```bash
 ./start_services.sh
 ```
 
-### 4.2. Advanced: Deploying Additional AI Experts
-For advanced use cases, such as the Mixture-of-Experts (MoE) routing described in the Agent Architecture section, you may want to deploy additional, specialized LLM backends. You can do this manually using the Nomad CLI.
-
-- **Example: Deploying a `prima.cpp` cluster for coding tasks:**
-
-##### First, create a new namespace for the expert
-```bash
-nomad namespace apply coding
-```
-##### Deploy the job to the new namespace, passing variables with -var
-```bash
-nomad job run -namespace=coding \
-  -var "job_name=prima-coding-expert" \
-  -var "service_name=llama-api-coding" \
-  -var "model_path=/path/to/coding.gguf" \
-  /home/user/primacpp.nomad
-```
-The `TwinService` will automatically discover these new experts via Consul and make them available for routing.
-
-### 4.3. Resetting and Restarting
 If you make a change to a job file or need to restart the services from a clean state, it's best to purge the old jobs before running the start script again.
-
-#### Stop and purge the old jobs
 ```bash
-nomad job stop -purge llamacpp-rpc
-nomad job stop -purge pipecatapp
+nomad job stop -purge prima-expert-main
+nomad job stop -purge pipecat-app
 ```
 
-#### Start the services with the new configuration
-```bash
-./start_services.sh
-```
+### 4.2. Advanced: Deploying Additional AI Experts
+The true power of this architecture is the ability to deploy multiple, specialized AI experts that the main `pipecat` agent can route queries to. With the new unified `prima-expert.nomad` job template, deploying a new expert is handled through a dedicated Ansible playbook.
+
+1. **Define a Model List for Your Expert:**
+   First, open `group_vars/models.yaml` and create a new list of models for your expert. For example, to create a `creative-writing` expert, you could add:
+   ```yaml
+   creative_writing_models:
+     - name: "phi-3-mini-instruct"
+       # ... other model details
+   ```
+
+2. **Deploy the Expert with Ansible:**
+   Use the `deploy_expert.yaml` playbook to render the Nomad job with your custom parameters and launch it. You pass variables on the command line using the `-e` flag.
+
+   - **Example: Deploying a `creative-writing` expert to the `creative` namespace:**
+     ```bash
+     ansible-playbook deploy_expert.yaml -e "job_name=creative-expert service_name=prima-api-creative namespace=creative model_list={{ creative_writing_models }} worker_count=2"
+     ```
+
+The `TwinService` in the `pipecatapp` will automatically discover any service registered in Consul with the `prima-api-` prefix and make it available for routing.
 
 ## 5. Agent Architecture: The `TwinService`
 The core of this application is the `TwinService`, a custom service that acts as the agent's "brain." It orchestrates the agent's responses, memory, and tool use.
@@ -119,103 +107,35 @@ The agent can use tools to perform actions and gather information. The `TwinServ
 
 #### Available Tools:
 - **Vision (`vision.get_observation`)**
-  - **Description:** Gets a real-time description of what is visible in the webcam.
-  - **Requires:** A USB webcam connected to the node running the `pipecat-app` job.
-  - **Example:** "What do you see?"
-- **Master Control Program (`mcp.get_status`, `mcp.get_memory_summary`, etc.)**
-  - **Description:** A tool for introspection and self-control.
-  - **Examples:** "MCP, what is your status?", "Summarize your memory."
+- **Master Control Program (`mcp.get_status`, etc.)**
 - **SSH (`ssh.run_command`)**
-  - **Description:** Executes a command on a remote machine via SSH.
-  - **Security Warning:** This is a very powerful tool. The credentials are not stored in the code but must be provided by the LLM. For this to work, the user running the agent must have an SSH key configured that allows passwordless access to the target machine. Exercise caution when enabling the LLM to use this tool.
-  - **Example:** "Use ssh to run 'ls -l' on host 192.168.1.102 with user 'admin'."
 - **Code Runner (`code_runner.run_python_code`)**
-  - **Description:** Executes a block of Python code in a secure, sandboxed Docker container and returns the output.
-  - **Note:** The Docker engine is installed automatically by the Ansible playbook.
-  - **Example:** "Use the code runner to calculate the 100th Fibonacci number."
 - **Ansible (`ansible.run_playbook`)**
-  - **Description:** Runs an Ansible playbook to provision or manage nodes in the cluster. This is the primary tool for cluster self-management and expansion.
-  - **Security Warning:** This is an extremely powerful tool that can make significant changes to the cluster. It is marked as a sensitive tool and requires user approval in the web UI if `APPROVAL_MODE` is enabled.
-  - **Example:** "Use ansible to run the main playbook and limit it to the host 'AID-E-26'."
-- **Web Browser (`web_browser.goto`, `web_browser.get_page_content`, etc.)**
-  - **Description:** A tool for browsing the web.
-  - **Examples:** "Use the web browser to go to google.com and tell me the content of the page."
+- **Web Browser (`web_browser.goto`, etc.)**
 
 ### 5.3. Mixture of Experts (MoE) Routing
-The agent is designed to function as a "Mixture of Experts." The primary LLM acts as a router, classifying the user's query and routing it to a specialized backend if appropriate.
+The agent is designed to function as a "Mixture of Experts." The primary `pipecat` agent acts as a router, classifying the user's query and routing it to a specialized backend expert if appropriate.
 
-- **How it Works:** The `TwinService` prompt instructs the router LLM to first decide if a query is general, technical, or creative. If it's technical, for example, the router's job is to call the `route_to_coding_expert` tool. The `TwinService` then sends the query to a separate LLM cluster that is running a coding-specific model.
-- **Configuration:** To use this feature, you must deploy multiple LLM backends into their own isolated namespaces using the `-var` flag to customize them.
-- **Example:**
-  ```bash
-  # Create the namespaces
-  nomad namespace apply general
-  nomad namespace apply coding
-
-  # Deploy a general-purpose model
-  nomad job run -namespace=general \
-    -var "job_name=prima-general-expert" \
-    -var "service_name=llama-api-general" \
-    -var "model_path=/path/to/general.gguf" \
-    /home/user/primacpp.nomad
-
-  # Deploy a coding model
-  nomad job run -namespace=coding \
-    -var "job_name=prima-coding-expert" \
-    -var "service_name=llama-api-coding" \
-    -var "model_path=/path/to/coding.gguf" \
-    /home/user/primacpp.nomad
-  ```
-- The `TwinService` discovers these experts across all namespaces using Consul.
+- **How it Works:** The `TwinService` prompt instructs the main agent to first classify the user's query. If it determines the query is best handled by a specialist (e.g., a 'coding' expert), it uses the `route_to_expert` tool. This tool call is intercepted by the `TwinService`, which then forwards the query to the appropriate expert's API endpoint.
+- **Configuration:** Deploying these specialized experts is done using the `deploy_expert.yaml` Ansible playbook. For detailed instructions, see the **[Deploying Additional AI Experts](#42-advanced-deploying-additional-ai-experts)** section above.
 
 ### 5.4. Configuring Agent Personas
 The personality and instructions for the main router agent and each expert agent are defined in simple text files located in the `ansible/roles/pipecatapp/files/prompts/` directory. You can edit these files to customize the behavior of each agent. For example, you can edit `coding_expert.txt` to give it a different programming specialty.
 
 ## 6. Mission Control Web UI
-This project includes a web-based dashboard for real-time display and debugging.
-
-### 6.1. Accessing the UI
-Once the `pipecat-app` job is running, you can access the UI by navigating to the IP address of any node in your cluster on port 8000. For example: `http://192.168.1.101:8000`.
-
-### 6.2. Features
-- **Live Terminal:** The main feature is a retro-style web terminal that provides a live stream of the agent's logs.
-- **LLM-Driven Visualizations:** The agent can use the terminal to display information in creative ways. For example, status updates may be rendered as large, colorful banners using `figlet` and `lolcat` style effects.
-- **Status API:** An API endpoint at `/api/status` provides the real-time status of the agent's pipelines.
-
-### 6.3. Advanced Features
-The Mission Control UI and the agent have several advanced features for power users.
-
-#### Debug Mode
-To get more detailed insight into the agent's operations, you can enable Debug Mode.
-- **How to Enable:** In the `pipecatapp.nomad` job file, set the `DEBUG_MODE` environment variable to `"true"`.
-- **Functionality:** When enabled, the agent will produce verbose logs for every tool call, including the result returned by the tool. This is useful for debugging tool behavior.
-
-#### Interactive Action Approval
-For enhanced safety, you can run the agent in a mode that requires manual approval for sensitive actions.
-- **How to Enable:** In the `pipecatapp.nomad` job file, set the `APPROVAL_MODE` environment variable to `"true"`.
-- **Functionality:** When enabled, any attempt to use a sensitive tool (like `ssh` or `code_runner`) will pause execution. A prompt will appear in the web UI with details of the action. You must click "Approve" for the action to proceed. If you click "Deny", the action is cancelled, and the agent will respond that it was not permitted to perform the action.
-
-#### State Management
-You can save and load the agent's complete memory state (both short-term and long-term) directly from the web UI.
-- **How to Use:**
-  1. In the header of the Mission Control UI, enter a name for your session in the "Enter save name..." input box.
-  2. Click **"Save State"** to create a snapshot of the agent's current memory. The state will be saved on the server in the `saved_states/` directory.
-  3. To restore a previous session, enter the name of the saved state and click **"Load State"**. This will replace the agent's current memory with the saved version.
+This project includes a web-based dashboard for real-time display and debugging. To access it, navigate to the IP address of any node in your cluster on port 8000 (e.g., `http://192.168.1.101:8000`).
 
 ## 7. Testing and Verification
 - **Check Cluster Status:** `nomad node status`
 - **Check Job Status:** `nomad job status`
-- **View Logs:** `nomad job logs <job_name>` (e.g., `pipecatapp`, `prima-cluster`) or use the Mission Control Web UI.
+- **View Logs:** `nomad alloc logs <allocation_id>` or use the Mission Control Web UI.
 - **Manual Test Scripts:** A set of scripts for manual testing of individual components is available in the `testing/` directory.
 
 ## 8. Performance Tuning & Service Selection
-- **Model Selection:** The `prima.cpp` and `llamacpp-rpc` Nomad jobs are configured to use a placeholder model path. You will need to edit the job files to point to the GGUF model you want to use. Smaller models (3B, 7B) are recommended for better performance.
+- **Model Selection:** The `prima-expert.nomad` job is configured via Ansible variables in `group_vars/models.yaml`. You can define different model lists for different experts.
 - **Network:** Wired gigabit ethernet is strongly recommended over Wi-Fi for reduced latency.
 - **VAD Tuning:** The `RealtimeSTT` sensitivity can be tuned in `app.py` for better performance in noisy environments.
 - **STT/TTS Service Selection:** You can choose which Speech-to-Text and Text-to-Speech services to use by setting environment variables in the `pipecatapp.nomad` job file.
-  - `STT_SERVICE`: Set to `faster-whisper` for high-performance local transcription, or `deepgram` (default) to use the Deepgram API.
-  - `TTS_SERVICE`: Set to `kittentts` for a fast, local TTS, or `elevenlabs` (default) to use the ElevenLabs API.
-  - `EMBEDDING_MODEL_NAME`: Selects the sentence transformer model for the agent's long-term memory. Defaults to `all-MiniLM-L6-v2`. A good alternative is `google/embeddinggemma-300m`.
 
 ## 9. Benchmarking
 This project includes two types of benchmarks.
@@ -224,13 +144,11 @@ This project includes two types of benchmarks.
 Measures the end-to-end latency of a live conversation. Enable it by setting `BENCHMARK_MODE = "true"` in the `env` section of the `pipecatapp.nomad` job file. Results are printed to the job logs.
 
 ### 9.2. Standardized Performance Benchmark
-Uses `llama-bench` to measure the raw inference speed (tokens/sec) of the deployed LLM backend.
-1. Ensure an LLM backend is running.
-2. Run the benchmark job, passing the path to your desired GGUF model using the `-var` flag:
-   ```bash
-   nomad job run -var "model_path=/path/to/your/model.gguf" /home/user/benchmark.nomad
-   ```
-3. View results in the job logs: `nomad job logs llama-benchmark`
+Uses `llama-bench` to measure the raw inference speed (tokens/sec) of the deployed LLM backend. Run the `benchmark.nomad` job to test the performance of the default model.
+```bash
+nomad job run /opt/nomad/jobs/benchmark.nomad
+```
+View results in the job logs: `nomad job logs llama-benchmark`
 
 ## 10. Advanced Development: Prompt Evolution
 For advanced users, this project includes a workflow for automatically improving the agent's core prompt using evolutionary algorithms. See `prompt_engineering/PROMPT_ENGINEERING.md` for details.
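
As an aside, the README above states that the `TwinService` discovers any Consul service whose name starts with `prima-api-`. Purely as an illustrative sketch (not code from this commit; the service names are hypothetical), that filtering over the standard JSON body of Consul's `/v1/catalog/services` endpoint could look like:

```python
import json

# Hypothetical helper: given the parsed body of Consul's /v1/catalog/services
# endpoint ({"service-name": ["tag", ...], ...}), return the expert services a
# router could forward queries to, keyed by expert name (the suffix after the
# prefix).
def discover_experts(catalog_services: dict, prefix: str = "prima-api-") -> dict:
    return {
        name[len(prefix):]: name
        for name in catalog_services
        if name.startswith(prefix)
    }

# Example catalog payload (shape only; entries are illustrative).
catalog = json.loads(
    '{"consul": [], "prima-api-main": [], "prima-api-creative": [], "nomad": []}'
)
print(discover_experts(catalog))
# {'main': 'prima-api-main', 'creative': 'prima-api-creative'}
```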

ansible/jobs/prima-expert.nomad

Lines changed: 163 additions & 0 deletions
@@ -0,0 +1,163 @@
# This is a Jinja2 template for a complete, distributed Prima expert.
# It is parameterized and can be used to deploy any expert model.
job "{{ job_name | default('prima-expert') }}" {
  datacenters = ["dc1"]
  namespace   = "{{ namespace | default('default') }}"

  group "master" {
    count = 1

    volume "models" {
      type      = "host"
      source    = "models"
      read_only = true
    }

    network {
      mode = "bridge"
      port "http" {}
    }

    service {
      name     = "{{ service_name | default('prima-api') }}"
      provider = "consul"
      port     = "http"

      check {
        type     = "http"
        path     = "/health"
        interval = "15s"
        timeout  = "5s"
      }
    }

    task "llama-server-master" {
      driver = "exec"

      template {
        data = <<EOH
#!/bin/bash
set -e
echo "Starting master server for expert: {{ job_name | default('prima-expert') }}"

# Discover worker services via Consul. Emit address:port pairs, since the RPC
# backend needs a port as well as an address.
echo "Discovering worker services from Consul..."
WORKER_IPS=$(curl -s "http://127.0.0.1:8500/v1/health/service/{{ job_name | default('prima-expert') }}-worker?passing" | jq -r '.[].Service | "\(.Address):\(.Port)"' | tr '\n' ',' | sed 's/,$//')
echo "Discovered workers: $WORKER_IPS"

RPC_ARGS=""
if [ -n "$WORKER_IPS" ]; then
  echo "Workers found. Configuring RPC."
  RPC_ARGS="--rpc-servers $WORKER_IPS"
else
  echo "No workers found. Starting in standalone mode."
fi

HEALTH_CHECK_URL="http://127.0.0.1:{{ '{{' }} env "NOMAD_PORT_http" {{ '}}' }}/health"

# Loop through the provided models for failover
{% for model in model_list %}
echo "Attempting to start llama-server with model: {{ model.name }}"

/usr/local/bin/llama-server \
  --model "/opt/nomad/models/llm/{{ model.filename }}" \
  --host 0.0.0.0 \
  --port {{ '{{' }} env "NOMAD_PORT_http" {{ '}}' }} \
  $RPC_ARGS &

SERVER_PID=$!
echo "Server process started with PID $SERVER_PID. Waiting for it to become healthy..."

HEALTHY=false
for i in {1..12}; do
  sleep 10
  if curl -s --fail "$HEALTH_CHECK_URL" > /dev/null; then
    echo "Server is healthy with model {{ model.name }}!"
    HEALTHY=true
    break
  else
    echo "Health check failed (attempt $i/12)..."
  fi
done

if [ "$HEALTHY" = true ]; then
  echo "Successfully started llama-server with model: {{ model.name }}"
  # Write the active model to Consul KV for other services to discover
  curl -X PUT --data "{{ model.name }}" "http://127.0.0.1:8500/v1/kv/active_model/{{ job_name | default('prima-expert') }}"
  wait $SERVER_PID
  exit 0
else
  echo "Server failed to become healthy with model: {{ model.name }}. Killing process PID $SERVER_PID..."
  kill $SERVER_PID
  wait $SERVER_PID 2>/dev/null
fi
{% endfor %}

echo "All models failed to start. Exiting."
exit 1
EOH
        destination = "local/run_master.sh"
        perms       = "0755"
      }

      config {
        command = "local/run_master.sh"
      }

      resources {
        cpu    = 1000
        memory = 8192 # Hardcoded for simplicity and stability
      }

      volume_mount {
        volume      = "models"
        destination = "/opt/nomad/models"
        read_only   = true
      }
    }
  }

  group "workers" {
    count = {{ worker_count | default(1) }}

    network {
      mode = "bridge"
      port "rpc" {}
    }

    service {
      name     = "{{ job_name | default('prima-expert') }}-worker"
      provider = "consul"
      port     = "rpc"

      check {
        type     = "tcp"
        interval = "15s"
        timeout  = "5s"
      }
    }

    task "rpc-server-worker" {
      driver = "exec"

      template {
        data = <<EOH
#!/bin/bash
set -e
/usr/local/bin/rpc-server --host 0.0.0.0 --port {{ '{{' }} env "NOMAD_PORT_rpc" {{ '}}' }}
EOH
        destination = "local/run_rpc.sh"
        perms       = "0755"
      }

      config {
        command = "local/run_rpc.sh"
      }

      resources {
        cpu    = 500
        memory = 1024
      }
    }
  }
}
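
The worker-discovery pipeline in `run_master.sh` (Consul health query piped through `jq`) is equivalent to the following Python sketch. This is illustrative only, not part of the commit; it joins `address:port` pairs because the llama.cpp RPC backend needs a port as well as an address, and the entry shapes mirror Consul's standard `/v1/health/service/<name>?passing` response:

```python
# Hypothetical stand-in for the curl | jq pipeline in run_master.sh: given the
# parsed JSON from Consul's health endpoint, build the comma-separated list of
# worker endpoints handed to llama-server via RPC_ARGS.
def worker_addresses(health_entries: list) -> str:
    return ",".join(
        f"{e['Service']['Address']}:{e['Service']['Port']}" for e in health_entries
    )

entries = [
    {"Service": {"Address": "10.0.0.11", "Port": 20001}},
    {"Service": {"Address": "10.0.0.12", "Port": 20002}},
]
print(worker_addresses(entries))  # 10.0.0.11:20001,10.0.0.12:20002
print(worker_addresses([]))       # empty string -> master runs standalone
```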

0 commit comments
