refactor: Update bootstrap_agent to use new expert architecture
This commit refactors the `bootstrap_agent` Ansible role to align with the new, unified `prima-expert.nomad` architecture.
The `deploy_llama_cpp_model.yaml` task has been updated to:
- Render the `prima-expert.nomad` template instead of the old job file.
- Pass the correct variables for the default main expert.
- Wait for the correct service name (`prima-api-main`) in the health check.
- Remove obsolete log tailing logic.
This change ensures that the `bootstrap.sh` script correctly deploys the new, distributed expert service as the default LLM backend.
- **`--ask-become-pass`**: This flag prompts you for your `sudo` password, which Ansible needs to perform administrative tasks.
- **What this does:** This playbook not only configures the cluster services (Consul, Nomad, etc.) but also automatically bootstraps the primary control node into a fully autonomous AI agent by deploying the necessary AI services.
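
For reference, a minimal sketch of a typical invocation. The inventory filename `inventory.ini` is an assumption (adjust to your setup); `playbook.yaml` is the main playbook referenced below:

```bash
# Hypothetical inventory name -- adjust to your environment
ansible-playbook -i inventory.ini playbook.yaml --ask-become-pass
```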
## 4. AI Service Deployment
The system is designed to be self-bootstrapping. The `bootstrap.sh` script (or the main `playbook.yaml`) handles the deployment of the core AI services on the primary control node. This includes a default instance of the `prima-expert` job and the `pipecat` voice agent.

### 4.1. Starting and Stopping Services

Use the provided script to submit the core AI jobs to Nomad:

```bash
./start_services.sh
```
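
Once the jobs are submitted, you can verify that they are running and that the expert's API is registered. A minimal check, assuming the default job names `prima-expert-main` and `pipecat-app` used later in this guide:

```bash
# Confirm both core jobs reached a running state
nomad job status prima-expert-main
nomad job status pipecat-app

# Confirm the expert service is registered in Consul
consul catalog services | grep prima-api
```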
### 4.2. Resetting and Restarting

If you make a change to a job file or need to restart the services from a clean state, it's best to purge the old jobs before running the start script again.

#### Stop and purge the old jobs

```bash
nomad job stop -purge prima-expert-main
nomad job stop -purge pipecat-app
```
### 4.3. Advanced: Deploying Additional AI Experts

The true power of this architecture is the ability to deploy multiple, specialized AI experts that the main `pipecat` agent can route queries to. With the new unified `prima-expert.nomad` job template, deploying a new expert is handled through a dedicated Ansible playbook.

1. **Define a Model List for Your Expert:**

   First, open `group_vars/models.yaml` and create a new list of models for your expert. For example, to create a `creative-writing` expert, you could add:

   ```yaml
   creative_writing_models:
     - name: "phi-3-mini-instruct"
       # ... other model details
   ```

2. **Deploy the Expert with Ansible:**

   Use the `deploy_expert.yaml` playbook to render the Nomad job with your custom parameters and launch it. You pass variables on the command line using the `-e` flag.

   - **Example: Deploying a `creative-writing` expert to the `creative` namespace:**
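
     The command below is only a sketch: the variable names (`expert_name`, `namespace`, `model_list`) are assumptions about what `deploy_expert.yaml` accepts, so check the playbook itself for the authoritative parameters.

     ```bash
     # Hypothetical variable names -- verify against deploy_expert.yaml
     ansible-playbook deploy_expert.yaml \
       -e "expert_name=creative-writing" \
       -e "namespace=creative" \
       -e "model_list=creative_writing_models"
     ```
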
The `TwinService` in the `pipecatapp` will automatically discover any service registered in Consul with the `prima-api-` prefix and make it available for routing.
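
To confirm a newly deployed expert is discoverable, you can query Consul directly. This sketch assumes Consul's default DNS port (8600) and a service named `prima-api-creative-writing`; the actual name depends on the variables you passed:

```bash
# Look up the expert's service record via Consul DNS
dig @127.0.0.1 -p 8600 prima-api-creative-writing.service.consul SRV
```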

## 5. Agent Architecture: The `TwinService`

The core of this application is the `TwinService`, a custom service that acts as the agent's "brain." It orchestrates the agent's responses, memory, and tool use.

The agent can use tools to perform actions and gather information.

#### Available Tools:
- **Vision (`vision.get_observation`)**
- **Master Control Program (`mcp.get_status`, etc.)**
- **SSH (`ssh.run_command`)**
- **Code Runner (`code_runner.run_python_code`)**
- **Ansible (`ansible.run_playbook`)**
- **Web Browser (`web_browser.goto`, etc.)**
### 5.3. Mixture of Experts (MoE) Routing

The agent is designed to function as a "Mixture of Experts." The primary `pipecat` agent acts as a router, classifying the user's query and routing it to a specialized backend expert if appropriate.

118
147
-
- **How it Works:** The `TwinService` prompt instructs the router LLM to first decide if a query is general, technical, or creative. If it's technical, for example, the router's job is to call the `route_to_coding_expert` tool. The `TwinService`then sends the query to a separate LLM cluster that is running a coding-specific model.
148
-
- **Configuration:** To use this feature, you must deploy multiple LLM backends into their own isolated namespaces using the `-var` flag to customize them.
149
-
- **Example:**
150
-
```bash
151
-
# Create the namespaces
152
-
nomad namespace apply general
153
-
nomad namespace apply coding
154
-
155
-
# Deploy a general-purpose model
156
-
nomad job run -namespace=general \
157
-
-var "job_name=prima-general-expert" \
158
-
-var "service_name=llama-api-general" \
159
-
-var "model_path=/path/to/general.gguf" \
160
-
/home/user/primacpp.nomad
161
-
162
-
# Deploy a coding model
163
-
nomad job run -namespace=coding \
164
-
-var "job_name=prima-coding-expert" \
165
-
-var "service_name=llama-api-coding" \
166
-
-var "model_path=/path/to/coding.gguf" \
167
-
/home/user/primacpp.nomad
168
-
```
169
-
- The `TwinService` discovers these experts across all namespaces using Consul.
119
+
- **How it Works:** The `TwinService` prompt instructs the main agent to first classify the user's query. If it determines the query is best handled by a specialist (e.g., a 'coding' expert), it uses the `route_to_expert` tool. This tool call is intercepted by the `TwinService`, which then forwards the query to the appropriate expert's API endpoint.
120
+
- **Configuration:** Deploying these specialized experts is done using the `deploy_expert.yaml` Ansible playbook. For detailed instructions, see the **[Deploying Additional AI Experts](#42-advanced-deploying-additional-ai-experts)** section above.
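
To see what the router forwards to an expert, you can call an expert's endpoint directly. This is a sketch under two assumptions: the expert exposes a llama.cpp-style HTTP server on its default port (8080), and the host resolves `*.service.consul` names via Consul DNS:

```bash
# Send a prompt straight to the main expert's completion endpoint
curl -s http://prima-api-main.service.consul:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Say hello.", "n_predict": 32}'
```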
### 5.4. Configuring Agent Personas

The personality and instructions for the main router agent and each expert agent are defined in simple text files located in the `ansible/roles/pipecatapp/files/prompts/` directory. You can edit these files to customize the behavior of each agent. For example, you can edit `coding_expert.txt` to give it a different programming specialty.
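
For instance, to inspect and tweak the coding expert's persona (paths as given above; any editor works):

```bash
# List the available persona prompt files
ls ansible/roles/pipecatapp/files/prompts/
# Edit the coding expert's persona
nano ansible/roles/pipecatapp/files/prompts/coding_expert.txt
```

Since these files are deployed by the Ansible role, re-running the playbook and restarting the `pipecat-app` job is presumably required for edits to take effect.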
## 6. Mission Control Web UI
This project includes a web-based dashboard for real-time display and debugging. To access it, navigate to the IP address of any node in your cluster on port 8000 (e.g., `http://192.168.1.101:8000`).
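
A quick reachability check (using the example address above; the dashboard is assumed to serve plain HTTP):

```bash
# Should print 200 if the dashboard is up
curl -s -o /dev/null -w "%{http_code}\n" http://192.168.1.101:8000
```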
## 7. Testing and Verification
- **Check Cluster Status:** `nomad node status`
- **Check Job Status:** `nomad job status`
- **View Logs:** `nomad alloc logs <allocation_id>` or use the Mission Control Web UI (see the example after this list).
- **Manual Test Scripts:** A set of scripts for manual testing of individual components is available in the `testing/` directory.
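
Because `nomad alloc logs` takes an allocation ID rather than a job name, a typical sequence (assuming the `pipecat-app` job name used earlier) is:

```bash
# List allocations for the job and note an allocation ID
nomad job status pipecat-app

# Follow that allocation's logs
nomad alloc logs -f <allocation_id>
```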
## 8. Performance Tuning & Service Selection

- **Model Selection:** The `prima-expert.nomad` job is configured via Ansible variables in `group_vars/models.yaml`. You can define different model lists for different experts.
- **Network:** Wired gigabit ethernet is strongly recommended over Wi-Fi for reduced latency.
- **VAD Tuning:** The `RealtimeSTT` sensitivity can be tuned in `app.py` for better performance in noisy environments.
- **STT/TTS Service Selection:** You can choose which Speech-to-Text and Text-to-Speech services to use by setting environment variables in the `pipecatapp.nomad` job file.
## 9. Benchmarking
This project includes two types of benchmarks.

### 9.1. Live Conversation Benchmark

Measures the end-to-end latency of a live conversation. Enable it by setting `BENCHMARK_MODE = "true"` in the `env` section of the `pipecatapp.nomad` job file. Results are printed to the job logs.
### 9.2. Standardized Performance Benchmark

Uses `llama-bench` to measure the raw inference speed (tokens/sec) of the deployed LLM backend. Run the `benchmark.nomad` job to test the performance of the default model:

```bash
nomad job run /opt/nomad/jobs/benchmark.nomad
```

View results in the job logs: `nomad alloc logs -job llama-benchmark`
## 10. Advanced Development: Prompt Evolution

For advanced users, this project includes a workflow for automatically improving the agent's core prompt using evolutionary algorithms. See `prompt_engineering/PROMPT_ENGINEERING.md` for details.