Commit fc2d25f

refactor: Update bootstrap_agent to use new expert architecture
This commit refactors the `bootstrap_agent` Ansible role to align with the new, unified `prima-expert.nomad` architecture. The `deploy_llama_cpp_model.yaml` task has been updated to:

- Render the `prima-expert.nomad` template instead of the old job file.
- Pass the correct variables for the default main expert.
- Wait for the correct service name (`prima-api-main`) in the health check.
- Remove obsolete log tailing logic.

This change ensures that the `bootstrap.sh` script correctly deploys the new, distributed expert service as the default LLM backend.
1 parent de5fd0c commit fc2d25f

File tree

6 files changed

+276
-187
lines changed


README.md

Lines changed: 38 additions & 120 deletions
@@ -57,55 +57,43 @@ If you are setting up a multi-node cluster, you will need to work with the Ansib
 ansible-playbook -i inventory.yaml playbook.yaml --ask-become-pass
 ```
 - **`--ask-become-pass`**: This flag is important. It will prompt you for your `sudo` password, which Ansible needs to perform administrative tasks.
 - **What this does:** This playbook not only configures the cluster services (Consul, Nomad, etc.) but also automatically bootstraps the primary control node into a fully autonomous AI agent by deploying the necessary AI services.
 
 ## 4. AI Service Deployment
-The system is designed to be self-bootstrapping. Once the main Ansible playbook has been run on the primary control node, the AI agent is deployed automatically.
-
-### 4.1. Automated Agent Deployment
-The `bootstrap_agent` role in the Ansible playbook handles the automatic deployment of the core AI services on the primary control node. This includes:
-- **`llama.cpp` RPC Service:** The primary LLM backend for the agent.
-- **`pipecat` Voice Agent:** The main application that orchestrates the agent's logic, memory, and tool use.
-
-You can monitor the status of these services by running `nomad job status` on the control node.
+The system is designed to be self-bootstrapping. The `bootstrap.sh` script (or the main `playbook.yaml`) handles the deployment of the core AI services on the primary control node. This includes a default instance of the `prima-expert` job and the `pipecat` voice agent.
 
+### 4.1. Starting and Stopping Services
 Use the provided script to submit the core AI jobs to Nomad:
 ```bash
 ./start_services.sh
 ```
 
-### 4.2. Advanced: Deploying Additional AI Experts
-For advanced use cases, such as the Mixture-of-Experts (MoE) routing described in the Agent Architecture section, you may want to deploy additional, specialized LLM backends. You can do this manually using the Nomad CLI.
-
-- **Example: Deploying a `prima.cpp` cluster for coding tasks:**
-
-##### First, create a new namespace for the expert
-```bash
-nomad namespace apply coding
-```
-##### Deploy the job to the new namespace, passing variables with -var
-```bash
-nomad job run -namespace=coding \
-  -var "job_name=prima-coding-expert" \
-  -var "service_name=llama-api-coding" \
-  -var "model_path=/path/to/coding.gguf" \
-  /home/user/primacpp.nomad
-```
-The `TwinService` will automatically discover these new experts via Consul and make them available for routing.
-
-### 4.3. Resetting and Restarting
 If you make a change to a job file or need to restart the services from a clean state, it's best to purge the old jobs before running the start script again.
-
-#### Stop and purge the old jobs
 ```bash
-nomad job stop -purge llamacpp-rpc
-nomad job stop -purge pipecatapp
+nomad job stop -purge prima-expert-main
+nomad job stop -purge pipecat-app
 ```
 
-#### Start the services with the new configuration
-```bash
-./start_services.sh
-```
+### 4.2. Advanced: Deploying Additional AI Experts
+The true power of this architecture is the ability to deploy multiple, specialized AI experts that the main `pipecat` agent can route queries to. With the new unified `prima-expert.nomad` job template, deploying a new expert is handled through a dedicated Ansible playbook.
+
+1. **Define a Model List for Your Expert:**
+   First, open `group_vars/models.yaml` and create a new list of models for your expert. For example, to create a `creative-writing` expert, you could add:
+   ```yaml
+   creative_writing_models:
+     - name: "phi-3-mini-instruct"
+       # ... other model details
+   ```
+
+2. **Deploy the Expert with Ansible:**
+   Use the `deploy_expert.yaml` playbook to render the Nomad job with your custom parameters and launch it. You pass variables on the command line using the `-e` flag.
+
+   - **Example: Deploying a `creative-writing` expert to the `creative` namespace:**
+     ```bash
+     ansible-playbook deploy_expert.yaml -e "job_name=creative-expert service_name=prima-api-creative namespace=creative model_list={{ creative_writing_models }} worker_count=2"
+     ```
+
+The `TwinService` in the `pipecatapp` will automatically discover any service registered in Consul with the `prima-api-` prefix and make it available for routing.
 
 ## 5. Agent Architecture: The `TwinService`
 The core of this application is the `TwinService`, a custom service that acts as the agent's "brain." It orchestrates the agent's responses, memory, and tool use.
@@ -119,103 +107,35 @@ The agent can use tools to perform actions and gather information. The `TwinServ
 
 #### Available Tools:
 - **Vision (`vision.get_observation`)**
-  - **Description:** Gets a real-time description of what is visible in the webcam.
-  - **Requires:** A USB webcam connected to the node running the `pipecat-app` job.
-  - **Example:** "What do you see?"
-- **Master Control Program (`mcp.get_status`, `mcp.get_memory_summary`, etc.)**
-  - **Description:** A tool for introspection and self-control.
-  - **Examples:** "MCP, what is your status?", "Summarize your memory."
+- **Master Control Program (`mcp.get_status`, etc.)**
 - **SSH (`ssh.run_command`)**
-  - **Description:** Executes a command on a remote machine via SSH.
-  - **Security Warning:** This is a very powerful tool. The credentials are not stored in the code but must be provided by the LLM. For this to work, the user running the agent must have an SSH key configured that allows passwordless access to the target machine. Exercise caution when enabling the LLM to use this tool.
-  - **Example:** "Use ssh to run 'ls -l' on host 192.168.1.102 with user 'admin'."
 - **Code Runner (`code_runner.run_python_code`)**
-  - **Description:** Executes a block of Python code in a secure, sandboxed Docker container and returns the output.
-  - **Note:** The Docker engine is installed automatically by the Ansible playbook.
-  - **Example:** "Use the code runner to calculate the 100th Fibonacci number."
 - **Ansible (`ansible.run_playbook`)**
-  - **Description:** Runs an Ansible playbook to provision or manage nodes in the cluster. This is the primary tool for cluster self-management and expansion.
-  - **Security Warning:** This is an extremely powerful tool that can make significant changes to the cluster. It is marked as a sensitive tool and requires user approval in the web UI if `APPROVAL_MODE` is enabled.
-  - **Example:** "Use ansible to run the main playbook and limit it to the host 'AID-E-26'."
-- **Web Browser (`web_browser.goto`, `web_browser.get_page_content`, etc.)**
-  - **Description:** A tool for browsing the web.
-  - **Examples:** "Use the web browser to go to google.com and tell me the content of the page."
+- **Web Browser (`web_browser.goto`, etc.)**
 
 ### 5.3. Mixture of Experts (MoE) Routing
-The agent is designed to function as a "Mixture of Experts." The primary LLM acts as a router, classifying the user's query and routing it to a specialized backend if appropriate.
+The agent is designed to function as a "Mixture of Experts." The primary `pipecat` agent acts as a router, classifying the user's query and routing it to a specialized backend expert if appropriate.
 
-- **How it Works:** The `TwinService` prompt instructs the router LLM to first decide if a query is general, technical, or creative. If it's technical, for example, the router's job is to call the `route_to_coding_expert` tool. The `TwinService` then sends the query to a separate LLM cluster that is running a coding-specific model.
-- **Configuration:** To use this feature, you must deploy multiple LLM backends into their own isolated namespaces using the `-var` flag to customize them.
-- **Example:**
-  ```bash
-  # Create the namespaces
-  nomad namespace apply general
-  nomad namespace apply coding
-
-  # Deploy a general-purpose model
-  nomad job run -namespace=general \
-    -var "job_name=prima-general-expert" \
-    -var "service_name=llama-api-general" \
-    -var "model_path=/path/to/general.gguf" \
-    /home/user/primacpp.nomad
-
-  # Deploy a coding model
-  nomad job run -namespace=coding \
-    -var "job_name=prima-coding-expert" \
-    -var "service_name=llama-api-coding" \
-    -var "model_path=/path/to/coding.gguf" \
-    /home/user/primacpp.nomad
-  ```
-- The `TwinService` discovers these experts across all namespaces using Consul.
+- **How it Works:** The `TwinService` prompt instructs the main agent to first classify the user's query. If it determines the query is best handled by a specialist (e.g., a 'coding' expert), it uses the `route_to_expert` tool. This tool call is intercepted by the `TwinService`, which then forwards the query to the appropriate expert's API endpoint.
+- **Configuration:** Deploying these specialized experts is done using the `deploy_expert.yaml` Ansible playbook. For detailed instructions, see the **[Deploying Additional AI Experts](#42-advanced-deploying-additional-ai-experts)** section above.
 
 ### 5.4. Configuring Agent Personas
 The personality and instructions for the main router agent and each expert agent are defined in simple text files located in the `ansible/roles/pipecatapp/files/prompts/` directory. You can edit these files to customize the behavior of each agent. For example, you can edit `coding_expert.txt` to give it a different programming specialty.
 
 ## 6. Mission Control Web UI
-This project includes a web-based dashboard for real-time display and debugging.
-
-### 6.1. Accessing the UI
-Once the `pipecat-app` job is running, you can access the UI by navigating to the IP address of any node in your cluster on port 8000. For example: `http://192.168.1.101:8000`.
-
-### 6.2. Features
-- **Live Terminal:** The main feature is a retro-style web terminal that provides a live stream of the agent's logs.
-- **LLM-Driven Visualizations:** The agent can use the terminal to display information in creative ways. For example, status updates may be rendered as large, colorful banners using `figlet` and `lolcat` style effects.
-- **Status API:** An API endpoint at `/api/status` provides the real-time status of the agent's pipelines.
-
-### 6.3. Advanced Features
-The Mission Control UI and the agent have several advanced features for power users.
-
-#### Debug Mode
-To get more detailed insight into the agent's operations, you can enable Debug Mode.
-- **How to Enable:** In the `pipecatapp.nomad` job file, set the `DEBUG_MODE` environment variable to `"true"`.
-- **Functionality:** When enabled, the agent will produce verbose logs for every tool call, including the result returned by the tool. This is useful for debugging tool behavior.
-
-#### Interactive Action Approval
-For enhanced safety, you can run the agent in a mode that requires manual approval for sensitive actions.
-- **How to Enable:** In the `pipecatapp.nomad` job file, set the `APPROVAL_MODE` environment variable to `"true"`.
-- **Functionality:** When enabled, any attempt to use a sensitive tool (like `ssh` or `code_runner`) will pause execution. A prompt will appear in the web UI with details of the action. You must click "Approve" for the action to proceed. If you click "Deny", the action is cancelled, and the agent will respond that it was not permitted to perform the action.
-
-#### State Management
-You can save and load the agent's complete memory state (both short-term and long-term) directly from the web UI.
-- **How to Use:**
-  1. In the header of the Mission Control UI, enter a name for your session in the "Enter save name..." input box.
-  2. Click **"Save State"** to create a snapshot of the agent's current memory. The state will be saved on the server in the `saved_states/` directory.
-  3. To restore a previous session, enter the name of the saved state and click **"Load State"**. This will replace the agent's current memory with the saved version.
+This project includes a web-based dashboard for real-time display and debugging. To access it, navigate to the IP address of any node in your cluster on port 8000 (e.g., `http://192.168.1.101:8000`).
 
 ## 7. Testing and Verification
 - **Check Cluster Status:** `nomad node status`
 - **Check Job Status:** `nomad job status`
-- **View Logs:** `nomad job logs <job_name>` (e.g., `pipecatapp`, `prima-cluster`) or use the Mission Control Web UI.
+- **View Logs:** `nomad alloc logs <allocation_id>` or use the Mission Control Web UI.
 - **Manual Test Scripts:** A set of scripts for manual testing of individual components is available in the `testing/` directory.
 
 ## 8. Performance Tuning & Service Selection
-- **Model Selection:** The `prima.cpp` and `llamacpp-rpc` Nomad jobs are configured to use a placeholder model path. You will need to edit the job files to point to the GGUF model you want to use. Smaller models (3B, 7B) are recommended for better performance.
+- **Model Selection:** The `prima-expert.nomad` job is configured via Ansible variables in `group_vars/models.yaml`. You can define different model lists for different experts.
 - **Network:** Wired gigabit ethernet is strongly recommended over Wi-Fi for reduced latency.
 - **VAD Tuning:** The `RealtimeSTT` sensitivity can be tuned in `app.py` for better performance in noisy environments.
 - **STT/TTS Service Selection:** You can choose which Speech-to-Text and Text-to-Speech services to use by setting environment variables in the `pipecatapp.nomad` job file.
-  - `STT_SERVICE`: Set to `faster-whisper` for high-performance local transcription, or `deepgram` (default) to use the Deepgram API.
-  - `TTS_SERVICE`: Set to `kittentts` for a fast, local TTS, or `elevenlabs` (default) to use the ElevenLabs API.
-  - `EMBEDDING_MODEL_NAME`: Selects the sentence transformer model for the agent's long-term memory. Defaults to `all-MiniLM-L6-v2`. A good alternative is `google/embeddinggemma-300m`.
 
 ## 9. Benchmarking
 This project includes two types of benchmarks.
@@ -224,13 +144,11 @@ This project includes two types of benchmarks.
 Measures the end-to-end latency of a live conversation. Enable it by setting `BENCHMARK_MODE = "true"` in the `env` section of the `pipecatapp.nomad` job file. Results are printed to the job logs.
 
 ### 9.2. Standardized Performance Benchmark
-Uses `llama-bench` to measure the raw inference speed (tokens/sec) of the deployed LLM backend.
-1. Ensure an LLM backend is running.
-2. Run the benchmark job, passing the path to your desired GGUF model using the `-var` flag:
-   ```bash
-   nomad job run -var "model_path=/path/to/your/model.gguf" /home/user/benchmark.nomad
-   ```
-3. View results in the job logs: `nomad job logs llama-benchmark`
+Uses `llama-bench` to measure the raw inference speed (tokens/sec) of the deployed LLM backend. Run the `benchmark.nomad` job to test the performance of the default model.
+```bash
+nomad job run /opt/nomad/jobs/benchmark.nomad
+```
+View results in the job logs: `nomad job logs llama-benchmark`
 
 ## 10. Advanced Development: Prompt Evolution
 For advanced users, this project includes a workflow for automatically improving the agent's core prompt using evolutionary algorithms. See `prompt_engineering/PROMPT_ENGINEERING.md` for details.
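
As an aside, the README above states that the `TwinService` discovers any Consul service whose name starts with `prima-api-`. Purely as an illustrative sketch (not code from this commit; the service names are hypothetical), that filtering over the standard JSON body of Consul's `/v1/catalog/services` endpoint could look like:

```python
import json

# Hypothetical helper: given the parsed body of Consul's /v1/catalog/services
# endpoint ({"service-name": ["tag", ...], ...}), return the expert services a
# router could forward queries to, keyed by expert name (the suffix after the
# prefix).
def discover_experts(catalog_services: dict, prefix: str = "prima-api-") -> dict:
    return {
        name[len(prefix):]: name
        for name in catalog_services
        if name.startswith(prefix)
    }

# Example catalog payload (shape only; entries are illustrative).
catalog = json.loads(
    '{"consul": [], "prima-api-main": [], "prima-api-creative": [], "nomad": []}'
)
print(discover_experts(catalog))
# {'main': 'prima-api-main', 'creative': 'prima-api-creative'}
```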

ansible/jobs/prima-expert.nomad

Lines changed: 163 additions & 0 deletions
@@ -0,0 +1,163 @@
# This is a Jinja2 template for a complete, distributed Prima expert.
# It is parameterized and can be used to deploy any expert model.
job "{{ job_name | default('prima-expert') }}" {
  datacenters = ["dc1"]
  namespace   = "{{ namespace | default('default') }}"

  group "master" {
    count = 1

    volume "models" {
      type      = "host"
      source    = "models"
      read_only = true
    }

    network {
      mode = "bridge"
      port "http" {}
    }

    service {
      name     = "{{ service_name | default('prima-api') }}"
      provider = "consul"
      port     = "http"

      check {
        type     = "http"
        path     = "/health"
        interval = "15s"
        timeout  = "5s"
      }
    }

    task "llama-server-master" {
      driver = "exec"

      template {
        data = <<EOH
#!/bin/bash
set -e
echo "Starting master server for expert: {{ job_name | default('prima-expert') }}"

# Discover worker services via Consul. Emit address:port pairs, since the RPC
# backend needs a port as well as an address.
echo "Discovering worker services from Consul..."
WORKER_IPS=$(curl -s "http://127.0.0.1:8500/v1/health/service/{{ job_name | default('prima-expert') }}-worker?passing" | jq -r '.[].Service | "\(.Address):\(.Port)"' | tr '\n' ',' | sed 's/,$//')
echo "Discovered workers: $WORKER_IPS"

RPC_ARGS=""
if [ -n "$WORKER_IPS" ]; then
  echo "Workers found. Configuring RPC."
  RPC_ARGS="--rpc-servers $WORKER_IPS"
else
  echo "No workers found. Starting in standalone mode."
fi

HEALTH_CHECK_URL="http://127.0.0.1:{{ '{{' }} env "NOMAD_PORT_http" {{ '}}' }}/health"

# Loop through the provided models for failover
{% for model in model_list %}
echo "Attempting to start llama-server with model: {{ model.name }}"

/usr/local/bin/llama-server \
  --model "/opt/nomad/models/llm/{{ model.filename }}" \
  --host 0.0.0.0 \
  --port {{ '{{' }} env "NOMAD_PORT_http" {{ '}}' }} \
  $RPC_ARGS &

SERVER_PID=$!
echo "Server process started with PID $SERVER_PID. Waiting for it to become healthy..."

HEALTHY=false
for i in {1..12}; do
  sleep 10
  if curl -s --fail "$HEALTH_CHECK_URL" > /dev/null; then
    echo "Server is healthy with model {{ model.name }}!"
    HEALTHY=true
    break
  else
    echo "Health check failed (attempt $i/12)..."
  fi
done

if [ "$HEALTHY" = true ]; then
  echo "Successfully started llama-server with model: {{ model.name }}"
  # Write the active model to Consul KV for other services to discover
  curl -X PUT --data "{{ model.name }}" "http://127.0.0.1:8500/v1/kv/active_model/{{ job_name | default('prima-expert') }}"
  wait $SERVER_PID
  exit 0
else
  echo "Server failed to become healthy with model: {{ model.name }}. Killing process PID $SERVER_PID..."
  kill $SERVER_PID
  wait $SERVER_PID 2>/dev/null
fi
{% endfor %}

echo "All models failed to start. Exiting."
exit 1
EOH
        destination = "local/run_master.sh"
        perms       = "0755"
      }

      config {
        command = "local/run_master.sh"
      }

      resources {
        cpu    = 1000
        memory = 8192 # Hardcoded for simplicity and stability
      }

      volume_mount {
        volume      = "models"
        destination = "/opt/nomad/models"
        read_only   = true
      }
    }
  }

  group "workers" {
    count = {{ worker_count | default(1) }}

    network {
      mode = "bridge"
      port "rpc" {}
    }

    service {
      name     = "{{ job_name | default('prima-expert') }}-worker"
      provider = "consul"
      port     = "rpc"

      check {
        type     = "tcp"
        interval = "15s"
        timeout  = "5s"
      }
    }

    task "rpc-server-worker" {
      driver = "exec"

      template {
        data = <<EOH
#!/bin/bash
set -e
/usr/local/bin/rpc-server --host 0.0.0.0 --port {{ '{{' }} env "NOMAD_PORT_rpc" {{ '}}' }}
EOH
        destination = "local/run_rpc.sh"
        perms       = "0755"
      }

      config {
        command = "local/run_rpc.sh"
      }

      resources {
        cpu    = 500
        memory = 1024
      }
    }
  }
}
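
The worker-discovery pipeline in `run_master.sh` (Consul health query piped through `jq`) is equivalent to the following Python sketch. This is illustrative only, not part of the commit; it joins `address:port` pairs because the llama.cpp RPC backend needs a port as well as an address, and the entry shapes mirror Consul's standard `/v1/health/service/<name>?passing` response:

```python
# Hypothetical stand-in for the curl | jq pipeline in run_master.sh: given the
# parsed JSON from Consul's health endpoint, build the comma-separated list of
# worker endpoints handed to llama-server via RPC_ARGS.
def worker_addresses(health_entries: list) -> str:
    return ",".join(
        f"{e['Service']['Address']}:{e['Service']['Port']}" for e in health_entries
    )

entries = [
    {"Service": {"Address": "10.0.0.11", "Port": 20001}},
    {"Service": {"Address": "10.0.0.12", "Port": 20002}},
]
print(worker_addresses(entries))  # 10.0.0.11:20001,10.0.0.12:20002
print(worker_addresses([]))       # empty string -> master runs standalone
```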

0 commit comments
