Last updated: 2026-02-01
It uses Ansible for automated provisioning, Nomad for cluster orchestration, and a state-of-the-art AI stack to create a responsive, streaming, and embodied voice agent. For a detailed technical description of the system's layers, see the Holistic Project Architecture document.
- Cluster Nodes: 3 to 20 legacy desktop computers (Intel Core 2 Duo or similar, 8GB RAM, SSD recommended).
- Control Node: A machine to run Ansible for provisioning.
- Recommended OS: Debian Trixie, minimal install with SSH server.
A brief overview of the key directories in this repository:
- /ansible: Contains all Ansible playbooks, roles, and templates for provisioning and deploying the entire system.
- /ansible/roles: Individual, reusable components for managing specific parts of the system (e.g., nomad, consul, pipecatapp).
- /ansible/roles/pipecatapp/files: The core Python source code for the conversational agent, including app.py, memory.py, and the tools directory.
- /pipecatapp/workflows: Contains YAML definitions for the agent's behavior and thought processes (e.g., default_agent_loop.yaml).
- /verification: Scripts and tools for verifying the system's frontend and functionality.
- /prompt_engineering: Scripts and tools for evaluating and improving the AI's prompts using evolutionary algorithms.
- /reflection: Scripts related to the agent's self-reflection and self-healing capabilities.
- /scripts: Utility and linting scripts for maintaining code quality.
- /testing: Contains unit and integration tests for the various components of the project.
- /*.yaml: Top-level Ansible playbook files (e.g., playbook.yaml, heal_cluster.yaml).
- /group_vars: Ansible configuration files that apply to all hosts, such as all.yaml and models.yaml.
Setting up a new cluster involves two main methods: a one-time manual setup for the first node, and a fully automated setup for all subsequent nodes.
The first node in your cluster requires a manual OS installation. This node will later be configured by Ansible to act as the PXE/iPXE boot server for all other nodes.
- Install Debian Trixie: Perform a standard, minimal installation of Debian Trixie with an SSH server.
- Clone this repository:
  git clone <repo_url>
- Configure Initial Settings: Enter the initial-setup directory and edit the setup.conf file. You must provide the machine's desired HOSTNAME, a static IP address, and the CONTROL_NODE_IP (which should be the static IP of this same machine, as it will become the control node).
- Run Setup Script: Execute the script with root privileges:
  sudo bash setup.sh
- Reboot.
After rebooting, this node is ready for Ansible provisioning (see Section 4).
It should be designated as both a controller_node and your pxe_server in
the Ansible inventory.
Once your first node has been provisioned by Ansible and the pxe_server role has been applied to it, you can automatically install Debian on all other bare-metal machines in your cluster.
This system uses an advanced iPXE-over-HTTP method that is significantly faster and more reliable than traditional PXE. For detailed instructions on how to apply the Ansible role and prepare the client machines for network booting, see the iPXE Boot Server Setup Guide.
For development, testing, or bootstrapping the very first node of a new cluster, you can use the provided bootstrap script. This is the recommended method for getting started.
- On your control node, install Git:
  sudo apt install git -y
- Clone this repository.
- Run the Bootstrap Script: This script is a wrapper around Ansible that handles all necessary steps to configure the local machine as a fully functional standalone agent, a cluster controller, or a new worker node.
Basic Usage:
./bootstrap.sh
- What this does: By default, the script runs the complete end-to-end process to configure the local machine as a standalone agent and control node. It invokes a series of Ansible playbooks that install and configure all necessary system components (Consul, Nomad, Docker) and deploy the AI agent services.
- You will be prompted for your sudo password, as the script needs administrative privileges to install and configure software.
Common Flags for Customizing the Bootstrap Process:
You can control the behavior of the bootstrap script with the following flags:
- --role <role>: Specify the role for the node.
  - all (default): Full setup for a standalone control node.
  - controller: Sets up only the core infrastructure services (Consul, Nomad, etc.).
  - worker: Configures the node as a worker and requires --controller-ip.
- --controller-ip <ip>: The IP address of the main controller node. Required when --role is worker.
- --user <user>: Specify the target user for Ansible (default: pipecatapp).
- --tags <tag1,tag2>: Run only specific parts of the Ansible playbook (e.g., --tags nomad would only run the Nomad configuration tasks).
- --external-model-server: Skips the download and build steps for large language models. This is ideal for development or if you are using a remote model server.
- --purge-jobs: Stops and purges all running Nomad jobs before starting the bootstrap process, ensuring a clean deployment.
- --leave-services-running: Do not clean up Nomad and Consul data on startup (useful for restarts without state loss).
- --system-cleanup: Use with caution. Aggressively cleans Docker resources, Apt cache, and logs to free up disk space on the host machine.
- --clean: Use with caution. This will permanently delete all untracked files in the repository (git clean -fdx), restoring it to a pristine state.
- --debug: Enables verbose Ansible logging (-vvvv) and saves the full output to playbook_output.log.
- --verbose [level]: Set verbosity level (0-4). Default 0, or 3 if the flag is used without a value.
- --continue: If a previous bootstrap run failed, this flag will resume the process from the last successfully completed playbook, saving significant time.
- --benchmark: Run benchmark tests during deployment.
- --deploy-docker: Deploy the pipecat application using Docker (default).
- --run-local: Deploy the pipecat application using local raw_exec (useful for debugging without Docker rebuilds).
- --container: Run the entire infrastructure inside a single large container (experimental).
- --home-assistant-debug: Enable debug mode for Home Assistant integrations.
- --watch <target>: Pause for inspection after the specified target (task/role) completes.
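Two of the rules above are easy to misread: --role worker requires --controller-ip, and --verbose defaults to 0 but becomes 3 when given without a value. The sketch below restates only those two rules in Python with argparse. It is an illustration, not the actual bootstrap.sh (a shell script whose internals are not shown here); the names build_parser and parse_flags are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring three of the documented flags."""
    parser = argparse.ArgumentParser(prog="bootstrap-sketch")
    parser.add_argument("--role", choices=["all", "controller", "worker"],
                        default="all")
    parser.add_argument("--controller-ip", dest="controller_ip", default=None)
    # nargs="?" with const=3: a bare --verbose yields 3, an absent flag yields 0.
    parser.add_argument("--verbose", nargs="?", type=int, const=3, default=0,
                        choices=range(0, 5))
    return parser


def parse_flags(argv):
    """Parse argv and enforce the worker/controller-ip dependency."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.role == "worker" and not args.controller_ip:
        # parser.error prints a usage message and exits with status 2.
        parser.error("--role worker requires --controller-ip")
    return args
```

For example, parse_flags(["--role", "worker", "--controller-ip", "10.0.0.1", "--verbose"]) yields role "worker" and verbosity 3, while parse_flags(["--role", "worker"]) exits with a usage error.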
This single node is now ready to be used as a standalone conversational AI agent. It can also serve as the primary "seed" node for a larger cluster. To expand your cluster, see the advanced guide below.
If you are setting up a multi-node cluster, you will need to work with the Ansible inventory directly.
- Configure Initial Inventory (inventory.yaml): Edit the inventory.yaml file to define your initial controller nodes. While new worker nodes will be added to the cluster automatically, you must define the initial seed nodes for the control plane here.
  - Create a host group named controller_nodes. This group must contain at least one node that will act as the primary control node and Nomad server.
  - Create an empty host group named worker_nodes. This group will be populated automatically as new nodes join the cluster.
- Run the Main Playbook: Run the following command from the root of this repository. This will configure the initial control node(s) and prepare the cluster for auto-expansion.
ansible-playbook -i inventory.yaml playbook.yaml --ask-become-pass
- --ask-become-pass: This flag is important. It will prompt you for your sudo password, which Ansible needs to perform administrative tasks.
- What this does: This playbook not only configures the cluster services (Consul, Nomad, etc.) but also automatically bootstraps the primary control node into a fully autonomous AI agent by deploying the necessary AI services.
This cluster is designed for resilience and scalability. As your needs grow, you may need to add more controller nodes to the control plane for higher availability. This process is fully automated.
To promote an existing worker node to a controller:
- Ensure the worker is part of the cluster: The node you wish to promote must already be a provisioned worker and visible in nomad node status.
- Run the promotion playbook:
  ansible-playbook playbooks/promote_controller.yaml
- Enter the hostname: You will be prompted to enter the exact hostname of the worker node you want to promote (e.g., worker1).
The playbook will handle everything:
- It safely modifies the inventory.yaml file to move the node from the workers group to the controller_nodes group.
- It stops the services on the target node, cleans up the old worker-specific state, and re-runs the consul and nomad configuration roles to re-provision it as a server.
- The node will automatically rejoin the cluster as a controller, strengthening the control plane.
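The inventory change in the first step can be pictured as moving one host entry between two groups. The sketch below is not the playbook's implementation; it assumes the inventory has already been loaded into a dictionary of the form {group: {"hosts": {hostname: vars}}}, and the function name promote_host and the default group names are illustrative assumptions.

```python
from copy import deepcopy


def promote_host(inventory: dict, hostname: str,
                 source_group: str = "workers",
                 target_group: str = "controller_nodes") -> dict:
    """Return a copy of the inventory with hostname moved between groups.

    Assumed layout: {group_name: {"hosts": {hostname: host_vars}}}.
    The input dictionary is left untouched.
    """
    updated = deepcopy(inventory)
    source_hosts = (updated.get(source_group) or {}).get("hosts") or {}
    if hostname not in source_hosts:
        raise KeyError(f"{hostname!r} is not in group {source_group!r}")
    host_vars = source_hosts.pop(hostname)

    # Create the target group (or its hosts mapping) if it is missing or empty.
    if updated.get(target_group) is None:
        updated[target_group] = {}
    if updated[target_group].get("hosts") is None:
        updated[target_group]["hosts"] = {}
    updated[target_group]["hosts"][hostname] = host_vars
    return updated
```

Working on a copy means a failed promotion leaves the original inventory intact, which is one plausible reading of "safely modifies".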
The core of this application is the TwinService, which now orchestrates the agent's behavior using a flexible Workflow Engine. Instead of a hardcoded logic loop, the agent's thought process is defined in declarative YAML files (e.g., pipecatapp/workflows/default_agent_loop.yaml).
The agent uses a graph-based workflow engine where nodes represent steps in the thought process (e.g., "Summarize", "Reason", "Execute Tool").
- Default Workflow: A tiered agent loop that summarizes input and generates a response using a mixture of expert models.
- Extensibility: You can define custom workflows to create agents with different capabilities (e.g., a "Researcher" workflow that prioritizes browsing and summarization).
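A graph-based loop of this kind can be sketched in a few lines. The code below is a minimal illustration, not the project's Workflow Engine: the real workflows are declared in YAML, and the node functions, the run_workflow helper, and the context keys used here are all hypothetical.

```python
from typing import Callable, Dict, Optional

# A node receives the shared context and returns the next node's name,
# or None to end the workflow.
NodeFn = Callable[[dict], Optional[str]]


def run_workflow(nodes: Dict[str, NodeFn], start: str, context: dict,
                 max_steps: int = 50) -> dict:
    """Walk the graph from start until a node returns None."""
    current: Optional[str] = start
    steps = 0
    while current is not None:
        if steps >= max_steps:
            raise RuntimeError("workflow exceeded max_steps (possible cycle)")
        if current not in nodes:
            raise KeyError(f"unknown node: {current!r}")
        context.setdefault("trace", []).append(current)
        current = nodes[current](context)
        steps += 1
    return context


def summarize(ctx: dict) -> str:
    ctx["summary"] = ctx["input"][:40]
    return "reason"


def reason(ctx: dict) -> str:
    # Stand-in for an LLM decision: a keyword check picks the branch.
    return "execute_tool" if "list files" in ctx["summary"].lower() else "respond"


def execute_tool(ctx: dict) -> str:
    ctx["tool_result"] = "(tool output placeholder)"
    return "respond"


def respond(ctx: dict) -> None:
    ctx["response"] = f"Handled: {ctx['summary']}"
    return None


NODES: Dict[str, NodeFn] = {
    "summarize": summarize,
    "reason": reason,
    "execute_tool": execute_tool,
    "respond": respond,
}
```

Calling run_workflow(NODES, "summarize", {"input": "Please list files in /tmp"}) visits summarize, reason, execute_tool, and respond in that order; an input without the keyword skips the tool node.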
- Short-Term: Remembers the last 10 conversational turns in a simple list.
- Long-Term: Uses a FAISS vector store or a remote PMM Memory Service to remember key facts. It performs a semantic search over this memory to retrieve relevant context for new conversations.
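The two memory tiers can be illustrated with a small sketch. This is not the code in memory.py: the class names are hypothetical, and a bag-of-words cosine similarity stands in for the FAISS vector store and real embeddings so that the example runs with the standard library only.

```python
import math
from collections import Counter, deque


class ShortTermMemory:
    """Keeps only the most recent conversational turns."""

    def __init__(self, max_turns: int = 10):
        # A bounded deque silently drops the oldest turn when full.
        self._turns = deque(maxlen=max_turns)

    def add(self, role: str, text: str) -> None:
        self._turns.append((role, text))

    def turns(self) -> list:
        return list(self._turns)


def _embed(text: str) -> Counter:
    # Stand-in for a real embedding model: word counts.
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class LongTermMemory:
    """Stores facts and retrieves the most similar ones for a query."""

    def __init__(self):
        self._facts = []

    def remember(self, fact: str) -> None:
        self._facts.append((fact, _embed(fact)))

    def search(self, query: str, k: int = 3) -> list:
        q = _embed(query)
        ranked = sorted(self._facts, key=lambda item: _cosine(q, item[1]),
                        reverse=True)
        return [fact for fact, _ in ranked[:k]]
```

A vector store such as FAISS replaces the linear scan in search with an index lookup over dense embeddings, but the retrieve-by-similarity contract is the same.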
The agent is capable of using a wide variety of tools to interact with the world. While the default workflow is a simple conversational loop, advanced workflows can leverage the following tools:
These features are integrated directly into the TwinService or prompt system:
- Vision: Uses a YOLOv8 or Moondream model to provide real-time descriptions of the webcam feed.
- Expert Routing: Dynamically routes queries to specialized expert models (e.g., Coding, Math) via route_to_expert.
The following tools are available in the codebase (pipecatapp/tools/):
- Ansible (ansible): Runs Ansible playbooks to manage the cluster.
- Archivist (archivist): Performs deep research on the agent's long-term memory.
- Claude Clone (claude_clone): A tool for interacting with a Claude-like model.
- Code Runner (code_runner): Executes Python code in a secure, sandboxed environment.
- Container Registry (container_registry): Searches for container images and tags in the Docker Registry.
- Council (council): Convenes a council of AI experts to deliberate on a query.
- Dependency Scanner (dependency_scanner): Scans Python packages for vulnerabilities using the OSV database.
- Desktop Control (desktop_control): Provides full control over the desktop environment (screenshots, mouse/keyboard).
- Experiment (experiment): Orchestrates A/B testing or parallel experiments for code generation.
- File Editor (file_editor): Reads, writes, and patches files in the codebase.
- Final Answer (final_answer): A tool to provide a final answer to the user.
- Git (git): Interacts with Git repositories.
- Home Assistant (ha): Controls smart home devices via Home Assistant.
- LLxprt Code (llxprt_code): A specialized tool for code-related tasks.
- Master Control Program (mcp): Provides agent introspection and self-control.
- Open Workers (open_workers): Manages and interacts with open worker agents.
- OpenClaw (openclaw): Sends messages via the OpenClaw Gateway to various channels (WhatsApp, Telegram, etc.).
- OpenCode (opencode): Interface for the OpenCode tool.
- Orchestrator (orchestrator): Dispatches high-priority, complex jobs to the cluster.
- Planner (planner): Plans complex tasks and executes them.
- Power (power): Controls the cluster's power management policies.
- Project Mapper (project_mapper): Scans the codebase to generate a project structure map.
- Prompt Improver (prompt_improver): A tool for improving prompts.
- RAG (rag): Searches the project's documentation to answer questions.
- Search (search): Searches the codebase for text patterns or file names.
- Shell (shell): Executes shell commands (uses a persistent tmux session).
- Smol Agent (smol_agent_computer): A tool for creating small, specialized agents.
- Spec Loader (spec_loader): Clones external Git repositories (docs, specs) and ingests them into the agent's context.
- SSH (ssh): Executes commands on remote machines.
- Submit Solution (submit_solution): Allows a Worker Agent to submit a code solution or artifact to be parsed by the ExperimentTool/Judge.
- Summarizer (summarizer): Summarizes conversation history.
- Swarm (swarm): Spawns multiple worker agents to perform tasks in parallel.
- Term Everything (term_everything): Provides a terminal interface for interacting with the system.
- VR (vr): Tools for Virtual Reality interactions.
- Web Browser (web_browser): Enables web navigation and content interaction.
Note on Implementation History: Previous versions of the agent relied on a hardcoded "Router" agent with a static list of tools. The current system has evolved to a dynamic, workflow-driven architecture (TwinService + WorkflowRunner), enabling more complex and varied agent behaviors.
The agent is designed to function as a "Mixture of Experts." The primary
pipecat agent acts as a router, classifying the user's query and routing it to
a specialized backend expert if appropriate.
- How it Works: The TwinService (or a SimpleLLMNode in the workflow) classifies the user's query. If it determines the query is best handled by a specialist (e.g., a 'coding' expert), it routes the request to that expert service via Consul.
- Configuration: Deploying these specialized experts is done using the deploy_expert.yaml Ansible playbook. For detailed instructions, see the Advanced AI Service Deployment section below.
The personality and instructions for the main router agent are defined in ansible/roles/pipecatapp/files/prompts/router.txt. You can edit this file to customize the behavior of the main agent. Expert agents are configured via the group_vars/models.yaml file, where you can define the models they use.
There are two primary ways to interact with the conversational agent: the web interface and the Gemini CLI extension.
Navigate to the IP address of any node in your cluster on port 8000 (e.g., http://192.168.1.101:8000). The web UI provides real-time conversation logs, a request-approval interface, and the ability to save and load the agent's memory state.
For command-line users, a Gemini CLI extension is provided to send messages directly to the agent.
- Install the Gemini CLI:
  npm install -g @google/gemini-cli
- Navigate to the extension directory:
  cd pipecat-agent-extension
- Install dependencies and build the extension:
  npm install
  npm run build
- Link the extension to your Gemini CLI installation:
  gemini extensions link .
Once the extension is linked, you can use the custom /pipecat:send command to send a message to the agent:
gemini /pipecat:send "Your message here"
Example:
gemini /pipecat:send "Can you write a python script to list files in a directory?"
The agent will process this message as if you had typed it in the web UI.
In addition to the agent's interface, you can access the dashboards for the underlying infrastructure services.
- URL: http://<node_ip>:8500
- Login: Access requires the SecretID (management token).
- Retrieving the Token: Run this command on your controller node:
  sudo cat /etc/consul.d/management_token
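The same management token also works against Consul's HTTP API on port 8500, which can be handy for scripted checks. The sketch below is an assumption-laden example rather than part of this project: it builds a request for Consul's /v1/catalog/services endpoint and passes the token in the X-Consul-Token header; the helper names consul_request and fetch_json are hypothetical.

```python
import json
import urllib.request


def consul_request(node_ip: str, token: str,
                   path: str = "/v1/catalog/services",
                   port: int = 8500) -> urllib.request.Request:
    """Build an authenticated request for the Consul HTTP API."""
    url = f"http://{node_ip}:{port}{path}"
    # Consul reads the ACL token from the X-Consul-Token header.
    return urllib.request.Request(url, headers={"X-Consul-Token": token})


def fetch_json(request: urllib.request.Request, timeout: float = 5.0):
    """Send the request and decode the JSON body (needs a reachable node)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a reachable node, fetch_json(consul_request("192.168.1.101", token)) would return a mapping of registered service names to their tags.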
- URL: http://<node_ip>:4646
The system is designed to be self-bootstrapping. The bootstrap.sh script (or the main playbook.yaml) handles the deployment of the core AI services on the primary control node. This includes a default instance of the llama-expert job and the pipecat voice agent.
If a job has been stopped, or you just want to verify that everything is running as it should be, use the lightweight heal_cluster.yaml playbook. It skips all the system setup and only manages the Nomad jobs.
ansible-playbook playbooks/heal_cluster.yaml
If you make a change to a job file or need to restart the services from a clean state, it's best to purge the old jobs before running the start script again.
nomad job stop -purge llamacpp-rpc
nomad job stop -purge pipecat-app
The true power of this architecture is the ability to deploy multiple, specialized AI experts that the main pipecat agent can route queries to. With the unified llama-expert.nomad job template, deploying a new expert is handled through a dedicated Ansible playbook.
- Define a Model List for Your Expert: First, open group_vars/models.yaml and create a new list of models for your expert. For example, to create a creative-writing expert, you could add:
  creative_writing_models:
    - name: "phi-3-mini-instruct"
      # ... other model details
- Deploy the Expert with Ansible: Use the playbooks/deploy_expert.yaml playbook to render the Nomad job with your custom parameters and launch it. You pass variables on the command line using the -e flag.
  - Example: Deploying a creative-writing expert to the creative namespace:
    ansible-playbook playbooks/deploy_expert.yaml \
      -e "job_name=creative-expert service_name=llama-api-creative namespace=creative model_list={{ creative_writing_models }} worker_count=2"
The TwinService in the pipecatapp will automatically discover any service
registered in Consul with the llama-api- prefix and make it available for
routing.
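Prefix-based discovery of this kind can be sketched as a filter over the Consul service catalog. The code below is an illustration, not the TwinService implementation: it assumes the catalog has already been fetched as a mapping of service name to tags (the shape Consul's /v1/catalog/services endpoint returns), and the function names are hypothetical.

```python
def discover_experts(catalog_services: dict, prefix: str = "llama-api-") -> dict:
    """Map expert labels to service names for every matching service.

    catalog_services is assumed to look like
    {"consul": [], "llama-api-creative": ["v1"], ...}.
    """
    return {
        name[len(prefix):]: name
        for name in sorted(catalog_services)
        if name.startswith(prefix) and len(name) > len(prefix)
    }


def pick_service(experts: dict, label: str, default=None):
    """Return the service name for an expert label, or a fallback."""
    return experts.get(label, default)
```

For a catalog containing llama-api-creative and llama-api-coding, discover_experts returns {"coding": "llama-api-coding", "creative": "llama-api-creative"}, so a query classified as "coding" resolves to the llama-api-coding service.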
To support running large models on legacy hardware with limited RAM, the system supports Split Inference.
- How it Works: The expert job can be configured to offload computation to rpc-server providers running on worker nodes.
- Configuration: When deploying an expert, the system automatically discovers available rpc-provider services via Consul and passes them to the llama-server using the --rpc argument. This allows the model's layers to be split across multiple machines, aggregating their memory and compute power.
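The hand-off from discovery to llama-server amounts to joining the provider addresses into one comma-separated --rpc value. The sketch below shows that step only and is not the project's job template: the function name build_rpc_args and the (host, port) input shape are assumptions, and the addresses and port in the usage note are made up.

```python
def build_rpc_args(providers) -> list:
    """Turn (host, port) pairs into llama-server --rpc arguments.

    Returns an empty list when no providers are available, so the
    server falls back to running the whole model locally.
    """
    endpoints = []
    for host, port in providers:
        port = int(port)
        if not 0 < port < 65536:
            raise ValueError(f"invalid port for {host}: {port}")
        endpoint = f"{host}:{port}"
        if endpoint not in endpoints:  # skip duplicate registrations
            endpoints.append(endpoint)
    return ["--rpc", ",".join(endpoints)] if endpoints else []
```

For example, build_rpc_args([("10.0.0.11", 50052), ("10.0.0.12", 50052)]) returns ["--rpc", "10.0.0.11:50052,10.0.0.12:50052"], which can be appended to the llama-server command line.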
To optimize resource usage on legacy hardware, this project includes an intelligent power management system.
- How it Works: A Python service, power_agent.py, uses an eBPF program (traffic_monitor.c) to monitor network traffic to specific services at the kernel level with minimal overhead.
- Sleep/Wake: If a monitored service is idle for a configurable period, the power agent automatically stops the corresponding Nomad job. When new traffic is detected, the agent restarts the job.
- Configuration: The agent can configure this behavior using the power.set_idle_threshold tool.
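The sleep/wake rule reduces to tracking the last time traffic was seen for each job. The sketch below models only that decision logic and is not power_agent.py: the eBPF traffic source and the Nomad stop/start calls are left out, and the IdleTracker class with its method names is hypothetical.

```python
import time


class IdleTracker:
    """Decides which jobs to stop or restart based on observed traffic."""

    def __init__(self, idle_threshold_s: float, clock=time.monotonic):
        self.idle_threshold_s = idle_threshold_s
        self._clock = clock          # injectable for testing
        self._last_seen = {}         # job name -> timestamp of last traffic
        self._running = set()        # jobs currently believed to be running

    def watch(self, job: str) -> None:
        """Start monitoring a job that is currently running."""
        self._last_seen[job] = self._clock()
        self._running.add(job)

    def record_traffic(self, job: str) -> bool:
        """Note traffic for a job; return True if it must be restarted."""
        self._last_seen[job] = self._clock()
        if job not in self._running:
            self._running.add(job)
            return True
        return False

    def jobs_to_stop(self) -> list:
        """Return running jobs idle for at least the threshold."""
        now = self._clock()
        idle = sorted(job for job in self._running
                      if now - self._last_seen[job] >= self.idle_threshold_s)
        self._running.difference_update(idle)
        return idle
```

A caller would poll jobs_to_stop() periodically and stop each returned job, then restart a job whenever record_traffic() returns True for it.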
This project includes a web-based dashboard for real-time display and debugging. To access it, navigate to the IP address of any node in your cluster on port 8000 (e.g., http://192.168.1.101:8000). The UI provides:
- Real-time conversation logs.
- A request-approval interface for sensitive tool actions.
- The ability to save and load the agent's memory state.
- Check Cluster Status: nomad node status
- Check Job Status: nomad job status
- View Logs: nomad alloc logs <allocation_id> or use the Mission Control Web UI.
A dedicated health check job exists to verify the status of all running LLM experts. This provides a quick way to ensure the entire cluster is operational.
- Run the check: ansible-playbook playbooks/run_health_check.yaml
- View results: nomad job logs health-check
- Manual Test Scripts: A set of scripts for manual testing of individual components is available in the tests/scripts/ directory.
This project uses a suite of linters to ensure code quality and consistency. For detailed instructions on how to install the development dependencies and run the checks, please see the Linting Documentation.
To run all linters, use the following command:
npm run lint
- Model Selection: The llama-expert.nomad job is configured via Ansible variables in group_vars/models.yaml. You can define different model lists for different experts.
- Network: Wired gigabit ethernet is strongly recommended over Wi-Fi for reduced latency.
- VAD Tuning: The RealtimeSTT sensitivity can be tuned in app.py for better performance in noisy environments.
- STT/TTS Service Selection: You can choose which Speech-to-Text and Text-to-Speech services to use by setting environment variables in the pipecatapp.nomad job file.
This project includes two types of benchmarks.
Measures the end-to-end latency of a live conversation. Enable it by setting BENCHMARK_MODE = "true" in the env section of the pipecatapp.nomad job file. Results are printed to the job logs.
Uses llama-bench to measure the raw inference speed (tokens/sec) of the deployed LLM backend. Run the benchmark.nomad job to test the performance of the default model.
nomad job run /opt/nomad/jobs/benchmark.nomad
View results in the job logs: nomad job logs llama-benchmark
For advanced users, this project includes a workflow for automatically improving the agent's core prompt using evolutionary algorithms. See prompt_engineering/PROMPT_ENGINEERING.md for details.
This section outlines the major feature enhancements and maintenance tasks planned for the future.
- Implement Graceful LLM Failover: Enhance the llama-expert.nomad job to include a final, lightweight fallback model to ensure the expert service always starts with a basic capability.
- Re-evaluate Consul Connect Service Mesh: Once the core system is stable, create a new feature branch to attempt to re-enable sidecar_service in the Nomad job files and document the process and performance overhead.
- Add Pre-flight System Health Checks: Create a new Ansible role to perform non-destructive checks (filesystem writability, disk space, network connection) at the beginning of the main playbook.
- Investigate Advanced Power Management: Research and prototype a more advanced power management system using Wake-on-LAN, triggered by power_agent.py.
- Security Hardening:
  - Remove passwordless sudo and require a password for the target_user.
  - Audit all services to ensure they run as dedicated, non-privileged users.
- Monitoring and Observability: Deploy a monitoring stack like Prometheus and Grafana to collect and visualize metrics from Nomad, Consul, and the application itself.
- Web UI/UX Improvements:
  - Replace ASCII art with a more dynamic animated character.
  - Add a "Clear Terminal" button to the UI.
  - Improve the status display to be more readable than a raw JSON dump.
- Bolster Automated Testing:
  - Implement Ansible Molecule tests for critical roles.
  - Expand end-to-end tests in e2e-tests.yaml to verify core agent functions.
  - Increase unit test coverage for Python tools.
For solutions to common issues, such as failing Nomad service checks or deployment errors, please refer to the Troubleshooting Guide.
This project is licensed under the GNU General Public License v3.0.