A Raspberry Pi homelab managed entirely through Ansible. Manual intervention is limited to the one-time bootstrap (Phase 1); all subsequent deployments and updates are automated.
See BOOTSTRAP.md to get started.
The homelab is organised into four node roles. Full hardware and service details are in NODES.md.
| Node | Role | Status |
|---|---|---|
| `homelab-edge` | Internet edge, DNS, ingress, Ansible control | Active |
| `homelab-observe` | Monitoring, logging, alerting | Active |
| `homelab-svc-01` | Orchestration, databases (Camunda stack) | Active |
| `homelab-svc-02` | User-facing application workloads | Planned |
| `homelab-svc-03` | Media server (Jellyfin) | Future |

    Phase 1: PC ────────────────► homelab-edge   (bootstrap from PC, one-time)
    Phase 2: homelab-edge ──────► itself         (edge self-deploys its services)
    Phase 3: homelab-edge ──────► other nodes    (edge deploys observe + svc nodes)
    Phase 4: GitHub push ──► n8n/Camunda ──► edge (automated deploys via automation endpoint)

After Phase 1, all deployments are driven by Ansible playbooks running on the edge node. On push to master, a minimal
GitHub Actions workflow POSTs to an n8n or Camunda endpoint, which SSHes to the edge as the deploy user and runs
ansible-playbook directly. The edge is the sole Ansible control node — GitHub never connects directly to any homelab node.
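
The push-triggered hop described above could look roughly like the workflow below. This is a minimal sketch, not the repository's actual `deploy.yml`: the secret name `DEPLOY_WEBHOOK_URL` and the JSON payload shape are assumptions, and the receiving n8n or Camunda endpoint is expected to authenticate the request itself.

```yaml
# .github/workflows/deploy.yml (illustrative sketch)
name: Trigger homelab deploy

on:
  push:
    branches: [master]

jobs:
  notify:
    runs-on: ubuntu-latest
    steps:
      - name: POST to the automation endpoint
        env:
          # Hypothetical repository secret holding the n8n/Camunda webhook URL
          DEPLOY_WEBHOOK_URL: ${{ secrets.DEPLOY_WEBHOOK_URL }}
        run: |
          curl --fail --silent --show-error \
            --request POST \
            --header "Content-Type: application/json" \
            --data "{\"ref\": \"${GITHUB_REF}\", \"sha\": \"${GITHUB_SHA}\"}" \
            "${DEPLOY_WEBHOOK_URL}"
```

The workflow only sends an HTTP request. It holds no SSH keys and never opens a connection to a homelab node, which matches the rule that the edge is the sole control node.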

    homelab/
      README.md  # This file
      BOOTSTRAP.md  # Step-by-step bootstrap guide (Phases 1–4)
      NODES.md  # Hardware and service details per node
      inventories/
        bootstrap.ini  # Phase 1: bootstrap edge from PC
        prod.yml  # All nodes, production groups
        group_vars/
          all/
            main.yml  # Common: timezone, NTP, users, node IPs (ip_*, lan_subnet), SSH port
            vault.yml  # Ansible Vault: all secrets (gitignored)
          edge.yml  # Edge-specific: Tailscale subnet, firewall rules
          observe.yml  # Observability: retention, alert endpoints
          svc.yml  # Service nodes: Docker daemon config, resource limits
        host_vars/
          homelab-edge.yml  # Cloudflare tunnel ID, Pi-hole upstream
          homelab-observe.yml  # Prometheus scrape targets, Loki config
          homelab-svc-01.yml  # Camunda: Elasticsearch heap, DB config
          homelab-svc-02.yml  # GreenTechHub: Django config, Redis
          homelab-svc-03.yml  # Jellyfin: media paths, transcoding settings
      playbooks/
        bootstrap_edge.yml  # Phase 1: initial edge setup
        bootstrap_node.yml  # Phase 3: bootstrap a new node
        deploy_edge.yml  # Phase 2: deploy edge services
        deploy_observe.yml  # Phase 3: deploy monitoring stack
        deploy_svc.yml  # Phase 3: deploy service workloads
        update_all.yml  # OS updates, Docker image pulls
        backup.yml  # Database backups, config exports
        rollback.yml  # Revert to previous Docker images
        healthcheck.yml  # Verify all services are healthy
      roles/
        base_hardening/  # SSH hardening, sysctl
        docker/  # Docker install and daemon config
        docker_compose/  # Docker Compose plugin
        tailscale/  # Tailscale install and config
        firewall/  # ufw rules
        fail2ban/  # SSH + HTTP jails (templates/jail.conf.j2)
        node_exporter/  # Prometheus node exporter
        cadvisor/  # Container metrics (svc nodes)
        alloy/  # Grafana Alloy (logs → Loki)
        users/  # System user creation (admin, homelab, deploy)
        edge_services/  # cloudflared, Caddy (LAN reverse proxy), Pi-hole, Unbound
        observe_services/  # Prometheus, Loki, Grafana, Alertmanager, ntfy, Uptime Kuma
        camunda/  # Camunda 8 + n8n + discord-gateway env rendering
      # Docker Compose stacks (at repo root, deployed per-node)
      docker-compose.edge.yml  # cloudflared, Caddy, Pi-hole, Unbound, node-exporter, pihole-exporter, portainer-agent
      docker-compose.observe.yml  # Prometheus, Loki, Grafana, Alertmanager, ntfy, Uptime Kuma, Portainer
      docker-compose.svc01.yml  # Camunda 8, Elasticsearch, n8n, discord-gateway, Portainer Agent
      # Jinja2 templates live inside each role at roles/<role>/templates/
      # Key templates:
      #   roles/alloy/templates/config.alloy.j2
      #   roles/edge_services/templates/cloudflared/config.yml.j2
      #   roles/edge_services/templates/pihole/custom.list.j2
      #   roles/observe_services/templates/prometheus/prometheus.yml.j2
      #   roles/observe_services/templates/alertmanager/alertmanager.yml.j2
      #   roles/observe_services/templates/loki/loki.yml.j2
      #   roles/observe_services/templates/ntfy/server.yml.j2
      #   roles/observe_services/templates/grafana/datasources.yml.j2
      #   roles/fail2ban/templates/jail.conf.j2
      #   roles/camunda/templates/{camunda,n8n,discord_gateway}/{env,env.secrets}.j2
      secrets/
        vault.yml  # Ansible Vault: passwords, API keys, certificates
        vault.yml.example  # Template: commit this, never vault.yml itself
      scripts/
        backup_databases.sh  # Thin wrapper around playbooks/backup.yml
        test_connectivity.sh  # LAN ping, Tailscale, SSH, HTTP endpoints, DNS checks
      docs/
        NETWORK.md  # IP assignments, firewall rules, Tailscale ACLs
        MONITORING.md  # Grafana dashboard guide, alert tuning
        TROUBLESHOOTING.md  # Common issues and recovery procedures
      .github/
        workflows/
          deploy.yml  # On push to master: POST to n8n/Camunda deploy endpoint
          test.yml  # Ansible lint, YAML validation on pull request

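
To show how the inventory and `group_vars` files fit together, here is a hedged sketch of what `inventories/prod.yml` might contain. The group names `edge`, `observe` and `svc` are inferred from the `group_vars` file names (Ansible applies `group_vars/<group>.yml` to the group of the same name). Pointing `ansible_host` at the `ip_*` variables is an assumption, not something the repository confirms.

```yaml
# inventories/prod.yml (illustrative sketch)
all:
  children:
    edge:
      hosts:
        homelab-edge:
          ansible_host: "{{ ip_edge }}"
    observe:
      hosts:
        homelab-observe:
          ansible_host: "{{ ip_observe }}"
    svc:
      hosts:
        homelab-svc-01:
          ansible_host: "{{ ip_svc_01 }}"
        homelab-svc-02:
          ansible_host: "{{ ip_svc_02 }}"
        homelab-svc-03:
          ansible_host: "{{ ip_svc_03 }}"
```

With a layout like this, `ansible-playbook -i inventories/prod.yml playbooks/deploy_svc.yml --limit svc` would target only the service nodes.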
IP assignments, firewall rules, DNS, Tailscale ACLs, and traffic flow diagrams are in docs/NETWORK.md.
Summary:
| Node | IP var | Tailscale IP |
|---|---|---|
| `homelab-edge` | `ip_edge` | 100.x.x.1 |
| `homelab-observe` | `ip_observe` | 100.x.x.2 |
| `homelab-svc-01` | `ip_svc_01` | 100.x.x.3 |
| `homelab-svc-02` | `ip_svc_02` | 100.x.x.4 |
| `homelab-svc-03` | `ip_svc_03` | 100.x.x.5 |
IP values are defined in `inventories/group_vars/all/main.yml` (Network section).
- Internal DNS is served by Pi-hole on `homelab-edge` (LAN only, port 53 firewalled)
- External traffic enters via Cloudflare Tunnel; no router port forwards are required
- Tailscale is installed on every node individually; each node remains accessible over VPN even if `homelab-edge` is down
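
For orientation, the Network section of `main.yml` might look like the fragment below. The variable names come from this README; every address is a placeholder chosen for illustration and must be replaced with the real LAN values.

```yaml
# inventories/group_vars/all/main.yml, Network section (placeholder values only)
lan_subnet: 192.168.1.0/24

ip_edge: 192.168.1.10
ip_observe: 192.168.1.11
ip_svc_01: 192.168.1.12
ip_svc_02: 192.168.1.13
ip_svc_03: 192.168.1.14
```

Keeping the addresses in one file means templates and firewall rules can refer to `{{ ip_observe }}` or `{{ lan_subnet }}` instead of repeating literal IPs.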
Three separate system users are created by the bootstrap playbook:
| User | Purpose | SSH Key | Sudo |
|---|---|---|---|
| `admin` | Manual maintenance | `homelab-edge` | Yes (password) |
| `homelab` | Ansible automation | `homelab` | Yes (passwordless) |
| `deploy` | Webhook-triggered deploys | `deploy` | `/usr/bin/ansible-playbook` only |
Separation ensures a webhook or script compromise cannot escalate beyond running approved playbooks.
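
One way to express the `deploy` user's restriction is a sudoers drop-in written by an Ansible task. The sketch below is an assumption about how this could be done, not the repository's actual task: the file name `/etc/sudoers.d/deploy` and the `NOPASSWD` tag are illustrative choices.

```yaml
# Illustrative task, e.g. for roles/users/tasks/main.yml
- name: Restrict the deploy user's sudo rights to ansible-playbook
  ansible.builtin.copy:
    dest: /etc/sudoers.d/deploy
    content: |
      deploy ALL=(root) NOPASSWD: /usr/bin/ansible-playbook
    owner: root
    group: root
    mode: "0440"
    # Reject the file if visudo finds a syntax error, so a typo cannot break sudo
    validate: /usr/sbin/visudo -cf %s
```

The `validate` step matters here because a malformed sudoers file can lock every user out of sudo.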
- Network perimeter — no open router ports; Cloudflare Tunnel handles all external ingress; UPnP disabled
- Edge node — fail2ban (SSH: 3 failures; HTTP: 10 failures); Pi-hole blocks malicious domains; Cloudflare Tunnel rate-limits and filters before traffic reaches the homelab
- All nodes — SSH key-only, no root login, no password auth; ufw default-deny inbound; unattended security updates
- Secrets — all credentials in Ansible Vault, encrypted at rest; no hardcoded values in playbooks or templates
- Tailscale VPN — per-node ACLs restrict inter-node communication; admin access requires Tailscale login with MFA
- Monitoring — Alertmanager notifies on failed SSH attempts; Uptime Kuma alerts on service downtime
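
The "ufw default-deny inbound" item above could be implemented with tasks like the following. This is a sketch under assumptions: it uses the `community.general.ufw` module, and `ssh_port` is a hypothetical variable name standing in for the SSH port that `main.yml` defines.

```yaml
# Illustrative tasks, e.g. for roles/firewall/tasks/main.yml
- name: Allow SSH before the firewall is enabled
  community.general.ufw:
    rule: allow
    port: "{{ ssh_port }}"   # hypothetical variable name
    proto: tcp

- name: Deny all other inbound traffic by default and enable ufw
  community.general.ufw:
    state: enabled
    default: deny
    direction: incoming
```

Allowing SSH first avoids cutting off the Ansible connection when the default-deny policy takes effect.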
Secrets are stored in inventories/group_vars/all/vault.yml. See inventories/group_vars/all/vault.yml.example for required variable names.
| Data | Method | Destination | Frequency |
|---|---|---|---|
| PostgreSQL databases | `pg_dump` | `/mnt/nvme/backups` | Daily |
| Elasticsearch snapshots | Snapshot API | `/mnt/nvme/elasticsearch` | Daily |
| All configuration | Git (Ansible repo) | GitHub | On commit |
| Grafana dashboards | JSON in repo | GitHub | On commit |
| Pi-hole config | `custom.list` in repo | GitHub | On commit |
Run backups on demand:

    ansible-playbook playbooks/backup.yml

For disaster recovery procedures, see docs/TROUBLESHOOTING.md.
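
As an illustration of the PostgreSQL row in the backup table, a dump task might look like the sketch below. It is not taken from `playbooks/backup.yml`. It assumes the `community.postgresql` collection is installed, that facts are gathered (for `ansible_date_time`), and that the database is reachable from the host running the task. `backup_db_name` is a hypothetical variable.

```yaml
# Illustrative task for a backup play
- name: Dump a PostgreSQL database to the NVMe backup directory
  community.postgresql.postgresql_db:
    name: "{{ backup_db_name }}"   # hypothetical variable
    state: dump
    # A .gz suffix makes the module compress the dump
    target: "/mnt/nvme/backups/{{ backup_db_name }}-{{ ansible_date_time.date }}.sql.gz"
```

If the database runs inside a container, connection options such as `login_host` and `login_user` would also need to be set.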
Full stack reference (Prometheus, Loki, Grafana, Alertmanager, Uptime Kuma) is in docs/MONITORING.md.
Alert channels:
| Severity | Condition | Channel |
|---|---|---|
| Critical | Node down, disk < 5%, crash loop | Discord / SMS |
| Warning | CPU/memory > 80%, disk < 10%, SSL expiry | Discord |
| Info | Updates available, backup done, new Tailscale node | Discord |
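
The severity-to-channel mapping above could be expressed in Alertmanager routing roughly as follows. This is a sketch, not the contents of `alertmanager.yml.j2`: the receiver names and webhook URLs are hypothetical, and it assumes alerts carry a `severity` label and that Discord and SMS delivery sit behind webhook endpoints.

```yaml
# Illustrative Alertmanager routing (receiver names and URLs are hypothetical)
route:
  receiver: discord              # default: warning and info alerts
  group_by: [alertname, instance]
  routes:
    - matchers:
        - severity="critical"
      receiver: discord-and-sms

receivers:
  - name: discord
    webhook_configs:
      - url: http://discord-gateway.example.invalid/alerts
  - name: discord-and-sms
    webhook_configs:
      - url: http://discord-gateway.example.invalid/alerts
      - url: http://sms-gateway.example.invalid/alerts
```

Alerts that match no child route fall through to the default `discord` receiver, so only critical alerts reach the SMS path.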
Key service endpoints (internal):
| Service | URL |
|---|---|
| Grafana | http://grafana.homelab.local:3000 |
| Prometheus | http://prometheus.homelab.local:9090 |
| Alertmanager | http://alertmanager.homelab.local:9093 |
| Uptime Kuma | http://uptime.homelab.local:3001 |
| Portainer | http://portainer.homelab.local:9000 |
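
A health check against these endpoints could be sketched as the play below. It is an illustration, not the actual `playbooks/healthcheck.yml`. The `/api/health` and `/-/healthy` paths are the standard health endpoints of Grafana, Prometheus and Alertmanager. Probing the root URL for Uptime Kuma and Portainer is an assumption, as is running the play from a host that resolves `homelab.local` through Pi-hole.

```yaml
# Illustrative health-check play
- name: Check internal service endpoints
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Expect HTTP 200 from each endpoint
      ansible.builtin.uri:
        url: "{{ item }}"
        status_code: 200
        timeout: 10
      loop:
        - http://grafana.homelab.local:3000/api/health
        - http://prometheus.homelab.local:9090/-/healthy
        - http://alertmanager.homelab.local:9093/-/healthy
        - http://uptime.homelab.local:3001
        - http://portainer.homelab.local:9000
```

The task fails on the first endpoint that does not answer with 200, which makes it usable as a gate after a deploy.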