Production-ready CI/CD pipeline for the Cortex automation system with GitHub Actions and ArgoCD GitOps.
This guide covers the complete CI/CD setup for Cortex, including:
- Continuous Integration: Automated testing, linting, building, and security scanning
- Continuous Deployment: GitOps-based deployment to Kubernetes using ArgoCD
- Release Management: Semantic versioning and automated changelog generation
- Security: Comprehensive security scanning and compliance checks
┌─────────────┐      ┌─────────────┐      ┌─────────────┐      ┌─────────────┐
│   GitHub    │─────▶│   GitHub    │─────▶│    GHCR     │─────▶│   ArgoCD    │
│ Repository  │      │   Actions   │      │  Registry   │      │   GitOps    │
└─────────────┘      └─────────────┘      └─────────────┘      └─────────────┘
                            │                                         │
                            │                                         │
                            ▼                                         ▼
                     ┌─────────────┐                           ┌─────────────┐
                     │  Security   │                           │ Kubernetes  │
                     │  Scanning   │                           │   Cluster   │
                     └─────────────┘                           └─────────────┘
| File | Purpose | Trigger |
|---|---|---|
| `.github/workflows/ci.yaml` | Main CI pipeline | Push, PR |
| `.github/workflows/cd.yaml` | Deployment pipeline | Main branch, tags |
| `.github/workflows/release.yaml` | Semantic versioning | Main branch |
| `.github/workflows/security-scan.yaml` | Security scanning | Daily, PR |
| `.github/workflows/pr-check.yaml` | PR validation | PR events |

| File | Purpose |
|---|---|
| `deploy/argocd/project.yaml` | ArgoCD project with RBAC |
| `deploy/argocd/application.yaml` | Static applications |
| `deploy/argocd/applicationset.yaml` | Dynamic MCP server deployment |
| `deploy/argocd/README.md` | ArgoCD documentation |

| File | Purpose |
|---|---|
| `.github/labeler.yml` | Auto-labeling for PRs |
| `.github/PULL_REQUEST_TEMPLATE.md` | PR template |
| `deploy/README.md` | Deployment guide |

Add these secrets in GitHub repository settings:
# Container Registry
GITHUB_TOKEN # Auto-provided by GitHub
# Kubernetes Access
KUBECONFIG_STAGING # Base64-encoded kubeconfig for staging
KUBECONFIG_PRODUCTION # Base64-encoded kubeconfig for production
# ArgoCD Access
ARGOCD_SERVER # ArgoCD server URL (e.g., argocd.example.com)
ARGOCD_USERNAME # ArgoCD admin username
ARGOCD_PASSWORD # ArgoCD admin password
ARGOCD_SERVER_PROD # Production ArgoCD server (if different)
# Optional
SLACK_WEBHOOK_URL # For notifications
CODECOV_TOKEN # For code coverage

# Encode kubeconfig for staging (pbcopy is macOS-only; on Linux, use base64 -w0 and a clipboard tool such as xclip)
cat ~/.kube/config-staging | base64 | pbcopy
# Encode kubeconfig for production
cat ~/.kube/config-production | base64 | pbcopy

# Create ArgoCD namespace
kubectl create namespace argocd
# Install ArgoCD
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
# Wait for ArgoCD to be ready
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=argocd-server -n argocd --timeout=300s
# Get admin password
ARGOCD_PASSWORD=$(kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d)
echo "ArgoCD admin password: $ARGOCD_PASSWORD"
# Port forward to access UI
kubectl port-forward svc/argocd-server -n argocd 8080:443
# Access ArgoCD UI
open https://localhost:8080

# Login to ArgoCD CLI
argocd login localhost:8080 --username admin --password "$ARGOCD_PASSWORD" --insecure
# Apply Cortex project
kubectl apply -f deploy/argocd/project.yaml
# Deploy applications
kubectl apply -f deploy/argocd/application.yaml
# Deploy ApplicationSet for MCP servers
kubectl apply -f deploy/argocd/applicationset.yaml
# Verify deployment
argocd app list
argocd app get cortex-dashboard

# Create a test branch
git checkout -b feature/test-cicd
# Make a change
echo "# Test CI/CD" >> README.md
# Commit and push
git add README.md
git commit -m "feat: test CI/CD pipeline"
git push origin feature/test-cicd
# Create PR via GitHub CLI
gh pr create --title "feat: test CI/CD pipeline" --body "Testing CI/CD setup"
# Watch workflow execution
gh run watch

Triggers: Push to main/develop/feature/bugfix branches, PRs
Jobs:
- Lint & Format Check (10 min)
  - ESLint for TypeScript/JavaScript
  - Prettier formatting validation
  - ShellCheck for bash scripts
- TypeScript Type Check (10 min)
  - Full type validation across workspace
  - Interface compatibility checks
- Test Suite (15 min)
  - Matrix: Node.js 18.x, 20.x
  - Unit + integration tests
  - Coverage reporting to Codecov
  - Threshold: 80% coverage
- Build MCP Servers (15 min)
  - Matrix: 5 MCP packages
  - Parallel builds
  - Artifact upload
- Build Docker Images (20 min)
  - Matrix: 5 MCP packages
  - BuildKit caching
  - Multi-platform support
- Validate Coordination Files (10 min)
  - JSON validation
  - YAML validation
  - Schema compliance
- Build Dashboard (15 min)
  - Astro SSG build
  - Static asset optimization
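
The 80% coverage threshold in the Test Suite job can be pictured as a simple gate. The following is a minimal sketch, not the actual workflow step: `coverage_ok` is a hypothetical helper, and it assumes the coverage percentage has already been extracted from the test report.

```shell
# Hypothetical helper: succeed when a coverage percentage meets the threshold.
# Usage: coverage_ok <coverage-percent> [threshold-percent, default 80]
coverage_ok() (
  coverage="$1"
  threshold="${2:-80}"
  # awk handles decimal percentages such as 79.95, which [ -ge ] cannot compare
  awk -v c="$coverage" -v t="$threshold" 'BEGIN { exit !(c + 0 >= t + 0) }'
)

# Example: 84.2 is an illustrative value, not a real Cortex measurement
if coverage_ok 84.2; then
  echo "coverage gate passed"
else
  echo "coverage gate failed"
fi
```

In a workflow, a non-zero exit status from such a check would typically be what fails the job.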
# pnpm store cache
- uses: actions/cache@v4
  with:
    path: ~/.pnpm-store
    key: ${{ runner.os }}-pnpm-${{ hashFiles('**/pnpm-lock.yaml') }}
# Docker BuildKit cache
- uses: docker/build-push-action@v5
  with:
    cache-from: type=gha
    cache-to: type=gha,mode=max

Triggers: Push to main, tags matching `v*.*.*`, manual dispatch
Jobs:
- Build & Push Images (30 min)
  - Matrix: All MCP servers
  - Push to ghcr.io
  - Image signing with Cosign
  - Tags: latest, sha, version
- Update ArgoCD Manifests (10 min)
  - Update image tags in Git
  - Automated commit & push
- Deploy to Staging (15 min)
  - ArgoCD sync
  - Health check verification
  - Smoke tests
- Deploy to Production (20 min)
  - Requires: Tag `v*.*.*`
  - Canary rollout strategy
  - Backup before deployment
  - Comprehensive testing
  - Auto-rollback on failure

ghcr.io/ryandahlberg/cortex/mcp-k8s-orchestrator:
- latest (main branch)
- main-abc123f (commit SHA)
- v1.2.3 (semantic version)
- v1.2 (minor version)
- v1 (major version)
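
To illustrate how the three version tags above relate to a single release tag, here is a minimal sketch. `version_tags` is a hypothetical helper, not taken from the Cortex workflows; it only covers the `v1.2.3`, `v1.2`, and `v1` tags and ignores `latest` and the commit-SHA tag.

```shell
# Hypothetical helper: print the version tags derived from a release tag.
# Usage: version_tags v1.2.3
version_tags() (
  tag="$1"
  # Accept only vMAJOR.MINOR.PATCH, mirroring the v*.*.* tag trigger
  if ! printf '%s\n' "$tag" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'; then
    echo "not a release tag: $tag" >&2
    exit 1
  fi
  version="${tag#v}"        # 1.2.3
  major="${version%%.*}"    # 1
  rest="${version#*.}"      # 2.3
  minor="${rest%%.*}"       # 2
  printf 'v%s\nv%s.%s\nv%s\n' "$version" "$major" "$minor" "$major"
)

version_tags v1.2.3
```

Rejecting anything that is not a full semantic version keeps branch pushes from producing version tags by accident.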
Triggers: Push to main, manual dispatch
Conventional Commits:
| Type | Release | Example |
|---|---|---|
| `feat` | Minor (0.1.0) | `feat: add new feature` |
| `fix` | Patch (0.0.1) | `fix: resolve bug` |
| `perf` | Patch (0.0.1) | `perf: optimize query` |
| `feat!` | Major (1.0.0) | `feat!: breaking change` |

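
The mapping from commit type to release level can be sketched as a small version-bump function. This is an illustration only: `next_version` is a hypothetical helper, and the real release workflow may apply additional rules (for example, scanning commit footers for breaking changes).

```shell
# Hypothetical helper: bump MAJOR.MINOR.PATCH according to a commit type.
# Usage: next_version <current-version> <commit-type>
next_version() (
  current="$1"
  type="$2"
  major="${current%%.*}"
  rest="${current#*.}"
  minor="${rest%%.*}"
  patch="${rest#*.}"
  case "$type" in
    *!)       major=$((major + 1)); minor=0; patch=0 ;;  # breaking change
    feat)     minor=$((minor + 1)); patch=0 ;;
    fix|perf) patch=$((patch + 1)) ;;
    *)        ;;  # other types leave the version unchanged in this sketch
  esac
  echo "$major.$minor.$patch"
)

next_version 1.2.3 feat
```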
Release Process:
- Analyze commits since last release
- Determine next version
- Generate changelog
- Create GitHub release
- Build release artifacts
- Upload to release
- Create announcement issue
Triggers: Push, PR, daily at 2 AM UTC
Scans:
- Dependency Scan
  - pnpm audit
  - npm audit
  - Threshold: 0 critical, ≤5 high
- Container Image Scan
  - Trivy vulnerability scanning
  - SARIF upload to GitHub Security
- SAST with CodeQL
  - JavaScript analysis
  - Python analysis
  - Security query pack
- Secret Scanning
  - Gitleaks
  - TruffleHog
  - Historical commits
- License Compliance
  - License checker
  - GPL/AGPL detection
- IaC Security
  - Trivy IaC scanner
  - Checkov (Kubernetes, Dockerfile)
- SBOM Generation
  - CycloneDX format
  - SPDX format
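
The dependency scan threshold (0 critical, at most 5 high) amounts to a two-condition gate. A minimal sketch follows, assuming the vulnerability counts have already been extracted from the audit report; `audit_gate` is a hypothetical helper and the extraction step is not shown.

```shell
# Hypothetical helper: succeed only when the counts are within the threshold.
# Usage: audit_gate <critical-count> <high-count>
audit_gate() (
  critical="$1"
  high="$2"
  # Threshold from the Dependency Scan job: 0 critical, at most 5 high
  [ "$critical" -eq 0 ] && [ "$high" -le 5 ]
)

# Example with illustrative counts
if audit_gate 0 3; then
  echo "dependency audit within threshold"
else
  echo "dependency audit exceeds threshold"
fi
```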
Triggers: PR opened, synchronized, reopened
Validations:
- PR Metadata
  - Conventional commit title
  - Description length (≥50 chars)
  - Branch naming convention
- Commit Lint
  - All commits validated
  - Conventional format required
- Code Quality
  - Linting
  - Formatting
  - Type checking
  - console.log detection
- Test Coverage
  - Threshold: 80%
  - PR comment with report
- Changed Files Analysis
  - Impact assessment
  - Component identification
- Security Check
  - Dependency audit
  - Secret scanning
- Build Verification
  - All packages build successfully
- PR Size Check
  - Auto-labeling (XS, S, M, L, XL)
  - Large PR warnings

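
The PR Size Check maps a changed-line count to one of the five labels. The sketch below shows one way that could work; `pr_size_label` is a hypothetical helper and the cut-offs (10, 100, 300, 600) are assumptions chosen for illustration, since the actual thresholds are not stated here.

```shell
# Hypothetical helper: map a changed-line count to a size label.
# The thresholds are illustrative assumptions, not the workflow's real values.
pr_size_label() (
  lines="$1"
  if   [ "$lines" -lt 10 ];  then echo XS
  elif [ "$lines" -lt 100 ]; then echo S
  elif [ "$lines" -lt 300 ]; then echo M
  elif [ "$lines" -lt 600 ]; then echo L
  else echo XL
  fi
)

pr_size_label 42
```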
type(scope): description
Examples:
✅ feat(mcp-k8s): add pod scaling support
✅ fix(dashboard): resolve memory leak
✅ docs(readme): update installation guide
❌ Add new feature
❌ Fixed bug
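
A title check for the `type(scope): description` format can be approximated with a regular expression. This is a sketch under assumptions: `valid_commit_title` is a hypothetical helper, and the list of accepted types beyond `feat`, `fix`, `perf`, and `docs` is a guess at common Conventional Commits types rather than the configured set.

```shell
# Hypothetical helper: succeed when a title follows type(scope): description.
# The scope is optional, and a trailing "!" marks a breaking change.
valid_commit_title() (
  printf '%s\n' "$1" | grep -Eq \
    '^(feat|fix|perf|docs|refactor|test|chore|ci|build|style|revert)(\([a-z0-9-]+\))?!?: .+'
)

for title in "feat(mcp-k8s): add pod scaling support" "Add new feature"; do
  if valid_commit_title "$title"; then
    echo "ok:       $title"
  else
    echo "rejected: $title"
  fi
done
```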
type/description
Examples:
✅ feature/add-monitoring
✅ bugfix/fix-memory-leak
✅ hotfix/critical-security-patch
❌ my-feature
❌ fix
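
The `type/description` branch convention can be checked the same way. In this sketch, `valid_branch_name` is a hypothetical helper; it assumes the allowed prefixes are exactly `feature`, `bugfix`, and `hotfix`, and that descriptions are lowercase words joined by hyphens. Long-lived branches such as `main` and `develop` would need to be exempted separately.

```shell
# Hypothetical helper: succeed when a branch name follows type/description.
valid_branch_name() (
  printf '%s\n' "$1" | grep -Eq '^(feature|bugfix|hotfix)/[a-z0-9]+(-[a-z0-9]+)*$'
)

for branch in feature/add-monitoring my-feature; do
  if valid_branch_name "$branch"; then
    echo "ok:       $branch"
  else
    echo "rejected: $branch"
  fi
done
```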
cortex (AppProject)
├── cortex-dashboard (Application)
├── cortex-monitoring (Application)
├── cortex-mcp-orchestrator (Application)
└── cortex-mcp-servers (ApplicationSet)
    ├── mcp-k8s-orchestrator (Generated)
    ├── mcp-n8n-workflow (Generated)
    ├── mcp-talos-node (Generated)
    ├── mcp-s3-storage (Generated)
    └── mcp-postgres-data (Generated)
All applications use:
- Automated sync: Changes from Git applied automatically
- Self-heal: Reverts manual cluster changes
- Prune: Removes resources deleted from Git
- Retry: Exponential backoff on failures
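
To make the retry behaviour concrete: ArgoCD's `syncPolicy.retry.backoff` takes a starting `duration`, a multiplication `factor`, and a `maxDuration` cap. The sketch below computes the resulting delays; `backoff_delays` is a hypothetical helper, and the numbers used (5 s start, factor 2, 30 s cap) are illustrative, not the values in the Cortex manifests.

```shell
# Hypothetical helper: print the delay (in seconds) before each retry.
# Usage: backoff_delays <attempts> <initial-delay> <factor> <max-delay>
backoff_delays() (
  attempts="$1"
  delay="$2"
  factor="$3"
  max="$4"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    echo "$delay"
    delay=$((delay * factor))
    # Cap the delay so retries never wait longer than the maximum
    if [ "$delay" -gt "$max" ]; then
      delay="$max"
    fi
    i=$((i + 1))
  done
)

backoff_delays 5 5 2 30
```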
Wave 0: Monitoring (Prometheus, Grafana)
Wave 1: Dashboard
Wave 2: MCP Servers
Deploys MCP servers with custom config:
elements:
- name: mcp-k8s-orchestrator
  replicas: 3
  priority: high

Multi-environment deployment:
environments × mcp-servers = applications

Auto-discovery of new MCP servers:
directories:
- path: packages/mcp-*

All services expose Prometheus metrics at /metrics:
# Port forward and check metrics
kubectl port-forward -n cortex-mcp svc/mcp-k8s-orchestrator 3000:3000
curl http://localhost:3000/metrics

Pre-configured dashboards:
- Cortex Overview
  - System health
  - Request rates
  - Error rates
- MCP Servers Performance
  - Per-server metrics
  - Resource usage
  - Latency percentiles
- Resource Usage
  - CPU/Memory consumption
  - Pod scaling events
  - HPA metrics

Alert rules configured for:
- High error rates (>1%)
- High latency (p95 >500ms)
- Pod crashes
- Resource saturation
- Health check failures
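
As a rough illustration of the first rule, the error-rate condition (more than 1% of requests failing) can be expressed with integer arithmetic. `error_rate_alert` is a hypothetical helper for explanation only; the real rules are evaluated by Prometheus, not by a shell script.

```shell
# Hypothetical helper: succeed (alert) when errors exceed 1% of total requests.
# Usage: error_rate_alert <total-requests> <error-count>
error_rate_alert() (
  total="$1"
  errors="$2"
  # errors / total > 1%  is equivalent to  errors * 100 > total
  [ "$total" -gt 0 ] && [ $((errors * 100)) -gt "$total" ]
)

# Example with illustrative counts: 11 errors out of 1000 requests is 1.1%
if error_rate_alert 1000 11; then
  echo "alert: error rate above 1%"
fi
```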
# View workflow runs
gh run list --workflow=ci.yaml
# View specific run
gh run view <run-id>
# Download artifacts (for logs, use: gh run view <run-id> --log)
gh run download <run-id>
# Re-run failed jobs
gh run rerun <run-id> --failed

# Check ArgoCD app status
argocd app get cortex-dashboard
# View sync errors
argocd app logs cortex-dashboard
# Force sync
argocd app sync cortex-dashboard --force
# Rollback (ArgoCD refuses to roll back while automated sync is enabled; disable it first)
argocd app rollback cortex-dashboard

# Create image pull secret
kubectl create secret docker-registry ghcr-secret \
--docker-server=ghcr.io \
--docker-username=<username> \
--docker-password=<token> \
-n cortex-mcp
# Verify secret
kubectl get secret ghcr-secret -n cortex-mcp -o yaml

# Refresh application
argocd app refresh cortex-dashboard
# Hard refresh (ignore cache)
argocd app refresh cortex-dashboard --hard
# View diff
argocd app diff cortex-dashboard
# Check sync status (shown in the application summary)
argocd app get cortex-dashboard

Use conventional commits:
git commit -m "feat(mcp-k8s): add pod scaling"
git commit -m "fix(dashboard): resolve memory leak"
git commit -m "docs: update deployment guide"

- Keep PRs small (<600 lines changed)
- Include tests for new features
- Update documentation
- Add meaningful description
main ← Production releases
 └─ develop ← Integration branch
     ├─ feature/xxx
     ├─ bugfix/xxx
     └─ hotfix/xxx
# 1. Merge features to develop
git checkout develop
git merge feature/new-feature
# 2. Test on staging
git push origin develop
# 3. Create release PR to main
gh pr create --base main --head develop
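# 4. Tag the release on main after the PR is merged
git checkout main
git pull origin main
git tag -a v1.2.3 -m "Release v1.2.3"
git push origin v1.2.3
# 5. ArgoCD deploys to production

# Via ArgoCD (requires automated sync to be disabled on the application)
argocd app rollback cortex-dashboard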
# Via Git revert
git revert <commit-sha>
git push origin main
# Via kubectl (emergency)
kubectl rollout undo deployment/cortex-dashboard -n cortex-dashboard

- All secrets stored in GitHub Secrets
- Image scanning enabled (Trivy)
- SAST enabled (CodeQL)
- Secret scanning enabled (Gitleaks, TruffleHog)
- Dependency scanning enabled (npm audit)
- Container images signed (Cosign)
- RBAC configured in ArgoCD
- Network policies enforced
- Pod security standards applied
- TLS/SSL configured for ingress

- Caching
  - pnpm store cached
  - Docker BuildKit cache
  - Test results cached
- Parallelization
  - Matrix builds
  - Concurrent jobs
  - Independent workflows
- Artifacts
  - Shared between jobs
  - Retention: 7-30 days
- Resource Limits
  - Appropriate CPU/memory
  - HPA configured
  - PDB for availability
- Health Checks
  - Fast startup probes
  - Frequent readiness probes
  - Conservative liveness probes
- Image Optimization
  - Multi-stage builds
  - Alpine base images
  - Layer caching

- GitHub Issues: cortex/issues
- Slack: #cortex-team
- Email: platform@example.com

- Configure GitHub Secrets (see step 1)
- Install ArgoCD (see step 2)
- Deploy Cortex (see step 3)
- Test CI/CD (see step 4)
- Configure monitoring (Prometheus/Grafana)
- Setup notifications (Slack, email)
- Configure backups (Velero)
- Document runbooks (incident response)
See CHANGELOG.md for version history and release notes.
See LICENSE for licensing information.