Site Reliability Engineer | Observability Specialist
I specialize in building and scaling high-performance observability platforms. Currently, I manage telemetry pipelines for mission-critical financial systems, focusing on the Grafana ecosystem and OpenTelemetry.
- Observability: Grafana Stack (Mimir, Loki, Tempo, Alloy), Prometheus, OpenTelemetry (OTel), Datadog, New Relic.
- Infrastructure: Kubernetes (EKS), AWS, Terraform, ArgoCD, Helm.
- Development: Go (Golang), Shell Scripting, SQL.
I maintain a Go-based API for exploring Full-Signal Telemetry.
- Full Signals: Native instrumentation for Metrics, Logs, Traces, and Continuous Profiling.
- GitOps Workflow: Automated deployments via ArgoCD and Helm.
- Goal: Testing the boundaries of telemetry ingestion and agent orchestration in the Grafana ecosystem.
- Managed 14 TB+ of daily logs across 40 Kubernetes clusters.
- Reduced critical alert noise by 90% through telemetry governance.
- Implemented OTel as the primary metrics engine for Grafana Mimir in production.
- Email: gabriel.rocha@ufrj.br
- LinkedIn: linkedin.com/in/gabrielrocha14
