My work centres on building and maintaining robust monitoring and alerting systems, improving reliability as measured by key metrics, and leading effective incident management processes that minimize downtime and strengthen system resilience.
Beyond incident response, I focus on continuous improvement: identifying recurring failure patterns, eliminating toil, and driving initiatives that improve availability and performance at scale.



