The proliferation of microservices as a dominant IT architecture has created opportunities as well as challenges for operations teams responsible for maintaining software reliability. When production systems deviate from service-level objectives, operations teams must detect failures and discover their root causes to promptly resolve the issue. These teams are most often focused on minimizing mean time to resolution (MTTR) or mean time between failures (MTBF). The process of identifying, diagnosing, and resolving issues in cloud microservice architectures largely falls into two phases: anomaly detection (AD) and root cause analysis (RCA). AD is the process of identifying anomalies that correspond to system failures. RCA is the process of determining the reason why an anomaly occurred and identifying the originating service or system. Anomalies are typically detected by defining margins of normal operation on key performance indicators (KPIs) and setting alerting thresholds that generate notifications. RCA is then typically performed by inspecting the system that generated the alert and tracing the problem back to its source. Operations teams use log, trace, or metric data sources, often displayed in dashboards to diagnose and debug problems.