AI for IT Operations (AIOps) - Using AI/ML for Improving IT Operations (2022)

By Hongcheng Wang, Applied AI & Discovery, Comcast; Praveen Manoharan, Applied AI & Discovery, Comcast; Nilesh Nayan, Applied AI & Discovery, Comcast; Aravindakumar Venugopalan, Applied AI & Discovery, Comcast; Abhijeet Mulye, Applied AI & Discovery, Comcast; Tianwen Chen, Applied AI & Discovery, Comcast; Mateja Putic, Applied AI & Discovery, Comcast

The proliferation of microservices as a dominant IT architecture has created opportunities as well as challenges for operations teams responsible for maintaining software reliability. When production systems deviate from service-level objectives, operations teams must detect failures and discover their root causes in order to resolve issues promptly. These teams most often focus on minimizing mean time to resolution (MTTR) or maximizing mean time between failures (MTBF). The process of identifying, diagnosing, and resolving issues in cloud microservice architectures largely falls into two phases: anomaly detection (AD) and root cause analysis (RCA). AD is the process of identifying anomalies that correspond to system failures. RCA is the process of determining why an anomaly occurred and identifying the service or system in which it originated. Anomalies are typically detected by defining margins of normal operation on key performance indicators (KPIs) and setting alerting thresholds that generate notifications. RCA is then typically performed by inspecting the system that generated the alert and tracing the problem back to its source. Operations teams use log, trace, or metric data sources, often displayed in dashboards, to diagnose and debug problems.
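The conventional workflow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method. It assumes the margin of normal operation is the baseline mean plus or minus k standard deviations, and that RCA follows a known service dependency map from the alerting service toward anomalous dependencies. The function names, the dependency map, and the service names are hypothetical.

```python
from statistics import mean, stdev


def normal_margins(baseline, k=3.0):
    """Return (low, high) bounds: baseline mean +/- k sample standard deviations.

    Assumption: a k-sigma band is one simple way to define 'normal operation'.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    return mu - k * sigma, mu + k * sigma


def detect_anomalies(samples, low, high):
    """Return the indices of KPI samples that fall outside [low, high]."""
    return [i for i, x in enumerate(samples) if x < low or x > high]


def trace_root_cause(alerting_service, dependencies, anomalous):
    """Walk from the alerting service through anomalous dependencies.

    Stops at the first service that has no unvisited anomalous dependency
    and returns it as the suspected origin. Only the first anomalous
    dependency is followed at each step (a simplification).
    """
    current = alerting_service
    visited = {current}
    while True:
        bad = [
            d for d in dependencies.get(current, [])
            if d in anomalous and d not in visited
        ]
        if not bad:
            return current
        current = bad[0]
        visited.add(current)


# Hypothetical latency KPI (milliseconds) for one service.
baseline_latency = [100, 102, 98, 101, 99]
low, high = normal_margins(baseline_latency)
alerts = detect_anomalies([101, 97, 250, 99], low, high)

# Hypothetical dependency map: service -> services it calls.
deps = {
    "frontend": ["checkout"],
    "checkout": ["payments", "inventory"],
    "payments": ["db"],
}
suspect = trace_root_cause("frontend", deps, {"frontend", "checkout", "payments"})
```

Here `alerts` holds the index of the 250 ms sample, and `suspect` is the deepest anomalous service reachable from the alerting one. Static thresholds and manual tracing of this kind are the baseline that AI/ML approaches aim to improve on.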

