AI for IT Operations (AIOps) - Using AI/ML for Improving IT Operations (2022)

By Hongcheng Wang, Applied AI & Discovery, Comcast; Praveen Manoharan, Applied AI & Discovery, Comcast; Nilesh Nayan, Applied AI & Discovery, Comcast; Aravindakumar Venugopalan, Applied AI & Discovery, Comcast; Abhijeet Mulye, Applied AI & Discovery, Comcast; Tianwen Chen, Applied AI & Discovery, Comcast; Mateja Putic, Applied AI & Discovery, Comcast

The proliferation of microservices as a dominant IT architecture has created opportunities as well as challenges for operations teams responsible for maintaining software reliability. When production systems deviate from service-level objectives, operations teams must detect failures and discover their root causes to resolve the issue promptly. These teams most often focus on minimizing mean time to resolution (MTTR) or maximizing mean time between failures (MTBF). The process of identifying, diagnosing, and resolving issues in cloud microservice architectures largely falls into two phases: anomaly detection (AD) and root cause analysis (RCA). AD is the process of identifying anomalies that correspond to system failures. RCA is the process of determining why an anomaly occurred and identifying the originating service or system. Anomalies are typically detected by defining margins of normal operation on key performance indicators (KPIs) and setting alerting thresholds that generate notifications. RCA is then typically performed by inspecting the system that generated the alert and tracing the problem back to its source. Operations teams use log, trace, or metric data sources, often displayed in dashboards, to diagnose and debug problems.
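The conventional workflow described above (threshold alerts on a KPI, then tracing an alert back through dependent services) can be illustrated with a minimal sketch. This is not the method proposed in the paper. The function names, the mean plus or minus k standard deviations margin, the service names, and the dependency map are all hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev


def detect_anomalies(baseline, observed, k=3.0):
    """Return indices of observed KPI samples outside the normal margin.

    Assumption: the margin of normal operation is mean +/- k standard
    deviations of a baseline window. Real alerting rules may differ.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    lower, upper = mu - k * sigma, mu + k * sigma
    return [i for i, value in enumerate(observed) if value < lower or value > upper]


def trace_root_causes(alerting_service, dependencies, anomalous):
    """Follow anomalous dependencies downstream from the alerting service.

    `dependencies` maps a service to the services it calls (hypothetical).
    A service is reported as a candidate root cause when none of its own
    dependencies are anomalous.
    """
    roots, seen, stack = [], set(), [alerting_service]
    while stack:
        service = stack.pop()
        if service in seen:
            continue
        seen.add(service)
        bad_deps = [d for d in dependencies.get(service, []) if d in anomalous]
        if bad_deps:
            stack.extend(bad_deps)  # the problem lies further downstream
        else:
            roots.append(service)   # nothing below it is misbehaving
    return roots


# Hypothetical latency KPI in milliseconds.
baseline_latency = [100, 102, 98, 101, 99, 100, 103, 97]
observed_latency = [101, 99, 140, 100, 95]
anomaly_indices = detect_anomalies(baseline_latency, observed_latency)

# Hypothetical service call graph and set of services currently alerting.
call_graph = {
    "api-gateway": ["checkout", "catalog"],
    "checkout": ["payments-db"],
    "catalog": [],
}
alerting = {"api-gateway", "checkout", "payments-db"}
root_causes = trace_root_causes("api-gateway", call_graph, alerting)

print(anomaly_indices)  # [2]
print(root_causes)      # ['payments-db']
```

In this sketch the third latency sample falls outside the 94 to 106 ms margin and is flagged, and the trace from the alerting gateway ends at the deepest anomalous dependency. Static thresholds and manual tracing of this kind are the baseline practice the abstract describes.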


Similar Papers

Using AI to Improve the Customer Experience: A Virtual Assistant Chatbot
By Bernard Burg, Fan Liu, Abel Villca Roque, Sunil Srinivasa, Ryan March & Tianwen Chen, Comcast
2018
Using AI in Network Planning and Operations Forecasting
By Petar Djukic & Maryam Amiri, Ciena Canada
2021
Accounting for Every MHz of Bandwidth: Data & Algorithms for Artifact Discovery and Close-Packing of QAMs in Support of Spectrum Activation
By Maher Harb, Comcast; Wenlong Shen, Comcast; Matt Stehman, Comcast; Sanket Walavalkar, Comcast; Dan Rice, Comcast
2023
A Necessary Journey Towards an AI-driven Operation - Telecom Argentina perspective
By Claudio Righetti, Mariela Fiorenzo, Horacio Arrigo; Telecom Argentina S.A.
2022
Photon Avatars in the Comcast Cosmos: An End-to-End View of Comcast Core, Metro and Access Networks
By Venk Mutalik, Steve Ruppa, Fred Bartholf, Bob Gaydos, Steve Surdam, Amarildo Vieira, Dan Rice; Comcast
2022
Pairing IoT and AI to Reduce Network Maintenance Costs
By Goutam Agarwal, Rogers Communications; J. Clarke Stevens, Independent Consultant
2023
From Manual to Automated: AI-Driven Network Engineering and Operations
By Nader Foroughi, Technetix Inc.; Chris Beem, Technetix Inc.; Diego Royo Moros, Technetix Inc.
2023
Intelligent Outside Plant Power Operations with Machine Learning
By Matthew Stehman, Comcast; Chris D’Andrea, Comcast; Ilana Weinstein, Comcast
2023
Reducing the Cost of Network Traffic Monitoring with AI
By Petar Djukic, Maryam Amiri & Wade Cherrington, Ciena Canada
2021
Optimizing DOCSIS 3.0 Configuration in the Upstream through Applied Reinforcement Learning
By Kevin Dugan, Maher Harb, Dan Rice & Robert Lund, Comcast
2021