
AI for IT Operations (AIOps) - Using AI/ML for Improving IT Operations (2022)

By Hongcheng Wang, Applied AI & Discovery, Comcast; Praveen Manoharan, Applied AI & Discovery, Comcast; Nilesh Nayan, Applied AI & Discovery, Comcast; Aravindakumar Venugopalan, Applied AI & Discovery, Comcast; Abhijeet Mulye, Applied AI & Discovery, Comcast; Tianwen Chen, Applied AI & Discovery, Comcast; Mateja Putic, Applied AI & Discovery, Comcast

The proliferation of microservices as a dominant IT architecture has created opportunities as well as challenges for operations teams responsible for maintaining software reliability. When production systems deviate from service-level objectives, operations teams must detect failures and discover their root causes to resolve the issue promptly. These teams most often focus on minimizing mean time to resolution (MTTR) or maximizing mean time between failures (MTBF). In cloud microservice architectures, the process of identifying, diagnosing, and resolving issues largely falls into two phases: anomaly detection (AD) and root cause analysis (RCA). AD is the process of identifying anomalies that correspond to system failures. RCA is the process of determining why an anomaly occurred and identifying the originating service or system. Anomalies are typically detected by defining margins of normal operation on key performance indicators (KPIs) and setting alerting thresholds that generate notifications. RCA is then typically performed by inspecting the system that generated the alert and tracing the problem back to its source. Operations teams diagnose and debug problems using log, trace, and metric data sources, which are often displayed in dashboards.
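The conventional workflow described above (threshold-based AD on KPIs, followed by tracing an alert back through dependent services) can be sketched in a few lines of Python. This is a minimal illustration of that baseline, not the authors' method. The service names, latency bands, and call graph are hypothetical. Each service is assumed to expose a single KPI (p99 latency in milliseconds), and the call graph is assumed to be acyclic.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    """Hypothetical margin of normal operation for one KPI."""
    lower: float
    upper: float


def detect_anomalies(kpis: dict[str, float], bands: dict[str, Band]) -> set[str]:
    """Threshold-based AD: flag every service whose KPI leaves its normal band."""
    alerts = set()
    for service, value in kpis.items():
        band = bands[service]
        if not (band.lower <= value <= band.upper):
            alerts.add(service)
    return alerts


def trace_root_causes(alerted: str, calls: dict[str, list[str]],
                      anomalous: set[str]) -> set[str]:
    """Trace-back RCA: starting at the alerting service, follow only anomalous
    downstream dependencies. An anomalous service whose dependencies are all
    healthy is reported as a root-cause candidate (assumes an acyclic graph)."""
    roots: set[str] = set()
    stack = [alerted]
    seen: set[str] = set()
    while stack:
        service = stack.pop()
        if service in seen:
            continue
        seen.add(service)
        bad_deps = [d for d in calls.get(service, []) if d in anomalous]
        if bad_deps:
            stack.extend(bad_deps)  # problem lies further downstream
        else:
            roots.add(service)      # no unhealthy dependency: candidate root cause
    return roots


# Hypothetical example: p99 latency bands (ms), observed values, and call graph.
BANDS = {
    "frontend": Band(0.0, 200.0),
    "checkout": Band(0.0, 150.0),
    "payment": Band(0.0, 100.0),
    "inventory": Band(0.0, 100.0),
}
KPIS = {"frontend": 850.0, "checkout": 790.0, "payment": 40.0, "inventory": 700.0}
CALLS = {"frontend": ["checkout"], "checkout": ["payment", "inventory"]}

if __name__ == "__main__":
    anomalous = detect_anomalies(KPIS, BANDS)
    print(sorted(anomalous))                          # ['checkout', 'frontend', 'inventory']
    print(trace_root_causes("frontend", CALLS, anomalous))  # {'inventory'}
```

The alert fires on `frontend`, but the trace-back skips the healthy `payment` dependency and stops at `inventory`, the deepest anomalous service. Static bands and manual dependency walking like this are exactly the steps that become hard to maintain as the number of microservices grows.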


Similar Papers

Causality Based Instant Root Cause Analysis for Microservices Failure By Mohamed Sharafath M, Comcast India Engineering Center; Praveen Manoharan, Comcast India Engineering Center; Aravindakumar Venugopalan, Comcast India Engineering Center	2024
Using AI to Improve the Customer Experience: A Virtual Assistant Chatbot By Bernard Burg, Fan Liu, Abel Villca Roque, Sunil Srinivasa, Ryan March & Tianwen Chen, Comcast	2018
Using AI in Network Planning and Operations Forecasting By Petar Djukic & Maryam Amiri, Ciena Canada	2021
Accounting for Every MHz of Bandwidth: Data & Algorithms for Artifact Discovery and Close-Packing of QAMs in Support of Spectrum Activation By Maher Harb, Comcast; Wenlong Shen, Comcast; Matt Stehman, Comcast; Sanket Walavalkar, Comcast; Dan Rice, Comcast	2023
Alarm Root Cause Analysis using AI/ML in MSO Networks By Jonathan Kwan, PhD, PEng, Fujitsu Network Communications, Inc.	2024
Photon Avatars in the Comcast Cosmos: An End-to-End View of Comcast Core, Metro and Access Networks By Venk Mutalik, Steve Ruppa, Fred Bartholf, Bob Gaydos, Steve Surdam, Amarildo Vieira, Dan Rice; Comcast	2022
Building Generative AI Products: A Comprehensive Approach By Jennifer Andreoli-Fang, PhD, Amazon Web Services; Nameet Dutia, Amazon Web Services	2024
A Necessary Journey Towards an AI-driven Operation - Telecom Argentina perspective By Claudio Righetti, Mariela Fiorenzo, Horacio Arrigo; Telecom Argentina S.A.	2022
Pairing IoT and AI to Reduce Network Maintenance Costs By Goutam Agarwal, Rogers Communications; J. Clarke Stevens, Independent Consultant	2023
From Manual to Automated: AI-Driven Network Engineering and Operations By Nader Foroughi, Technetix Inc.; Chris Beem, Technetix Inc.; Diego Royo Moros, Technetix Inc.	2023