Software Reliability Engineering: Scaling the Cloud with Automation (2021)

By Brian Gray, Sriram Ramakrishnan & Fei Wan, Sr., Comcast Cable

Operations teams have long worked under a primary mandate: give customers the most reliable, uninterrupted experience possible. The advent of software reliability engineering (SRE) introduced engineers to the formalized automation of toil, which frees time for creative problem solving when services are interrupted. Two issues remain, however: how to automate a virtualized cloud environment in practice, and how to measure and prioritize repetitive tasks so that automation has the greatest impact. In this paper, we offer a case study of a complex cloud infrastructure that evolved from manual deployment, scaling, failover, and upgrading to push-button control and automated self-management. Driving this evolution is a pair of mathematical tools, developed in Comcast’s Core Application Platforms (CAP) group, that use the financial concepts of net present value (NPV) and internal rate of return (IRR) to organize and value automation opportunities simply and objectively.
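To make the NPV and IRR idea concrete, here is a minimal sketch using the standard textbook definitions. It is not the CAP group's actual tooling, which this abstract does not describe. The task names, the hour figures, the choice of engineer-hours as the unit of value, and the per-period discount rate are all hypothetical assumptions chosen for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def npv(rate: float, cash_flows: Sequence[float]) -> float:
    """Net present value of a series of flows.

    cash_flows[0] occurs now (typically the negative build cost);
    cash_flows[t] occurs at the end of period t.
    """
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows: Sequence[float], low: float = -0.99, high: float = 10.0,
        tol: float = 1e-9, max_iter: int = 200) -> float:
    """Internal rate of return: the rate at which NPV equals zero.

    Found by bisection. Assumes NPV changes sign exactly once on
    [low, high], which holds for one up-front cost followed by savings.
    """
    f_low = npv(low, cash_flows)
    f_high = npv(high, cash_flows)
    if f_low * f_high > 0:
        raise ValueError("NPV does not change sign on [low, high]")
    for _ in range(max_iter):
        mid = (low + high) / 2.0
        f_mid = npv(mid, cash_flows)
        if abs(f_mid) < tol or (high - low) / 2.0 < tol:
            return mid
        if f_low * f_mid < 0:
            high = mid  # root lies in the lower half
        else:
            low, f_low = mid, f_mid  # root lies in the upper half
    return (low + high) / 2.0


# Hypothetical automation opportunities, valued in engineer-hours.
# First entry: hours spent building the automation (negative).
# Remaining entries: hours of toil saved in each later period.
opportunities: Dict[str, List[float]] = {
    "automate_failover": [-40, 15, 15, 15, 15],
    "automate_upgrades": [-100, 30, 30, 30, 30],
    "automate_scaling": [-10, 2, 2, 2, 2],
}

DISCOUNT_RATE = 0.05  # assumed per-period discount rate

# Rank opportunities by NPV, highest first.
ranked: List[Tuple[str, List[float]]] = sorted(
    opportunities.items(),
    key=lambda item: npv(DISCOUNT_RATE, item[1]),
    reverse=True,
)

if __name__ == "__main__":
    for name, flows in ranked:
        print(f"{name}: NPV={npv(DISCOUNT_RATE, flows):.2f}h, IRR={irr(flows):.1%}")
```

Under these assumed figures, a task whose savings never repay its build cost within the horizon gets a negative NPV and sorts to the bottom. Ranking by IRR instead favors small, fast-payback tasks over large ones, so the two measures can disagree on order.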

