Service assurance (SA), a subset of the operational support system (OSS), plays an important role in the internet service provider (ISP) ecosystem. However, rapidly evolving ISP technologies, enterprise service offerings, and customer expectations pose significant challenges to the design of modern service assurance systems. This paper discusses several general design principles and best practices that are essential to building a robust and resilient service assurance system whose observability and awareness allow it to stay ahead of these fast-paced industry transformations.
First, collecting telemetry from multiple sources helps avoid a single point of failure (SPOF) and improves the confidence and accuracy with which customers are alerted to network and service issues. Second, introducing a unified mediation layer provides the flexibility to isolate vendor-specific implementations and prevents bugs from negatively impacting the customer experience. Third, applying cross-product correlation and leveraging machine learning (ML) for data analytics, trending, and anomaly detection helps prevent service interruptions and ensure accurate customer alerting.
This paper reflects years of experience in the integration design, development, and customer support of a centralized service assurance system based on software-defined networking (SDN). We share how these principles and techniques are applied in our enterprise service product to support business value and keep our customers happy.