The use of Artificial Intelligence (AI) to optimize network and service quality and efficiency has surged. However, current AI techniques struggle to establish cause-effect relationships.
This limitation poses a risk when these techniques are applied to large volumes of telemetry data without a solid knowledge base. Although many Internet Service Providers (ISPs) have integrated latency measurements into their operations tools, the analysis of latency and other Quality of Service (QoS) metrics remains a developing research area. False or missed cause-effect relationships can mislead ISPs into adopting ineffective optimization methods. Therefore, while machine learning techniques are being explored extensively for managing efficient, high-quality platforms, a potentially important complement lies in establishing robust telemetry and knowledge-based systems.
In this paper, we analyze latency test cases for Internet Engineering Task Force Low Latency, Low Loss, and Scalable Throughput (IETF L4S) applications over a Low Latency Data Over Cable Service Interface Specifications (DOCSIS®) network path and an internet network segment. We employ a two-step approach: first, we analyze latency in known bottleneck or unstable links; second, we estimate the latency of unknown network segments from end-to-end measurements. We then discuss the major causes of latency within these links using different latency models. Through L4S streaming experiments and latency test data analysis, we discuss the limitations of conventional predictive AI and explore the efficacy of causal reasoning.
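As a minimal sketch of the second step, assume that each end-to-end latency sample is paired with a latency sample from the known segment taken at the same time, and that segment latencies add along the path. Under those assumptions the unknown segment's latency can be estimated per sample by subtraction. The function name and the sample values below are hypothetical and are not taken from the paper's test data.

```python
from statistics import median


def estimate_unknown_segment(end_to_end_ms, known_segment_ms):
    """Per-sample residual latency attributed to the unmeasured segment.

    Assumes paired samples and additive segment latencies (an
    illustrative simplification, not the paper's latency model).
    """
    if len(end_to_end_ms) != len(known_segment_ms):
        raise ValueError("samples must be paired")
    # Clamp at zero: measurement noise can make a residual slightly negative.
    return [max(0.0, e - k) for e, k in zip(end_to_end_ms, known_segment_ms)]


# Hypothetical paired samples in milliseconds.
e2e = [32.0, 35.5, 80.0, 33.0]
access = [12.0, 14.5, 58.0, 12.5]

residual = estimate_unknown_segment(e2e, access)
print(residual)          # [20.0, 21.0, 22.0, 20.5]
print(median(residual))  # 20.75
```

The subtraction is done per sample rather than on summary statistics because percentiles of a sum are generally not the sum of percentiles. In this toy data the 80 ms end-to-end spike is explained almost entirely by the known access segment, so the residual stays near 20 ms.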
By testing various latency-inducing scenarios across network segments and using large-scale test data, we demonstrate the role of correct data collection and error identification.
Predictive AI excels at identifying correlations but cannot detect causal relations. Conversely, causal AI today demands a substantial knowledge base for effective analysis and is a less mature domain with its own limitations. This study discusses causal inference for latency issues in ISP networks and proposes a simplification approach to identify the major causal factors despite the complexity of multi-segment networks. We believe that a solid knowledge base on network access technologies such as DOCSIS, transport protocols such as L4S, and measurement methods will help early causal inference models in latency optimization systems. We then show how a reinforcement learning-based model can iteratively learn from experiment results to refine its actions in resolving bottleneck issues, with self-correction for systematic errors.
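The gap between correlation and causation noted above can be illustrated with a small synthetic example. This is a sketch under stated assumptions, not the paper's model: a hidden congestion level drives both an observed telemetry counter and the latency, so the two correlate strongly even though forcing the counter to a fixed value (an intervention) leaves latency unchanged. All names, coefficients, and distributions are hypothetical.

```python
import math
import random


def simulate(n, seed, do_counter=None):
    """Toy structural model with a hidden confounder (congestion)."""
    rng = random.Random(seed)
    counters, latencies = [], []
    for _ in range(n):
        congestion = rng.random()                # hidden common cause
        noise_c = rng.gauss(0.0, 0.1)
        noise_l = rng.gauss(0.0, 0.1)
        counter = congestion + noise_c           # observed telemetry counter
        if do_counter is not None:
            counter = do_counter                 # intervention: override value
        latency = 2.0 * congestion + noise_l     # depends on congestion only
        counters.append(counter)
        latencies.append(latency)
    return counters, latencies


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


obs_counter, obs_latency = simulate(2000, seed=1)
correlation = pearson(obs_counter, obs_latency)

# Same seed, so the hidden congestion and noise draws are identical.
_, latency_low = simulate(2000, seed=1, do_counter=0.0)
_, latency_high = simulate(2000, seed=1, do_counter=1.0)
effect = sum(latency_high) / 2000 - sum(latency_low) / 2000

print(f"observed correlation: {correlation:.2f}")
print(f"effect of intervening on the counter: {effect:.2f}")
```

A purely predictive model trained on the observational data would rank the counter as a strong predictor of latency, while the intervention shows that acting on it would not reduce latency. Identifying the true cause here requires knowing, or hypothesizing, the congestion variable, which is the kind of domain knowledge the abstract argues for.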