Modern video delivery systems, especially those involved in IP video, are large-scale distributed systems. They are composed of many collaborating subsystems, written by different teams, often in different technologies. Comcast’s IP VOD system, for instance, comprises dozens of different subsystems involved with manifest generation, licensing, encryption, caching, advertisement and other alternate content insertion, user interfaces, etc.
These systems have no central coordinator or stateful session manager and are often located in geographically disparate areas. An error can occur in a system several hops behind the player, causing a video to fail to play. Correlating that error with the error observed by a user in a highly distributed environment is very challenging, as is determining how much latency is induced by various parts of the system.
This paper describes a method for distributed tracing based around a protocol and library named Money, an annotation-based distributed trace protocol with roots in Google’s Dapper and Twitter’s Zipkin. We describe the Money protocol and how we’re using it inside of