At SCTE in 2012, a group from Comcast presented the Voice Relevance Engine for Xfinity (VREX). Its goal was to understand human utterances for video search and discovery and to take an appropriate action. Since then, Comcast’s voice system has expanded to cover numerous platforms and won a technical Emmy, and it was used for more than 10 billion voice commands last year alone. After running at scale for 10 years, the voice system is being redesigned to meet new and growing demands for voice control across a range of products.
The original system, as described at SCTE in 2012 [1], was divided into three major areas: automated speech recognition (ASR), natural language processing/natural language understanding (NLP/NLU), and action resolution (AR). The division of labor among these services is sequential: ASR first determines what the user said, NLP/NLU then determines what the user meant, and AR finally decides what the system should do to resolve the request.
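To make the three-stage division concrete, the following is a minimal illustrative sketch in Python. It is not VREX code: the names `recognize`, `understand`, `resolve`, `Intent`, and `Action` are hypothetical, the ASR stage is replaced by a stand-in that treats the audio bytes as UTF-8 text, and the understanding rules are toy examples chosen only to show how each stage hands its output to the next.

```python
from dataclasses import dataclass


@dataclass
class Intent:
    """What the user meant (output of the NLP/NLU stage). Hypothetical type."""
    name: str
    slots: dict[str, str]


@dataclass
class Action:
    """What the system should do (output of the AR stage). Hypothetical type."""
    kind: str
    target: str


def recognize(audio: bytes) -> str:
    """ASR stand-in: determine what the user said.

    Assumption for illustration only: the 'audio' is already UTF-8 text.
    A real ASR service would decode a speech signal here.
    """
    return audio.decode("utf-8").strip().lower()


def understand(transcript: str) -> Intent:
    """NLP/NLU stand-in: determine what the user meant, using toy rules."""
    prefix = "watch "
    if transcript.startswith(prefix):
        return Intent("tune", {"title": transcript[len(prefix):]})
    return Intent("search", {"query": transcript})


def resolve(intent: Intent) -> Action:
    """AR stand-in: decide what the system should do for the intent."""
    if intent.name == "tune":
        return Action("play", intent.slots["title"])
    return Action("show_results", intent.slots["query"])


def handle_utterance(audio: bytes) -> Action:
    """Run the three stages in order: ASR, then NLP/NLU, then AR."""
    return resolve(understand(recognize(audio)))
```

In this sketch, `handle_utterance(b"Watch The Office")` yields a `play` action for `"the office"`, while an utterance without the `watch` prefix falls through to a search. Each stage depends only on the output type of the previous one, which is the property that lets the three areas be developed and scaled as separate services.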