Detecting Video Piracy with Machine Learning (2019)

By Matthew Tooley & Thomas Belford, NCTA – The Internet & Television Association

The broad adoption of broadband internet and the growth in average internet speeds have fueled the streaming video industry. In turn, the popularity of streaming video has fueled the growth of video piracy.

Video piracy is a form of copyright infringement: the use of works protected by copyright law without permission in cases where such permission is required. There are two primary forms of video piracy. The first, commonly referred to as “video-on-demand” (VOD), uses a file-sharing distribution model and is commonly used by applications such as Kodi, Titanium TV, TVZion, and BitTorrent-based applications.

The popularity of streaming video has also led to the creation of illegal virtual cable operators that sell subscription-based over-the-top (OTT) IPTV services, complete with electronic programming guides, that stream multiple channels of linear video. This second form is known as “pirated linear streaming.”

Pirated linear streaming is a business threat to the pay-TV industry because the pirated product is a good substitute for legitimate pay-TV services. One of the industry's challenges is understanding the true scope of the problem: some industry reports estimate that 5.5% of North American households access pirated content. The pay-TV industry has been working to quantify the problem more precisely as part of deciding what actions to take to mitigate it.

To understand the true scope and scale of video piracy, operators need to measure the volume, frequency, and scope of the traffic on their networks that is associated with pirated linear streams. Pirated streams use the same technologies and streaming protocols (HLS and MPEG-DASH) as legal linear streams, which makes the two difficult to distinguish without deep packet inspection (DPI). Even with DPI, the task remains difficult because of multi-tenant hosts, content delivery networks, the multiple IP addresses associated with content sources, and the diverse demographics across the network footprint.

For a number of reasons, including cost and privacy concerns, operators typically have equipped only a small portion of their networks with DPI (e.g., less than 10%), if they use it at all. In addition, collecting video piracy data with DPI at a small number of points on the network can introduce selection bias, because the households behind those points may not reflect the demographic makeup of the whole network footprint.
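
The selection-bias concern can be made concrete with a small calculation. The sketch below is a hypothetical illustration only: the segment names, household counts, piracy rates, and DPI coverage are made-up numbers, not figures from this paper. It compares the piracy rate seen by DPI-equipped households with the household-weighted rate across the whole footprint.

```python
# Hypothetical illustration of DPI selection bias. Segment names, household
# counts, piracy rates, and DPI coverage are all made up for this example.
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    households: int     # households in this demographic segment
    piracy_rate: float  # share of those households using pirated streams
    dpi_share: float    # share of this segment's households behind DPI probes


def population_rate(segments: list[Segment]) -> float:
    """Household-weighted piracy rate across the whole network footprint."""
    total = sum(s.households for s in segments)
    return sum(s.households * s.piracy_rate for s in segments) / total


def dpi_observed_rate(segments: list[Segment]) -> float:
    """Piracy rate seen only by households that sit behind DPI probes."""
    observed = sum(s.households * s.dpi_share for s in segments)
    pirating = sum(s.households * s.dpi_share * s.piracy_rate for s in segments)
    return pirating / observed


def dpi_coverage(segments: list[Segment]) -> float:
    """Fraction of all households that DPI can see."""
    total = sum(s.households for s in segments)
    return sum(s.households * s.dpi_share for s in segments) / total


segments = [
    Segment("urban", households=400_000, piracy_rate=0.08, dpi_share=0.20),
    Segment("suburban", households=500_000, piracy_rate=0.05, dpi_share=0.02),
    Segment("rural", households=100_000, piracy_rate=0.03, dpi_share=0.0),
]

if __name__ == "__main__":
    print(f"DPI coverage:             {dpi_coverage(segments):.1%}")
    print(f"Network-wide piracy rate: {population_rate(segments):.2%}")
    print(f"Rate seen by DPI probes:  {dpi_observed_rate(segments):.2%}")
```

With these assumed numbers, DPI covers 9% of households but reports a piracy rate of about 7.67% against a network-wide 6.00%, simply because the probes sit mostly in the segment with the highest rate.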

Effectively measuring video piracy on broadband networks therefore requires something other than DPI. An approach based on IPFIX/NetFlow data, which most carrier-grade routers and switches already support, offers a cost-effective way to measure traffic across an entire network.
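
As an illustration of how flow records can feed a classifier, the sketch below turns one flow record into a handful of numeric features. The record layout (`FlowRecord`) and the feature names are hypothetical simplifications, not an IPFIX/NetFlow template or the feature set used in this paper; real exporters emit many more fields and may report each direction of a connection as a separate flow.

```python
# Hypothetical flow-to-feature step. The record layout and feature names are
# illustrative simplifications, not an IPFIX template or the paper's feature set.
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    client_ip: str
    server_ip: str
    server_port: int
    protocol: int      # IP protocol number (6 = TCP, 17 = UDP)
    octets: int        # bytes carried by the flow
    packets: int       # packets carried by the flow
    start_ms: int      # flow start time in milliseconds
    end_ms: int        # flow end time in milliseconds


def flow_features(rec: FlowRecord) -> dict[str, float]:
    """Derive numeric features from one flow record using only flow metadata."""
    duration_s = max(rec.end_ms - rec.start_ms, 1) / 1000.0  # avoid dividing by zero
    return {
        "duration_s": duration_s,
        "bytes_per_s": rec.octets / duration_s,
        "packets_per_s": rec.packets / duration_s,
        "mean_packet_bytes": rec.octets / max(rec.packets, 1),
        "tcp_443": 1.0 if rec.protocol == 6 and rec.server_port == 443 else 0.0,
    }


# Example: one minute of a stream fetched over HTTPS (documentation addresses).
example = FlowRecord(
    client_ip="198.51.100.7",
    server_ip="203.0.113.10",
    server_port=443,
    protocol=6,
    octets=45_000_000,
    packets=30_000,
    start_ms=0,
    end_ms=60_000,
)

if __name__ == "__main__":
    for name, value in flow_features(example).items():
        print(f"{name:>18}: {value:,.1f}")
```

The example flow carries 45 MB in 60 seconds (6 Mbit/s) in 1,500-byte packets, consistent with a sustained HD video stream. Because every value comes from flow metadata alone, the same features are available from any router that exports flow records, with no payload inspection.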

In 2016, Cisco showed that IP flow data fields could be used to build a feature set for an L1-logistic regression model that identified malware, in both encrypted and unencrypted traffic, with an accuracy of 99.978% at a 0.00% false discovery rate (FDR). In 2018, Cisco added these additional IP flow data fields to a number of its products through Encrypted Traffic Analytics (ETA), an enhanced version of NetFlow offered as part of a cybersecurity solution, and open-sourced the code[1] that captures, extracts, and analyzes network flow data and interflow data, including the additional IP flow data fields.
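
For readers unfamiliar with L1-logistic regression, the sketch below fits one in plain Python on synthetic data. It minimizes the mean logistic loss plus `lam` times the L1 norm of the weights using proximal gradient descent (ISTA), a standard solver chosen here for brevity and not necessarily the one Cisco used. The L1 penalty shrinks the weights of uninformative features toward zero, which suits flow data with many candidate fields. The data, the hyperparameters `lam`, `lr`, and `epochs`, and all function names are illustrative assumptions.

```python
# Minimal L1-regularized logistic regression via proximal gradient descent
# (ISTA). Synthetic data and hyperparameters are illustrative assumptions.
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t*|v|: shrink toward zero, exactly zero if |v| <= t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logistic(X, y, lam=0.01, lr=0.5, epochs=300):
    """Minimize mean log-loss + lam * sum(|w_j|); the bias b is not penalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # Gradient step on the smooth log-loss, then the L1 proximal step.
        w = [soft_threshold(w[j] - lr * grad_w[j], lr * lam) for j in range(d)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 if the predicted probability reaches the threshold, else 0."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold)


# Synthetic data: features 0 and 1 determine the label, feature 2 is noise.
rng = random.Random(0)
X = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(200)]
y = [1 if x[0] + x[1] > 0 else 0 for x in X]

w, b = fit_l1_logistic(X, y)
train_acc = sum(predict(w, b, x) == yi for x, yi in zip(X, y)) / len(X)

if __name__ == "__main__":
    print("weights:", [round(wj, 3) for wj in w], "bias:", round(b, 3))
    print(f"training accuracy: {train_acc:.3f}")
```

Here feature 2 is pure noise, so its learned weight should end up noticeably smaller than the other two. Real flow features span very different ranges (bytes versus seconds), so they would normally be standardized before fitting, since the L1 penalty applies the same shrinkage to every weight.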

In this paper, we apply a similar supervised machine learning process to IP flow data to assess whether the combination of machine learning and IP flow data can viably detect pirated linear streaming traffic on broadband networks.
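
Because Cisco's result is quoted as accuracy at a fixed false discovery rate, evaluating a piracy classifier the same way means choosing a decision threshold on held-out data. FDR is FP / (FP + TP), the share of flows flagged as pirated that are actually legitimate. The sketch below finds the lowest threshold that keeps FDR at or below a target (here 0%) and reports accuracy there. The labels and scores are made up, standing in for a trained classifier's output on held-out flows, and the function names are hypothetical.

```python
# Hypothetical evaluation step: accuracy at a fixed false discovery rate (FDR).
# Labels (1 = pirated stream, 0 = legitimate) and scores are made-up examples.


def accuracy_and_fdr(y_true, scores, threshold):
    """Accuracy and FDR = FP / (FP + TP) when flagging flows with score >= threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif label == 0:
            tn += 1
        else:
            fn += 1
    accuracy = (tp + tn) / len(y_true)
    fdr = fp / (tp + fp) if tp + fp else 0.0  # nothing flagged -> no false discoveries
    return accuracy, fdr


def lowest_threshold_at_fdr(y_true, scores, max_fdr=0.0):
    """Lowest observed score usable as a threshold while keeping FDR <= max_fdr.

    FDR is not monotonic in the threshold, so candidates are checked in
    ascending order. Returns None if no candidate satisfies the target.
    """
    for t in sorted(set(scores)):
        if accuracy_and_fdr(y_true, scores, t)[1] <= max_fdr:
            return t
    return None


y_true = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.20, 0.10]

if __name__ == "__main__":
    t = lowest_threshold_at_fdr(y_true, scores, max_fdr=0.0)
    print("threshold at 0% FDR:", t, "->", accuracy_and_fdr(y_true, scores, t))
    print("threshold 0.5:", accuracy_and_fdr(y_true, scores, 0.5))
```

On this toy data, thresholds of 0.5 and 0.80 both give 87.5% accuracy, but only 0.80 has zero FDR. Accuracy alone therefore says little about how often legitimate traffic would be flagged as pirated.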


Similar Papers

Tele-Everything and Its Impact to The Network
By Matthew Tooley, William A. Check, Ph.D., Rob Rubinovitz & Jim Partridge, NCTA – The Internet & Television Association
Optimizing Video Customer Experience with Machine Learning
By Mariela Fiorenzo, Claudio Righetti, María Cecilia Raggio, Fernando Ochoa & Gabriel Carro, Telecom Argentina S.A.
Analyzing the Modern OTT Piracy Video Ecosystem
By Don Jones, Comcast Cable Communications Management, LLC & Kei Foo, Charter Communications
Network Capacity and Machine Learning
By Dr. Claudio Righetti, Emilia Gibellini, Florencia De Arca, Carlos Germán Carreño Romano, Mariela Fiorenzo, Gabriel Carro & Fernando Rodrigo Ochoa, Cablevisión S.A.
Machine Learning and Telemetry Improves Outside Plant Power Resiliency for More Reliable Networks
By Stephanie Ohnmacht, Matthew Stehman; Comcast
Applications of Machine Learning in Cable Access Networks
By Karthik Sundaresan, Nicolas Metts, Greg White, Albert Cabellos-Aparicio, CableLabs
Intelligent Outside Plant Power Operations with Machine Learning
By Matthew Stehman, Comcast; Chris D’Andrea, Comcast; Ilana Weinstein, Comcast
Simplifying Field Operations Using Machine Learning
By Sanjay Dorairaj, Bernard Burg & Nicholas Pinckernell, Comcast Corporation; Chris Bastian, SCTE
Machine Learning: The Past, Present and the Future
By Narayan Srinivasa, Intel Corporation
A Machine Learning Pipeline for D3.1 Profile Management
By Maher Harb, Jude Ferreira, Dan Rice, Bryan Santangelo & Rick Spanbauer, Comcast