xGitGuard: ML-based Secret Scanner for GitHub (2021)

By Bahman Rashidi, Comcast Cable

Leaked secrets on code sharing platforms such as GitHub, once misused, have been behind a number of recent data breaches. Unfortunately, this kind of credential leak is common across code sharing platforms. In many cases, organizations’ security policies require developers and code contributors to remove sensitive information before they push their code to GitHub. However, developers sometimes inadvertently fail to remove sensitive information, such as API tokens and user account credentials, from their code before posting it. Attackers crawl GitHub hoping to find these secrets and use them to gain a foothold in an organization’s environment. Companies have a limited ability to address this risk: given the scale of GitHub, it is difficult, if not impossible, to find leaked secrets before attackers do. Some companies use bug bounty programs to incentivize third parties to manually search for these secrets and report them through responsible disclosure. Unsurprisingly, this process can create unnecessary exposure. Consequently, we at Comcast Cybersecurity Research designed and developed “xGitGuard,” a Machine Learning (ML)-based tool that uses advanced Natural Language Processing (NLP) to detect organizational secrets and user credentials in GitHub repositories at scale and with appropriate velocity. This paper begins by describing the problem. Next, we survey the solution space and discuss the design of xGitGuard and how it improves upon current solutions. Finally, we provide details about how xGitGuard can be deployed in different scenarios.
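The abstract does not describe xGitGuard’s actual features or model, so the following is only a minimal sketch of a generic, widely used starting point for ML-based secret scanning: pull name/value pairs out of source code and compute simple features (a secret-like keyword in the variable name, value length, and the Shannon entropy of the value) that a trained classifier could consume. The keyword list, the thresholds, and the names `extract_candidates` and `looks_like_secret` are hypothetical, and the threshold rule merely stands in for a trained model.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical keyword list; a real scanner would tune this per organization.
SECRET_KEYWORDS = ("password", "passwd", "secret", "token", "api_key", "apikey")

# Matches simple quoted assignments such as: api_key = "abc123" or token: 'xyz'
ASSIGNMENT = re.compile(
    r"""(?P<name>[A-Za-z_][A-Za-z0-9_]*)\s*[:=]\s*["'](?P<value>[^"']+)["']"""
)


def shannon_entropy(s: str) -> float:
    """Bits per character of the empirical character distribution of s."""
    if not s:
        return 0.0
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


@dataclass
class Candidate:
    name: str
    value: str
    entropy: float
    keyword_hit: bool


def extract_candidates(source: str) -> list[Candidate]:
    """Find quoted name/value assignments in code and compute simple features."""
    candidates = []
    for m in ASSIGNMENT.finditer(source):
        name, value = m.group("name"), m.group("value")
        candidates.append(
            Candidate(
                name=name,
                value=value,
                entropy=shannon_entropy(value),
                keyword_hit=any(k in name.lower() for k in SECRET_KEYWORDS),
            )
        )
    return candidates


def looks_like_secret(c: Candidate, min_entropy: float = 3.5, min_len: int = 12) -> bool:
    """Hypothetical stand-in for a trained classifier: flag long, high-entropy
    values assigned to variables whose names contain a secret-like keyword."""
    return c.keyword_hit and len(c.value) >= min_len and c.entropy >= min_entropy


if __name__ == "__main__":
    sample = '''
db_password = "hunter2"
api_key = "AKd93JfQ0zLp7XvB2mN8"
greeting = "hello world, how are you"
'''
    for cand in extract_candidates(sample):
        print(cand.name, round(cand.entropy, 2), looks_like_secret(cand))
```

On the sample, only `api_key` is flagged: `db_password` matches a keyword but its value is too short, and `greeting` has no secret-like keyword. Hand-written rules like these tend to produce many false positives at GitHub scale, which is the gap an ML/NLP classifier trained on labeled examples is meant to close.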

Similar Papers

Two Years Of Deploying ITV/EBIF Applications – Comcast’s Lessons Learned
By Robert Dandrea, Ph.D., Comcast Cable
2010
Key Learnings from Comcast’s Use of Open Source Software in the Access Network
By Louis Donofrio & Qin Zang, Comcast Cable; Vignesh Ramamurthy, Infosys Consulting
2020
Navigating the Transition to a Post-Quantum World
By Chujiao Ma & Vaibhav Garg, Comcast Cable
2021
The Future of Cable Television Audio is Accessible
By Mark Francisco, Comcast Cable
2020
50 Million Keys to SNMPv3 Privacy
By Paul E. Schauer, Comcast Cable
2020
Openness And Secrecy In Security Systems: Polycipher Downloadable Conditional Access
By Tom Lookabaugh, PolyCipher and James Fahrny, Comcast Cable
2007
Rapid and Automated Production Scale Activation of Expanded Upstream Bandwidth
By Rob Thompson, Rob Howald, John Chrostowski, Dan Rice, Amarildo Vieira, Rohini Vugumudi & Zhen Lu, Comcast Cable
2021
Software Reliability Engineering: Scaling the Cloud with Automation
By Brian Gray, Sriram Ramakrishnan & Fei Wan, Sr., Comcast Cable
2021
Considerations When Delivering Cable TV To IP Connected Consumer Electronics
By Mark Francisco, Comcast Cable Corporation
2011
Implementation Of Stereoscopic 3D Systems On Cable
By David K. Broberg, Cable Television Laboratories, Inc. and Mark Francisco, Comcast Cable Communications, Inc.
2010