xGitGuard: ML-based Secret Scanner for GitHub (2021)

By Bahman Rashidi, Comcast Cable

The misuse of secrets leaked on code-sharing platforms such as GitHub has caused some of the data breaches of our time. Unfortunately, this kind of credential leak is quite common across code-sharing platforms. In many cases, an organization's security policies require developers and code contributors to follow security practices and remove sensitive information before they push their code to GitHub. However, developers sometimes inadvertently neglect to remove sensitive information, such as API tokens and user account credentials, from their code before posting it. Malicious attackers crawl through GitHub hoping to find these secrets and thereby gain a foothold in an organization's territory. Companies have limited ability to address this risk: given the scale of GitHub, it is difficult, if not impossible, to find leaked secrets before malicious attackers do. Some companies use bug bounty programs to incentivize third parties to look for these secrets manually and report them through responsible disclosure. Unsurprisingly, this process can create unnecessary exposure. Consequently, we at Comcast Cybersecurity Research designed and developed "xGitGuard," a Machine Learning (ML)-based tool that uses advanced Natural Language Processing (NLP) to detect organizational secrets and user credentials in GitHub repositories at scale and with appropriate velocity. This paper begins with a description of the problem statement. Next, we discuss the solution space, the design of xGitGuard, and how it improves upon current solutions. Finally, we provide details about how xGitGuard can be deployed in different scenarios.
