xGitGuard: ML-based Secret Scanner for GitHub (2021)

By Bahman Rashidi, Comcast Cable

Misused leaked secrets on code sharing platforms such as GitHub (GH) have caused some of the data breaches of our time. Unfortunately, this kind of credential leak is quite common across the code sharing platforms. Developers and code contributors are required in many cases by organization’s security policies to comply with security practices and remove sensitive information before they push their code to GitHub. However, sometimes inadvertently developers neglect to remove sensitive information, such as API tokens and user account credentials, from their code prior to posting it. Malicious attackers crawl through GitHub, hoping to find these secrets and thus grab foothold into an organization’s territory. Companies have limited ability to address this risk as given the scale of GitHub it is difficult if not impossible to find leaked secrets before malicious attackers. Some companies leverage bug bounty programs as a way to incentivize third party agents to manually look for and report these secrets through responsible disclosure. Unsurprisingly, this process can create unnecessary exposure. Consequently, we at Comcast Cybersecurity Research designed and developed “xGitGuard,”a Machine Learning (ML)-based tool that uses advanced Natural Language Processing (NLP) to detect organizational secrets and user credentials at scale and with appropriate velocity in GitHub repositories. This paper begins with a description of the problem statement. Next, we discuss the design of xGitGuard and how it improves upon current solutions, and the solution space. Finally, we provide details about how xGitGuard can be deployed in different scenarios.

By clicking the "Download Paper" button, you are agreeing to our terms and conditions.

Similar Papers

Photon Avatars in the Comcast Cosmos: An End-to-End View of Comcast Core, Metro and Access Networks
By Venk Mutalik, Steve Ruppa, Fred Bartholf, Bob Gaydos, Steve Surdam, Amarildo Vieira, Dan Rice; Comcast
2022
Two Years Of Deploying ITV/EBIF Applications – Comcast’s Lessons Learned
By Robert Dandrea, Ph.D., Comcast Cable
2010
Key Learnings from Comcast’s Use of Open Source Software in the Access Network
By Louis Donofrio & Qin Zang, Comcast Cable; Vignesh Ramamurthy, Infosys Consulting
2020
Comcast Underground: Innovative Fiber Deployments Over Existing Underground Critical Infrastructure
By Venk Mutalik, Pat Wike, Doug Combs, Alan Gardiner, Dan Rice; Comcast
2022
Navigating the Transition to a Post-Quantum World
By Chujiao Ma & Vaibhav Garg, Comcast Cable
2021
Eating the Energy Elephant One Byte at A Time: Managing Energy Consumption Across the Cable Network
By Katie Flynn, Comcast Cable; Budd Batchelder, Comcast Cable
2023
The Future of Cable Television Audio is Accessible
By Mark Francisco, Comcast Cable
2020
50 Million Keys to SNMPv3 Privacy
By Paul E. Schauer, Comcast Cable
2020
Openness And Secrecy In Security Systems: Polycipher Downloadable Conditional Access
By Tom Lookabaugh, PolyCipher and James Fahrny, Comcast Cable
2007
Rapid and Automated Production Scale Activation of Expanded Upstream Bandwidth
By Rob Thompson, Rob Howald, John Chrostowski, Dan Rice, Amarildo Vieira, Rohini Vugumudi & Zhen Lu, Comcast Cable
2021
More Results >>