The Strategic Network Planning and Automation team at Cox Communications is responsible for defining and applying strategic network growth policies, forecasting node-level throughput, and creating a short- to long-range plan for node actions. Cox’s Outside Planning team, which plans anything downstream from the hub, and Inside Planning team, which plans in the hub, then turn that plan into an operational plan. The plan drives millions of dollars in node actions annually for Cox. These node actions are vital for keeping Cox customers connected and ensuring the quality of their connected experience. They are also key to the company’s continued growth and competitive viability. The planning must therefore be as accurate as possible so that the right actions are conducted at the right times and in the right places.
The team has overcome many challenges and continues to work on those it encounters daily. For example, the team has built predictive statistical and machine learning models and has tuned their forecasts for greater accuracy over the years. Additionally, the team has created rules-based optimizations that make the planning process more efficient and limit the need for multiple touches on a node. This not only improves the customer experience but also saves money.
However, as the network grows more capable, it also grows more complex. With that comes another challenge for the team: poor data quality. The problem grows as the complexity and size of the network continue to grow. When addresses are mapped to the incorrect node, or node names do not match across databases, errors occur in the field, and these often lead to issues that impact customers.
As such, the team was challenged to couple its network expertise with its refined data science capabilities to define a machine learning algorithm that can not only identify mistakes in node attributes but also predict the correct information. In this paper, we dive into the technical aspects of the logic and the results of a proof of concept that is intended to lead to greater accuracy in cross-system alignment and to practically eliminate errors, resulting in a more accurate planning process and avoiding the costs associated with mistakes.
Cox relies heavily on the outside plant (OSP) optical node name as a major key for joining different systems together. In some instances, it is the only key available for joining two or more systems. As such, when systems disagree on what the correct node name should be, the result can be fallout in reporting or, in some cases, customers receiving incorrect speed tiers or incorrect nodes being actioned for congestion relief and other upgrades. Three of the core systems Cox uses that have a significant influence on the node name are the Geographic Name Information System (GNIS), the collective Data Over Cable Service Interface Specification (DOCSIS) systems (CMTS and Cisco SmartPhy), and the Integrated Communications Operations Management System (ICOMS).
Cox uses a GNIS software system primarily to maintain outside plant network topology connectivity and assets. This system is critical to providing connectivity from the customer to the RF tap (and ultimately to their optical node) as intended by the network design teams. The Hybrid Fiber-Coaxial (HFC) Design team is the originating stakeholder for which customers connect to which node. This name-to-topology association is ultimately passed to other systems such as ICOMS and entered in the Cable Modem Termination System (CMTS). The topology information in GNIS can be associated with customers’ billing information in ICOMS via their street address.
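As an illustration of why a street-address join between two systems usually needs approximate rather than exact matching, the following is a minimal sketch in Python. It is not the method used in this paper: the abbreviation table, the 0.85 threshold, the function names, and the sample addresses are all hypothetical, and the standard library's `difflib.SequenceMatcher` ratio stands in for whichever string similarity measure is actually chosen.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; real address standardization is far larger.
ABBREVIATIONS = {
    "street": "st", "avenue": "ave", "road": "rd", "drive": "dr",
    "north": "n", "south": "s", "east": "e", "west": "w",
}


def normalize_address(raw: str) -> str:
    """Lowercase, drop punctuation, and abbreviate common street words."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", raw.lower()).split()
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two addresses after normalization."""
    return SequenceMatcher(None, normalize_address(a), normalize_address(b)).ratio()


def best_match(address: str, candidates: list, threshold: float = 0.85):
    """Return (candidate, score) for the most similar candidate.

    The candidate is None when nothing reaches the (assumed) threshold,
    so weak matches are flagged for review instead of being linked.
    """
    if not candidates:
        return None, 0.0
    best = max(candidates, key=lambda c: similarity(address, c))
    score = similarity(address, best)
    return (best, score) if score >= threshold else (None, score)


if __name__ == "__main__":
    # Made-up addresses: one as it might appear in a billing system,
    # the others as they might appear in a topology system.
    billing = "123 North Main Street"
    topology = ["123 N Main St", "125 N Main St", "123 S Main St"]
    print(best_match(billing, topology))
```

Normalizing before scoring matters in this sketch because differences such as "Street" versus "St" would otherwise lower the score more than a wrong house number does.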
The Data Over Cable Service Interface Specification, commonly known as DOCSIS, is the international standard for IP traffic over coax. In the context of this paper, DOCSIS refers collectively to the CMTS and SmartPhy data. The CMTS has a couple of areas in its configuration where the node name may be written in a description field. Similarly, SmartPhy provides a “Remote Phy Device (RPD) Name” field, which we populate with the OSP node name. The SmartPhy node name can be checked against the node name in the service group description field by way of the chassis and Fiber Node ID fields, which form a unique matching key when combined. The customer-identifiable information in the CMTS is the list of customer premises equipment (CPE) MAC addresses reporting to each interface, which can be correlated with the MAC addresses in ICOMS to establish the physical location of each customer’s device.
ICOMS is the third main source being leveraged. It is Cox’s main billing system and contains customer-identifiable data such as OSP node name, CPE, MAC address, and physical address. The MAC address can be mapped to a MAC address in the CMTS to match the node name per the CMTS via chassis and interface.
Unfortunately, data sources, especially large ones that change constantly, are prone to discrepancies. While the interface-to-MAC-address data in the CMTS is always correct, the device (node) name in the CMTS/SmartPhy is entered manually. The node name in ICOMS is entered manually every time a new home becomes serviceable or a node action causes a node name change. The GNIS OSP design data is all manual entry. Because human entry is involved in the node name for all core systems, this poses some challenges for machine learning, as the human factor always has.
The current methodology for identifying discrepancies in the node name between systems is a dynamic, reactive method.
Discrepancies are identified at two different granularities: the customer level and the node level. Customer-level discrepancies traditionally have the lesser impact of the two: when a single customer or a scattered few customers within a node have the incorrect node name, the primary risk is that they are sold services that cannot be supported on their node. Node-level discrepancies, in which most of the customers within a node have the incorrect node name, can cause the incorrect CMTS interface to be associated with the OSP node name. This can result in node actions on a node that does not need one, or in higher or lower speed tiers being assigned to all customers in a node that cannot support those tiers of service. Currently, no team or individual is responsible for finding these discrepancies as part of their official duties. Employees from various departments discover node name discrepancies accidentally when they are impacted by them and submit them for a fix.
To address this problem, this paper proposes a machine learning based node name determination solution. If two systems both have a street address, a fuzzy string matching algorithm can establish the linkage. For systems that can only be linked by location coordinates, a distance matrix and a Gaussian Mixture Model (GMM) clustering model are built.
The rest of this paper is organized as follows. In Section 2, we introduce the theoretical description of the chosen string similarity algorithm, with a focus on fuzzy string matching. The other algorithm used in this paper, Haversine distance, is also discussed. We further introduce the theoretical principles behind the GMM clustering algorithm, and the section concludes with a summary of related work.
In Section 3, we describe the dataset used for our studies, together with the data sanitization applied prior to the Extract, Transform, and Load (ETL) scoring, Haversine distance analysis, and GMM clustering models, and we present the methodologies used in this paper along with our proposed hybrid scoring algorithm. In Section 4, we present the results and improvements achieved by using the proposed algorithm to map nodes across systems. Further research and next steps are summarized in Section 5.
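For readers unfamiliar with the coordinate-based linkage mentioned above, the following minimal Python sketch shows the standard Haversine great-circle distance and a pairwise distance matrix built from it. It is an illustration only, not the implementation used in this paper: the function names, the choice of kilometers, the mean Earth radius constant, and the sample coordinates are all assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumed constant


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def distance_matrix(points_a, points_b):
    """Rows follow points_a, columns follow points_b; entries are km."""
    return [[haversine_km(*a, *b) for b in points_b] for a in points_a]


if __name__ == "__main__":
    # Made-up coordinates: devices from one system, node locations from another.
    devices = [(33.7490, -84.3880), (33.7600, -84.3900)]
    nodes = [(33.7500, -84.3870), (33.8000, -84.4000)]
    for row in distance_matrix(devices, nodes):
        print([round(d, 3) for d in row])
```

A matrix of this kind could feed either a simple nearest-neighbor assignment or a clustering step, depending on how the linkage is ultimately defined.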