Watchlist Matching Techniques

Compare exact, fuzzy, phonetic, rule-based, and LSTM-powered matching techniques used in Socure watchlist screening and entity resolution.

Matching techniques are a critical component of Socure's Watchlist solution. They involve the algorithms and methods used to compare the names or entities in the watchlist against a real-time database of names or entities. Several matching techniques can be used to identify potential matches, each with its strengths and weaknesses.

Exact matching: This technique involves comparing the name or entity in the watchlist against the names or entities in the database, character by character, to find exact matches. This technique is the most precise, producing the fewest false positives, but it may miss potential matches due to minor differences in spelling or formatting.
Fuzzy matching: This is a more flexible matching technique that considers differences in spelling, formatting and other variations in names or entities. It assigns a score to each potential match based on the level of similarity between the name or entity in the watchlist and the names or entities in the database. This technique is more inclusive than exact matching but may also generate false positives.
Phonetic matching: This technique compares the sound of names or entities rather than the spelling or formatting. It is useful when the names or entities in the watchlist may have been misspelled or transcribed incorrectly. It is less precise than exact matching but can still identify potential matches that other techniques would miss.
Rule-based matching: This technique involves setting up rules that specify the conditions for a potential match. For example, a rule may specify that a match occurs if the name in the watchlist matches the name in the database and the birthdate is within a certain range. This technique is highly customizable and can be tailored to specific needs, but it may also miss possible matches that do not meet the specified rules.
Machine learning-based matching: This technique involves using machine learning algorithms, such as neural networks and Long Short-Term Memory (LSTM) networks, to identify patterns and similarities between names or entities in the watchlist and names or entities in the database. This technique can be very accurate, especially when the database is extensive and the names or entities in the watchlist are complex. It requires a large amount of training data to work effectively.
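
The first four techniques above can be illustrated with a minimal Python sketch that uses only the standard library. It is a generic illustration, not Socure's implementation: the function names, the use of difflib for the similarity score, the classic Soundex algorithm for phonetic codes, and the 365-day birthdate window are all assumptions chosen for this example.

```python
from datetime import date
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace (illustrative normalization only)."""
    return " ".join(name.lower().split())


def exact_match(a: str, b: str) -> bool:
    """Exact matching: strict character-by-character equality."""
    return a == b


def fuzzy_score(a: str, b: str) -> float:
    """Fuzzy matching: similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def soundex(word: str) -> str:
    """Phonetic matching: classic Soundex code (one letter plus three digits)."""
    groups = (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
              ("l", "4"), ("mn", "5"), ("r", "6"))
    codes = {ch: digit for letters, digit in groups for ch in letters}
    word = "".join(ch for ch in word.lower() if ch.isascii() and ch.isalpha())
    if not word:
        return ""
    result = word[0].upper()
    prev = codes.get(word[0], "")
    for ch in word[1:]:
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit
        if ch not in "hw":  # h and w do not separate letters with the same code
            prev = digit
    return (result + "000")[:4]


def rule_match(name_a: str, dob_a: date, name_b: str, dob_b: date,
               max_days: int = 365) -> bool:
    """Rule-based matching: same normalized name AND birthdates within max_days."""
    same_name = normalize(name_a) == normalize(name_b)
    return same_name and abs((dob_a - dob_b).days) <= max_days


watchlist_name = "Alejandro Fernandez"
candidate = "Alehandro Fernandes"
print(exact_match(watchlist_name, candidate))        # False
print(round(fuzzy_score(watchlist_name, candidate), 2))
print(soundex("Fernandez") == soundex("Fernandes"))  # True
```

The misspelled candidate fails the exact comparison, still earns a high fuzzy score, and shares a phonetic code with the watchlist surname, which mirrors the trade-offs described above.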

Socure's unique approach to watchlist matching and entity resolution

At Socure, we leverage artificial intelligence and machine learning to develop effective watchlist matching and entity resolution. Our approach uses state-of-the-art neural networks, specifically Long Short-Term Memory (LSTM) networks, to learn how to detect subtle patterns in names and identify relationships between variants.

LSTMs are recurrent neural networks well-suited to handling complex sequences, such as the characters in a name, and their variants. Our models have learned to cluster names that likely correspond to the same individual, even across languages and writing systems. This means that when any of the variants Алехандро Фернандез, Alejandro Fernandez, or AL3HANDRO FERN4NDETH is entered, our LSTMs can match it to the appropriate cluster and determine the likelihood that it refers to the same person on the watchlist.

One of the key advantages of our approach is that it does not rely on rigid and limited rules-based systems, which are often unable to capture the complex variations and patterns in names and entities. Instead, our AI has learned these skills by analyzing billions of names and aliases and can recognize and cluster them accordingly.
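
To make the limitation of hand-written rules concrete, the sketch below applies a small substitution table to the three spellings mentioned earlier. It is purely illustrative and is not Socure's model: the LEET and CYRILLIC tables and the hand_normalize function are hypothetical and cover only the characters in this one example.

```python
# Hypothetical, hand-written substitution rules (not Socure's approach).
LEET = str.maketrans({"3": "e", "4": "a", "0": "o", "1": "i", "5": "s", "7": "t"})
CYRILLIC = {
    "а": "a", "л": "l", "е": "e", "х": "h", "н": "n",
    "д": "d", "р": "r", "о": "o", "ф": "f", "з": "z",
}


def hand_normalize(name: str) -> str:
    """Lowercase, undo common digit substitutions, and transliterate Cyrillic."""
    lowered = name.lower().translate(LEET)
    return "".join(CYRILLIC.get(ch, ch) for ch in lowered)


variants = ["Алехандро Фернандез", "Alejandro Fernandez", "AL3HANDRO FERN4NDETH"]
normalized = [hand_normalize(v) for v in variants]
print(normalized)
print(len(set(normalized)))  # 3: the rules alone do not unify the variants
```

Even after these rules run, the three inputs remain three distinct strings (for example, "alehandro" versus "alejandro"), so every new spelling convention would need another rule. This is the kind of gap a learned model is intended to close.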

Machine learning-based matching techniques, such as our LSTM models, are highly effective at identifying potential matches, even when the names or entities in the watchlist are highly complex and may have multiple variations. They require a large amount of training data to work effectively, but once they have been trained, they can detect subtle patterns and relationships between variants that would be difficult or impossible for humans to identify.

Our Watchlist solution is designed to provide super-human levels of accuracy and scale, and we are committed to ensuring that our models are transparent and ethical. For this reason, we encourage you to ask for Watchlist Model Governance, which provides a comprehensive overview of our approach, methodology, and performance metrics.

FAQs

What is match score?

The match score is a risk scoring feature that estimates the probability that each watchlist name match corresponds to your input identity. It is calculated using a proprietary deep learning model that analyzes various name elements and aliases. Using match scores improves precision when evaluating closely matching names and lets you prioritize the review of high-risk hits.

What do the match score values indicate?

A match score value signifies the degree of similarity in the name match, rated on a scale of 1 to 100:

• 90-100: Extremely strong match. Slight variations like a missing middle initial.

• 75-90: Solid match. There might be some variations in name spelling or formatting.

• 50-75: Moderate match. Noticeable differences in the name.

• 30-50: Weak match. Significant discrepancies in the name.

• 1-30: Very weak match. Names are somewhat connected, but with numerous differences.
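
The bands above can be expressed as a small lookup function. This is a minimal sketch: the function name match_band is hypothetical, and because the listed ranges share their boundary values (for example, 90 appears in two bands), it assumes a boundary score belongs to the higher band.

```python
def match_band(score: int) -> str:
    """Map a match score (1 to 100) to its descriptive band.

    Assumption: a score on a shared boundary (30, 50, 75, 90) is assigned
    to the higher band.
    """
    if not 1 <= score <= 100:
        raise ValueError("match score must be between 1 and 100")
    if score >= 90:
        return "Extremely strong match"
    if score >= 75:
        return "Solid match"
    if score >= 50:
        return "Moderate match"
    if score >= 30:
        return "Weak match"
    return "Very weak match"


print(match_band(95))  # Extremely strong match
print(match_band(42))  # Weak match
```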

How do I utilize the match score?

The match score enables you to accomplish the following:

• Prioritize reviews: Begin with the most critical hits by considering the match score.

• Triage likely false positives: Rapidly identify weaker matches for filtering.

• Refine name matching: Modify name flexibility according to match score insights.

• Boost risk analysis: Integrate the match score into your scoring model for improved risk assessment.

Below is an example of sorting watchlist hits by match score:

hits = response['matches']['watchlistName']

sorted_hits = sorted(hits, key=lambda x: x['matchScore'], reverse=True)
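
Building on the sort above, the following sketch splits hits into a review queue and a set of likely false positives. The triage function, the threshold of 75, and the "name" field in the sample data are hypothetical and used only for illustration; only the matchScore key comes from the example above. Choose a threshold that fits your own risk policy.

```python
def triage(hits, review_threshold=75):
    """Split hits into (to_review, likely_false_positives), highest score first.

    The default threshold is an illustrative assumption, not a recommendation.
    """
    ranked = sorted(hits, key=lambda h: h.get("matchScore", 0), reverse=True)
    to_review = [h for h in ranked if h.get("matchScore", 0) >= review_threshold]
    likely_false_positives = [
        h for h in ranked if h.get("matchScore", 0) < review_threshold
    ]
    return to_review, likely_false_positives


# Hypothetical sample data; a real response may carry different fields.
sample_hits = [
    {"name": "A. Fernandez", "matchScore": 62},
    {"name": "Alejandro Fernandez", "matchScore": 97},
    {"name": "Alex Ferdinand", "matchScore": 28},
]

to_review, likely_false_positives = triage(sample_hits)
print([h["name"] for h in to_review])               # ['Alejandro Fernandez']
print([h["name"] for h in likely_false_positives])  # ['A. Fernandez', 'Alex Ferdinand']
```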

Are there any restrictions or limitations regarding match scores that I should be aware of?

Certain name variations might not result in a match due to algorithm limitations. When this happens, the training dataset may not have been fully representative of the specific scenario. Report the issue to Socure Support and include details about the names that were compared. The product development team will use this information to build an improved training set for a future match score update.

What are best practices for reviewing and resolving a Watchlist match?

See Resolve a Watchlist Match.
