Lexicon Calibration: Optimising Performance & Reducing Keyword Fatigue


Overview

This is Part 2 of our three-part series on mastering lexicon-based communications surveillance. Building on the fundamentals covered in Part 1, this blog focuses on how to calibrate your existing lexicon to reduce keyword fatigue and improve performance. In Part 3, we will explore how artificial intelligence can be applied alongside lexicon-based surveillance frameworks to enhance risk detection, reduce false positives, and improve efficiencies.


The Strategic Approach to Lexicon Calibration

So you've built your lexicon following the guidance in Part 1, but how do you now make it more efficient and accurate, and ensure it captures emerging risks?

This guide examines how firms should approach lexicon calibration to improve surveillance effectiveness, from analysing your data to making impactful changes and documenting results.

 

What is Lexicon Calibration?

Lexicon calibration is the iterative process of reviewing, refining, and adjusting your keyword lists to ensure they accurately reflect real-world behaviour, address firm-specific risks, meet current regulatory expectations, and maintain an optimal true/false positive ratio for operational efficiency.



STEP 1: START WITH THE DATA AND METRICS

Before making any changes to your lexicon, you need to understand how it's currently performing. Begin by analysing key performance indicators from your alert data:

  • Alert volumes: How many alerts are being raised every day? Are your analysts meeting their SLAs for investigations and closures?

  • False positives: What proportion of alerts raised have been closed as false positives? Which of these are valid hits worth retaining for the future, even though they were not actual transgressions?

  • Alerts by channel: Which communication channels are producing the most alerts, and which are being resolved as false positives?

  • Missed risks: What risks weren’t raised by your policies but instead identified via tip-offs or flagged as part of other investigations?

  • Time spent on each alert: How long does it take to investigate an L1 alert vs an L2 alert, and how many alerts reach each status?

  • Irrelevant terms: Which keyword terms are never producing any alerts, and why?

  • Trends in non-alert specific data: Is there a general increase in communications volumes, or peaks in alerts caused by geopolitical or environmental events?

 

While false positives are inevitable, ongoing calibration is about managing the ratio so as not to overwhelm your compliance team with too much noise.
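As a toy illustration of this kind of metrics review, the sketch below computes a false positive rate and per-channel volumes over a handful of invented alert records. The field names and values are assumptions for illustration, not any platform's export schema.

```python
from collections import Counter

# Invented alert records; "channel" and "outcome" are illustrative field names.
alerts = [
    {"channel": "email", "outcome": "false_positive"},
    {"channel": "email", "outcome": "true_positive"},
    {"channel": "chat",  "outcome": "false_positive"},
    {"channel": "chat",  "outcome": "false_positive"},
    {"channel": "voice", "outcome": "true_positive"},
]

def false_positive_rate(records):
    """Share of closed alerts that were resolved as false positives."""
    fps = sum(1 for r in records if r["outcome"] == "false_positive")
    return fps / len(records)

def alerts_by_channel(records):
    """Alert volume per communication channel."""
    return Counter(r["channel"] for r in records)

print(false_positive_rate(alerts))             # 0.6
print(alerts_by_channel(alerts).most_common())
```

In practice the same calculations would run over your full alert-closure export, broken down by channel, team and lexicon pack to locate where the noise is coming from.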


Step 2: Reduce False Positives Through Strategic Refinement


Once you understand your performance metrics, focus on systematically reducing false positives by:

  • Refining overly broad terms: Apply techniques like contextual inclusions/exclusions, discussed in Part 1, to make generic keywords more specific to the behaviour you’re targeting, thereby reducing the number of alerts.

  • Eliminating consistently irrelevant terms: Remove keywords that repeatedly generate false positives without contributing to meaningful risk detection.

  • Optimising by communication channel: Different channels exhibit distinct language patterns that require tailored approaches. Email communications tend to be formal and structured, chat platforms feature shorthand and abbreviated language, whilst voice communications are characterised by fluid speech patterns and verbal fillers. Implementing channel-specific lexicon rules can help tailor alerting depending on the type of language used.

  • Consolidating overlapping terms across lexicons: Identify redundant coverage where multiple lexicon packs trigger on similar language patterns, causing duplicative alerts.

  • Addressing multilingual challenges: Poor translations between languages often generate false positives. Consider developing language-specific lexicon packs, created by native speakers, to improve accuracy and cultural relevance.

  • Implementing Policy Refinements:

    Beyond keyword adjustments, examine broader policy and technical configurations:

    • Refine monitoring populations: Ensure surveillance scope aligns with actual risk exposure by reviewing which employee groups require monitoring for specific risk types

    • Leverage metadata for precision targeting: Implement conditional logic to ensure lexicon behaviours and subsets of terms apply only to the relevant monitored population.

    • Deploy machine learning enhancements: Apply intelligent filtering techniques to improve contextual understanding and reduce false positive rates, as outlined in Part 1.
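To make the first three refinements above concrete, here is a minimal Python sketch of channel-specific rules with contextual exclusions. The rule structure, the broad keyword "guarantee", and the exclusion terms are all invented for illustration; they are not any product's configuration format.

```python
import re

# Invented channel-specific rules: each pairs a matching term with a
# contextual exclusion that suppresses hits in clearly benign contexts.
RULES = {
    "email": {"term": r"\bguarantee\b",
              "exclude": r"\b(warranty|satisfaction guarantee)\b"},
    # Chat shorthand: a looser pattern tolerating spelling variants.
    "chat":  {"term": r"\bguar+an+tee?d?\b",
              "exclude": r"\bwarranty\b"},
}

def should_alert(channel: str, message: str) -> bool:
    """Flag a message only if the channel's term matches and no
    contextual exclusion suppresses the hit."""
    rule = RULES.get(channel)
    if rule is None:
        return False  # no rule defined for this channel
    text = message.lower()
    if not re.search(rule["term"], text):
        return False
    return not re.search(rule["exclude"], text)

print(should_alert("email", "I guarantee you a 10% return"))             # True
print(should_alert("email", "please sign the warranty guarantee form"))  # False
print(should_alert("chat", "guaranteed profits, zero risk"))             # True
```

The same pattern generalises: formal terms for email, spelling-tolerant patterns for chat, and looser proximity-based terms for voice transcriptions.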

 

Use backtesting to ensure your refinements don't miss genuine risks

A lower number of alerts may make your investigations more manageable, but if you miss genuine risks, the entire purpose of surveillance is undermined. Backtesting is therefore essential to test proposed changes against historical data, where you know the outcomes, to verify that important risks would still be captured.

 

Balance Precision with Comprehensive Risk Coverage

Some keywords will always generate a mix of relevant and irrelevant alerts, but may be too important to remove entirely. Consider the phrase "I heard a rumour" in insider trading surveillance. While this phrase might appear in innocent contexts like office gossip about personnel changes, it remains critical for detecting potential information leakage. These are keywords you want to retain in your lexicon despite generating some false positives. AI-enhanced surveillance can help manage this balance by providing additional context analysis alongside traditional keyword matching, which we'll explore in Part 3 of this series.

 

Understand Your Organisation's Risk Appetite

Every organisation has a different risk appetite and therefore an acceptable false positive ratio. Some firms are willing to tolerate higher false positive rates if it means not risking missing any compliance risks, despite the increased costs in reviewing alerts. Others prefer more precise alerting, acknowledging that a few missed risks may slip through their surveillance net.

This decision fundamentally impacts your operational model, as false positive volumes directly influence your Level 1 review team size, location, and associated costs. Understanding your firm's position on this trade-off is essential for effective lexicon calibration.


Step 3: Increase True Positives by Identifying Missed Risks


In any surveillance programme, some risks will inevitably fall through the cracks. We simply don't know what we don't know. This is something regulators are aware of and expect, but it is a financial institution’s duty to make “best efforts” to detect the risks appropriate to its operating model.

How to Identify Missed Risks

Transgressions that have not been picked up by a lexicon can be uncovered through investigations, random sampling, whistleblower reports, regulatory examinations, or pure coincidence. Treat these as learning opportunities to improve the performance of your lexicons.

Using False Negatives to Tune Your Lexicon

When a transgression that wasn't flagged by your lexicon comes to light, conduct a thorough post-incident analysis. Resist the temptation to simply add the missed keywords to your lexicon. Instead, think behaviour, not just keywords.

Consider a scenario where you discover that someone used seemingly innocent words like "umbrella" as code for sensitive information. Your first instinct might be to add "umbrella" to your lexicon, but this approach creates two problems. First, you'll generate countless false positives from legitimate business discussions about weather, insurance coverage, or corporate structures. Second, and more importantly, sophisticated bad actors will most likely have moved on to new code words, so you are not actually capturing the nefarious behaviour you are trying to monitor for.

Instead, focus on the behavioural patterns that accompany intentional evasion:

  • Behavioural context surrounding coded language: Look for indicators like quid pro quo arrangements or secrecy. You can add proximity settings in your lexicon configuration for phrases like “keep this between us”, so phrases like “keep this umbrella/boat/pineapple between us” would flag regardless of whether codewords are used to throw off detection.

  • Timing anomalies: Examine timestamps of communications. Were messages sent outside normal business hours or around significant market events?

  • Channel switching: Look for sudden shifts from monitored communications channels, such as email, to personal messaging apps or face-to-face meetings.

  • Linguistic shifts and context disconnects: Implement searches that identify communication patterns that feel deliberately vague, overly formal, or uncharacteristically casual, or that seem oddly detached from legitimate business purposes. AI models excel at detecting "unusual" behaviour by understanding what constitutes "normal" communication patterns, without requiring the hard-coded search parameters that traditional lexicons demand, which we will look at in Part 3.

  • Emoji usage: These can have hidden meanings 😉. Consider whether your lexicon currently monitors for emoji use that might indicate coded communication.

  • Relationship patterns: Identify unusual communication between parties who don't normally interact, particularly if one team has access to privileged information.

  • Platform-specific behaviours: Different communication channels may require different approaches. For voice transcriptions, you may need looser lexicon terms with proximity settings to account for mis-transcribed words.
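The proximity idea in the first bullet above can be sketched in a few lines of Python. Real surveillance platforms typically expose proximity as a configuration setting rather than raw regex, so treat this purely as an illustration of the technique.

```python
import re

def proximity_pattern(lead: str, tail: str, max_gap: int = 3) -> re.Pattern:
    """Match `lead` followed by `tail` with at most `max_gap` intervening
    words, so swapped-in code words ("umbrella", "boat") still flag."""
    gap = rf"(?:\w+\W+){{0,{max_gap}}}"
    return re.compile(rf"\b{re.escape(lead)}\W+{gap}{re.escape(tail)}\b",
                      re.IGNORECASE)

secrecy = proximity_pattern("keep this", "between us")
print(bool(secrecy.search("Let's keep this umbrella strictly between us")))  # True
print(bool(secrecy.search("keep this copy of the contract")))                # False
```

Because the rule targets the secrecy framing rather than any particular code word, rotating "umbrella" to "boat" or "pineapple" does not evade detection.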

 

Understanding Surveillance Limitations

Remember that you can only surveil monitored channels, so some risks will always slip through via face-to-face communications or personal devices. This is where elements beyond surveillance systems, such as training, ethics, culture, and whistleblowing programmes, become paramount in your overall compliance efforts.




Step 4: Test Your Changes Through Backtesting and Synthetic Data

Before implementing any lexicon changes, thorough testing is essential to ensure your refinements don't miss genuine risks or create unintended consequences. This should involve backtesting against historic “labelled” data, as well as synthetic data if possible.

Backtesting

Test proposed changes against historical data where you know the outcomes to verify that important risks would still be captured. This involves:

  • Selecting representative datasets: Use historical communications from periods that include both genuine violations and normal business activity. Ensure coverage is across all relevant teams, departments, geographies and languages.

  • Measuring impact on precision and recall: Calculate how the changes will affect both true and false positive rates, without reducing your ability to catch genuine risks

  • Testing edge cases: Include examples of sophisticated evasion attempts or unusual risk scenarios to check that your new rules would flag these hidden risks

  • Running parallel analyses: Some firms like to run both pre- and post-calibrated lexicons in Production, and compare results.

 

Key Metrics to Monitor During Backtesting:

  • True positive retention: Confirm that previously identified violations would still be flagged

  • False positive reduction: Measure the decrease in irrelevant alerts

  • Coverage gaps: Identify any new blind spots created by your changes

  • Alert volume changes: Understand how modifications affect the overall workload

 

Backtesting provides the evidence base you need to justify changes to stakeholders and regulators, demonstrating that modifications are data-driven rather than arbitrary.
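A minimal backtest harness might look like the following, with an invented labelled dataset and two toy rules standing in for the pre- and post-calibration lexicons. Everything here is an illustrative assumption, not real surveillance data.

```python
# Invented labelled history: (message, is_genuine_risk) pairs.
LABELLED = [
    ("let's wash trade before the close", True),
    ("keep this between us, big announcement coming", True),
    ("the trade settled this morning", False),
    ("trade ideas newsletter attached", False),
    ("lunch at noon?", False),
]

def backtest(flag_fn, labelled):
    """Return (precision, recall) of a lexicon rule over labelled messages."""
    tp = sum(1 for text, risk in labelled if risk and flag_fn(text))
    fp = sum(1 for text, risk in labelled if not risk and flag_fn(text))
    fn = sum(1 for text, risk in labelled if risk and not flag_fn(text))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

broad = lambda text: "trade" in text  # current rule: noisy single keyword
refined = lambda text: "wash trade" in text or "between us" in text

print(backtest(broad, LABELLED))    # precision 1/3, recall 0.5
print(backtest(refined, LABELLED))  # precision 1.0, recall 1.0
```

Comparing the two sets of numbers gives exactly the evidence base described above: the refined rule cuts false positives without dropping any known true positives.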

 

Validating Against Synthetic Data

Generative AI tools are increasingly being used to create realistic test scenarios for lexicon calibration. These platforms can generate synthetic, non-sensitive data that allows you to validate how your lexicon responds to potential future risks.

Creating Comprehensive Test Scenarios:

Use tools like generative AI to create synthetic examples of:

  • Potential new forms of market manipulation: Scenarios that might emerge as markets evolve

  • Emerging communication patterns: New slang, acronyms, or coded language that might develop

  • Novel evasion techniques: Sophisticated attempts to circumvent surveillance that bad actors might employ

  • Industry-specific risk scenarios: Tailored to your firm's particular business activities and client base

  • Near misses: Examples of language that you would want to raise alerts for, even though they may not represent actual risk, to help catch ambiguous communications

  • Normal business scenarios: Communications you don't want your lexicon to flag alerts for, ensuring you're not creating unnecessary false positives. Realistically, 90%+ of your synthetic dataset should simulate this type of message
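One way to operationalise these scenarios is a small regression harness in which every synthetic message carries an expected verdict. The cases below are hand-written stand-ins (in practice they might be generated with an LLM and reviewed by SMEs), and the toy lexicon is an invented example.

```python
# Invented synthetic test cases: (message, should_flag) pairs.
SYNTHETIC_CASES = [
    ("moving this to my personal number, delete this chat", True),    # evasion
    ("heard a rumour the merger announcement is coming early", True), # leakage
    ("rumour has it the canteen menu changes on Monday", False),      # near miss
    ("please see the attached invoice", False),                       # normal
]

def validate(flag_fn, cases):
    """Return the cases where the lexicon's verdict disagrees with expectation."""
    return [(msg, expected) for msg, expected in cases
            if flag_fn(msg) != expected]

toy_lexicon = lambda msg: any(
    term in msg.lower() for term in ("rumour the", "delete this chat"))

print(validate(toy_lexicon, SYNTHETIC_CASES))  # [] -> every case passes
```

Any non-empty result pinpoints exactly which scenario a proposed lexicon change would over- or under-flag, before the change reaches production.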


Implementation Best Practices:

  • Generate diverse scenarios: Create variations in communication style, formality, and context

  • Test across channels: Ensure synthetic data covers all communication platforms your firm monitors

  • Include regulatory scenarios: Create examples based on recent enforcement actions or regulatory guidance

  • Validate realism: Have compliance team members and SMEs review synthetic scenarios to ensure they're believable and relevant

 

This is particularly valuable because historical data inherently doesn't represent new risks and emerging communication patterns.


Step 5: Document Changes Systematically

For regulatory confidence, scenario changes should be backed by thorough documentation and evidence-based decision-making. Every modification should be supported by clear data and identified risk gaps, with the rationale documented for future reference.

Regulatory Requirements

Regulators expect firms to systematically identify surveillance gaps and proactively enhance risk coverage. They favour dynamic systems that evolve with emerging risks over static approaches.

Regulators commonly focus on surveillance system evolution, with examiners requiring clear justification for changes and evidence that the modifications made have improved effectiveness. Comprehensive documentation is essential to ensure that risk coverage remains current and robust.

Documentation Requirements

For any change, you need to:

  • Evidence why the change is required - demonstrable issues or risk gaps

  • Document:

    • The rationale for each modification with a clear business justification

    • The data that was used to analyse the changes' effectiveness

    • Any known risks or outstanding gaps in the updated calibration

Effective documentation should capture not just what changed, but why the change was necessary, what alternatives were considered, and how the impact was measured, to demonstrate thoughtful governance.
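As one possible shape for such a change record, the sketch below uses a Python dataclass. The field names and the example values are illustrative assumptions, not a regulatory template.

```python
from dataclasses import asdict, dataclass, field
from datetime import date

@dataclass
class CalibrationChange:
    """Illustrative structured record for one lexicon calibration change."""
    change_id: str
    changed_on: date
    description: str                 # what changed
    rationale: str                   # why it was necessary (evidence / risk gap)
    evidence: list                   # data used to justify the change
    alternatives_considered: list    # options evaluated and rejected
    measured_impact: str             # backtest / impact-assessment result
    known_gaps: list = field(default_factory=list)  # outstanding risks

record = CalibrationChange(
    change_id="LEX-2025-014",
    changed_on=date(2025, 8, 20),
    description="Added contextual exclusion to a broad term in the email lexicon",
    rationale="92% of the term's alerts were closed as false positives in Q2",
    evidence=["Q2 alert-closure export", "backtest run #37"],
    alternatives_considered=["remove term entirely (rejected: coverage gap)"],
    measured_impact="Term-level false positive rate fell from 92% to 18% in backtest",
)
print(asdict(record)["change_id"])  # LEX-2025-014
```

Serialising each record (e.g. via `asdict`) gives an audit trail that captures the what, why, alternatives, and measured impact in one queryable place.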

Ultimately, this will help your firm's defensibility, providing regulators with the confidence that you have truly made best efforts to improve your surveillance system with the resources and technology available.


Step 6: Impact Assessment

After your lexicon changes have been rolled out and alert levels have changed, evaluate whether your calibration efforts were successful by asking:

  • Has the number of alerts changed appropriately?

  • Are the alerts more relevant to actual risk scenarios?

  • Are analysts spending less time on irrelevant hits?

  • Has the false positive rate improved?


Conclusion

By implementing systematic calibration processes, measuring performance rigorously, and maintaining comprehensive documentation, firms can achieve dramatic improvements in surveillance effectiveness while reducing operational burden.

The techniques covered in this guide represent current best practices for lexicon calibration. However, the surveillance landscape continues to evolve, with artificial intelligence offering new opportunities to enhance and complement traditional keyword-based approaches.

In Part 3 of this series, "The Future of Lexicon Surveillance: Integrating AI and Advanced Technologies," we'll explore how AI can be implemented to complement the lexicon approach, addressing the inherent limitations of lexicon-based surveillance and creating more intelligent, adaptive systems.


Experience Smarter Communications Surveillance

Ready to move beyond basic keyword matching? SteelEye's AI-enhanced Surveillance Lexicon delivers the intelligent, adaptive monitoring capabilities outlined in this guide, reducing keyword fatigue while catching real risks.

See How It Works →

 
