Why the Surveillance Lexicon needs to be reinvented

Out with the old and in with the new

Why the Communications Surveillance Lexicon Needs to be Reinvented

The use of a surveillance lexicon is viewed as a necessary but frustrating part of any eComms / vComms surveillance suite. Surveillance systems come pre-loaded with lexicon packs that are hard to adjust and insensitive to context, and that therefore generate high numbers of false positives. Fundamentally, lexicon-based searches are narrow.

Narrow lexicon searches only ever capture a subsection of an actual thought. In other words, they only look at the use of a prescribed word, without its context. While this can often be enough to trigger an investigation, it casts a very wide net, often mistaking harmless behaviour for prohibited behaviour. The key is to identify the expressed meaning of the language used.

Casting Wider nets

As much as traditional lexica-based searches are broad brushes, they generally aren’t forgiving enough when it comes to:

British vs. American English (Americanisations / Americanizations)
Linguistic contractions (“I will” vs. “I’ll”)
Ordinal variations (“1st” vs. “First”)
Common slang and abbreviations (“Market Level” vs. “Mkt lvl”)
Capital Markets industry slang
Linguistic switch

A surveillance system needs to be able to account for all of these permutations. Take for example someone writing: “this is the first time I've authorized the mkt move to Tele”. Most systems today can't recognise these different spellings and therefore would not trigger an alert.
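One way to picture this is a normalisation step that runs before any lexicon matching. The following is a minimal sketch, not a description of how any particular product works: the `VARIANTS` table, the `normalise` and `matches` functions, and the choice of British spelling as the canonical form are all assumptions made for illustration.

```python
import re

# Hypothetical variant table: surface form -> canonical form used by the lexicon.
# A real deployment would need a far larger, curated mapping.
VARIANTS = {
    "authorized": "authorised",  # American vs. British spelling
    "i've": "i have",            # contractions
    "i'll": "i will",
    "1st": "first",              # ordinal variations
    "mkt": "market",             # common abbreviations
    "lvl": "level",
    "tele": "telegram",          # industry shorthand (assumed)
}


def normalise(text: str) -> str:
    """Lower-case the text and replace known variants with canonical forms."""
    text = text.lower().replace("\u2019", "'")  # treat curly apostrophes as straight
    tokens = re.findall(r"[a-z0-9']+", text)
    return " ".join(VARIANTS.get(token, token) for token in tokens)


def matches(text: str, lexicon_phrase: str) -> bool:
    """Return True if the normalised phrase occurs, on word boundaries, in the text."""
    haystack = f" {normalise(text)} "
    needle = f" {normalise(lexicon_phrase)} "
    return needle in haystack


message = "this is the first time I've authorized the mkt move to Tele"
print(normalise(message))
print(matches(message, "move to Telegram"))
```

Because both the message and the lexicon phrase pass through the same normalisation, a single lexicon entry such as “move to Telegram” covers the abbreviated form without the lexicon having to list every permutation.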

Surveillance systems also need to be able to capture new words, channels and expressions. Take the example of someone wanting another individual to move to a different communication channel (e.g., move the email thread to a WhatsApp chat). ‘WhatsApp’ in this context may belong to a class of prohibited communication channels, and compliance teams need to be able to flag its use in order to investigate the intent behind it.
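The idea of a class of prohibited channels can be sketched as follows. This is an illustrative assumption rather than a real rule set: the channel names, their aliases, the list of “move” verbs and the 40-character window are all hypothetical choices.

```python
import re

# Hypothetical class of prohibited channels and the aliases people use for them.
PROHIBITED_CHANNELS = {
    "whatsapp": ["whatsapp"],
    "telegram": ["telegram", "tele"],
    "signal": ["signal"],
}

# Reverse lookup: alias -> channel name.
ALIAS_TO_CHANNEL = {
    alias: channel
    for channel, aliases in PROHIBITED_CHANNELS.items()
    for alias in aliases
}

# Longest aliases first so "telegram" is tried before "tele".
_alternation = "|".join(
    re.escape(alias) for alias in sorted(ALIAS_TO_CHANNEL, key=len, reverse=True)
)

# A "move" verb, then up to 40 characters within the same sentence,
# then a preposition followed by a channel alias.
MOVE_PATTERN = re.compile(
    r"\b(?:move|switch|take|continue)\b"
    r"[^.?!]{0,40}?"
    r"\b(?:to|on|over)\s+(?:a\s+)?"
    rf"({_alternation})\b",
    re.IGNORECASE,
)


def find_channel_move(text: str):
    """Return the prohibited channel someone proposes moving to, or None."""
    match = MOVE_PATTERN.search(text)
    return ALIAS_TO_CHANNEL[match.group(1).lower()] if match else None


print(find_channel_move("Can we move the email thread to a WhatsApp chat?"))
print(find_channel_move("The buy signal is strong today"))
```

Requiring a verb of movement alongside the channel name means an ambiguous word such as “signal” does not alert on its own, and adding a new channel only requires a new entry in the class.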

Refining the net

The effectiveness of traditional lexica-based searches is limited because they do not consider context, which is necessary to understand the intent with which a word has been used. Baselined context accounts for the overall nature of the dialogue.

Compliance teams should be able to tell whether an email is a newsletter or marketing email. Take, for example, an email from SteelEye with the subject “I HAVE A GREAT INSIDER TRADING TIP FOR YOU!”, designed to fire an alert in your surveillance system to demonstrate that it needs fine-tuning.

A large amount of context can be ascertained from the participants contained in a communication and the style of the overall communication itself. For example, you should be able to identify a friendly Bloomberg chat between two known individuals and if there are any changes in how they are communicating.

Also, if a conversation refers to a person, you should be able to have different outcomes based on who that person is and how they are referenced. For example, if they’re talking about a political figure being ‘an idiot’, should compliance and surveillance teams care about that? Certainly less so than if someone called Matt from the Product team a ‘dolt’.

Beneath the abstractions, it is essential to be able to differentiate between similar terms. For example, a lexicon entry for ‘raising the bid’ should not necessarily trigger an alert when someone types ‘raising funds for a re-election bid’. Neither should ‘parlour’ or ‘parlor’ when it refers to a ‘pizza parlour’.
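A simple way to illustrate this kind of disambiguation is to attach innocent context to each lexicon entry. The sketch below is an assumption for illustration only: the `LexiconEntry` structure, the `should_alert` function and the context words are hypothetical, and a production system would more likely rely on a statistical or learned model than on hand-written exclusions.

```python
import re
from dataclasses import dataclass


@dataclass
class LexiconEntry:
    # Spellings of the watched term (e.g. British and American variants).
    terms: tuple
    # Words or phrases that, in the same sentence, suggest an innocent reading.
    innocent_context: tuple = ()


def _contains(sentence: str, phrase: str) -> bool:
    """Whole-word, case-insensitive search for a phrase in a sentence."""
    return re.search(rf"\b{re.escape(phrase)}\b", sentence, re.IGNORECASE) is not None


def should_alert(sentence: str, entry: LexiconEntry) -> bool:
    """Alert only if a watched term appears without any innocent context."""
    if not any(_contains(sentence, term) for term in entry.terms):
        return False
    return not any(_contains(sentence, ctx) for ctx in entry.innocent_context)


# Hypothetical entries based on the examples in the text.
BID = LexiconEntry(terms=("bid",), innocent_context=("re-election", "raising funds"))
PARLOUR = LexiconEntry(terms=("parlour", "parlor"), innocent_context=("pizza",))

print(should_alert("I'm raising the bid by two ticks", BID))
print(should_alert("She is raising funds for a re-election bid", BID))
print(should_alert("Lunch at the pizza parlor?", PARLOUR))
```

The watched term still fires in a trading context, while the same word in a fundraising or restaurant context is suppressed rather than sent to an analyst.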

AI vs. Language

Where does AI fit into surveillance? AI is a useful boost for any linguistic-based surveillance system. AI can fairly accurately:

show related communications;
show similar communications (in terms of linguistic style);
differentiate between a sentence or word located in the body of an email and one located in the email signature.


A mature surveillance solution is able to do all three. Take, for example, a mailing-list email from a fashion label that includes ‘join us on telegram’ in its email signature. A surveillance system should: a) know that this is from a mailing list, and b) be able to determine that the offending term is in the signature, and therefore not raise an alert.
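The two checks in this example can be approximated with simple rules, shown below as a stand-in for the AI-based approach the article describes. The sketch assumes plain-text, single-part emails; the `SIGNATURE_MARKERS` list and the function names are hypothetical. It uses the standard `List-Unsubscribe` and `List-Id` mail headers to spot mailing lists, and treats a `--` delimiter line or a common sign-off as the start of the signature.

```python
from email import message_from_string

# Hypothetical markers for the start of a signature block.
SIGNATURE_MARKERS = ("--", "regards", "best regards", "kind regards")


def is_mailing_list(msg) -> bool:
    """Mailing-list software normally adds List-Unsubscribe or List-Id headers."""
    return msg["List-Unsubscribe"] is not None or msg["List-Id"] is not None


def split_signature(body: str):
    """Split a plain-text body into (content, signature) at the first marker line."""
    lines = body.splitlines()
    for i, line in enumerate(lines):
        if line.strip().lower().rstrip(",") in SIGNATURE_MARKERS:
            return "\n".join(lines[:i]), "\n".join(lines[i:])
    return body, ""


def should_flag(raw_email: str, term: str) -> bool:
    """Flag only if the term is in the body of a non-mailing-list email."""
    msg = message_from_string(raw_email)
    if is_mailing_list(msg):
        return False
    content, _signature = split_signature(msg.get_payload())
    return term.lower() in content.lower()


newsletter = (
    "From: news@fashion.example\n"
    "List-Unsubscribe: <mailto:unsubscribe@fashion.example>\n"
    "Subject: New season\n"
    "\n"
    "Our new collection is here.\n"
    "-- \n"
    "Fashion Label\n"
    "Join us on Telegram\n"
)
print(should_flag(newsletter, "telegram"))
```

In this sketch either check alone is enough to suppress the alert; a learned classifier would replace both heuristics in practice, but the decision it feeds is the same.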

Additionally, the more you engage with a surveillance system, the better its predictions should become.

A new approach

Culture and history are built on language, the medium through which meaning is conveyed. A well-tuned surveillance system that leverages the core way in which we communicate is the only way to stay ahead.

In order to do this, surveillance systems have to understand how language is used, and how it is used in the context of what you’re actually searching for.

By applying modern technology, the workings of communications surveillance lexicons can be improved significantly, balancing firms’ need to monitor a much wider range of search terms against the need for more accurate results and fewer false positives.

