Machine Learning vs Generative AI in Fraud Prevention
I’ve spent 20+ years using statistics and machine learning to fight fraud.
When I first started my career, machine learning wasn’t yet a core part of the Fraud Squad’s toolbox. Back then, most companies relied on traditional methods like WAFs, CAPTCHAs, and static databases to block suspicious activity.
Today, machine learning and AI do a lot of the heavy lifting when it comes to detecting fraud patterns, spotting anomalies, and determining risk scores. Machine learning-based systems are much more effective at responding to changing threats in real time, which is why most organizations are either already using or looking into using machine learning for fraud prevention. Machine learning algorithms can handle a wide variety of tasks: analyzing data, finding patterns, and assisting decision-making across enormous volumes of data.
Given all the hype we've seen around Generative AI and ChatGPT, I’ve been asked whether GenAI is going to replace machine learning in fraud prevention and when it’s better to use machine learning vs GenAI.
Here's my view 👇.
Key Takeaways
- GenAI won’t replace machine learning in fraud prevention any time soon. GenAI isn’t ideal for finding fraud risk in traditional data sets. Supervised and unsupervised machine learning models work much better for fraud prevention.
- For supervised learning, gradient-boosted decision trees are the workhorse model, and they're usually all you need. If you want to build event-based models to flag anomalous event sequences and detect account takeovers, then Long Short Term Memory (LSTM) models are usually sufficient.
- Unsupervised learning models like K-Nearest Neighbors (KNN) and Isolation Forests are great for anomaly detection, clustering, and finding fraud rings.
- GenAI can be a great copilot for fraud and compliance operations. You can use it to analyze and summarize large unstructured data sets. It can also help create rules, manage disputes, or even file suspicious activity reports.
- Fraud rates are growing 30% YoY, and SAR filings are growing 15% YoY. Headcount and IT budgets cannot grow as quickly as SAR volumes and fraud rates have. GenAI might be the best way to keep up with the boom in Fraud and SAR filings.
- GenAI will need to solve a few important data quality and governance challenges to evolve beyond copilots.
- The ideal combination is human + rules + ML + GenAI.
How is machine learning used in fraud prevention?
Machine learning-based fraud prevention systems use algorithms to analyze large volumes of data and identify signs of fraud, such as suspicious patterns, anomalous activity, and relationships between data points.
By using a mix of historical and real-time data, machine learning-based fraud prevention systems can predict the likelihood of fraud with a high degree of accuracy. And they typically get more effective over time: as they monitor new fraud scenarios, they quickly learn the patterns and adapt to stop these threats.
In order to do this effectively, machine learning models need to be trained and refined using “features”. These attributes and predictors teach the algorithms how to recognize patterns and make predictions. We invest a lot of time in training models and creating features at Sardine. Our team has created more than 4,000 fraud detection features and we typically release 250-500 new features every quarter.
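To make “features” concrete, here’s a minimal sketch of two common fraud features, transaction velocity and deviation from a user’s typical amount, built with pandas. The column names and data are purely illustrative, not Sardine’s actual schema or feature pipeline.

```python
import pandas as pd

# Hypothetical transaction log; column names and values are illustrative,
# not Sardine's actual schema or features.
tx = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u1", "u2"],
    "amount": [25.0, 900.0, 40.0, 15.0, 4000.0],
    "timestamp": pd.to_datetime([
        "2024-05-01 10:00", "2024-05-01 10:05", "2024-05-01 11:00",
        "2024-05-02 09:00", "2024-05-03 13:30",
    ]),
}).set_index("timestamp").sort_index()

# Velocity feature: how many transactions this user made in the trailing 24 hours.
tx["tx_count_24h"] = tx.groupby("user_id")["amount"].transform(
    lambda s: s.rolling("24h").count()
)

# Deviation feature: this amount relative to the user's running average so far.
tx["amount_vs_user_mean"] = tx["amount"] / tx.groupby("user_id")["amount"].transform(
    lambda s: s.expanding().mean()
)

print(tx)
```

Features like these then feed the supervised and unsupervised models described below.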
How is Generative AI used in fraud prevention?
Generative AI is still a relatively new technology in fraud prevention and fraud detection systems. Today, it’s best suited to be used as a copilot for fraud and compliance operations:
- Analyzing and summarizing large unstructured data sets
- Automating administrative tasks with too many variables for RPA tools
- Answering questions about why a rule has been fired
- Helping you create rules in your risk scoring system
- Preparing and submitting evidence for payment disputes
- Writing SAR narratives and filing activity reports (see the sketch after this list)
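As one concrete example of the copilot pattern, here’s a minimal sketch of drafting a SAR narrative with an LLM, assuming an OpenAI-compatible API. The model name, prompt, and case facts are placeholders, not a description of how Sardine implements this, and a human analyst still reviews everything before filing.

```python
from openai import OpenAI  # assumes an OpenAI-compatible endpoint and an API key in the environment

client = OpenAI()

# Hypothetical case facts pulled from a case-management system; purely illustrative.
case_facts = """
Customer u-4821 received 14 inbound P2P transfers (total $9,400) from unrelated senders
over 48 hours, then attempted a single outbound wire for $9,200 to a newly added payee.
Device fingerprint changed mid-session; login originated from a datacenter IP.
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",  # model choice is an assumption, not a recommendation
    messages=[
        {"role": "system",
         "content": "You draft concise SAR narratives for a compliance analyst to review. "
                    "Stick strictly to the facts provided; do not invent details."},
        {"role": "user",
         "content": f"Draft a SAR narrative from these case facts:\n{case_facts}"},
    ],
)

draft = response.choices[0].message.content
print(draft)  # the analyst reviews and edits the draft before anything is filed
```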
Ironically, we’ve observed that fraudsters are some of the biggest users of GenAI today. They use these LLMs to scale up their scams and make their tactics more effective.
Some examples include:
- Writing phishing emails and texts with perfect grammar
- Superimposing faces (deep fakes) on videos and IDs during onboarding
- Using voice cloning to bypass authentication via phone support
- Automating communications for different scams
This Forbes piece dives deep into how AI is being used to supercharge fraud.
What tool is better for fraud prevention?
Like many things in life, people want to create a binary answer. Yes or no. True or false. The reality is that there’s no right answer here. It’s a case of picking the right tool for the job.
1. Machine learning is better than GenAI at fraud detection
GenAI is transformative in many domains. However, supervised and unsupervised machine learning models work better for fraud prevention. They’re naturally much stronger at large-scale statistical modeling and give answers with higher confidence.
GenAI models are miracles of modern engineering and incredibly useful across the risk spectrum.
Traditional models might find words like “Ford” and “farm” much more similar than “Ford” and “car,” because they compare surface features of the text. A large language model, by contrast, builds a deeper semantic understanding of language from the corpus it has been trained on via next-word prediction.
This is fantastic when you’re trying to generate new text with meaning or understand a complex scenario and summarize it.
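To illustrate the contrast, here’s a small sketch comparing surface-level string similarity with embedding-based similarity. The sentence-transformers model name is just one common choice (an assumption, not a recommendation), and the exact scores will vary by model.

```python
from difflib import SequenceMatcher
from sentence_transformers import SentenceTransformer, util

# Surface-level (character) similarity: "ford" overlaps more with "farm" than with "car".
print(SequenceMatcher(None, "ford", "farm").ratio())  # ~0.50
print(SequenceMatcher(None, "ford", "car").ratio())   # ~0.29

# Semantic similarity via embeddings; scores differ between embedding models.
model = SentenceTransformer("all-MiniLM-L6-v2")
ford, car, farm = model.encode(["Ford", "car", "farm"])

print(util.cos_sim(ford, car))   # expected to be noticeably higher ...
print(util.cos_sim(ford, farm))  # ... than this, because embeddings capture meaning
```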
However, today, GenAI can be expensive and slow. While there are ways to mitigate both challenges, existing model types are still more efficient and likely the right tool when modeling fraud risk from transactional, user, device, or behavioral signals.
GenAI and large language models also have a habit of hallucinating. This is true of open-source and proprietary models alike, which use transformer architectures and embeddings to create semantic meaning from large data sets.
Hallucination can be a feature, not a bug: it fuels creativity and problem-solving and copes with messier problems. The fraud squad knows this well, since that kind of messy problem-solving is what ops teams spend much of their time doing.
GenAI is not as strong at raw number crunching and statistical modeling as existing machine learning techniques.
Author Ethan Mollick calls GenAI and LLMs “pretty good people.” They’re creative and good at problem-solving but not always 100% accurate.
Machine learning models can be much more robust when we are looking to perform statistical analysis of transactional, user, or web-based data sets for patterns of behavior or activity.
2. Machine learning gives the best bang for the buck
Most financial institutions and merchants have barely scratched the surface of what ML can do.
The fraud and compliance teams know they need this capability but often lack the investment or engineering resources to build out the models (with one or two notable exceptions). Even in the largest, best-run companies we see, the models are a long way from the scale you would expect in big tech or ad tech. There’s a massive unlock in accessing machine learning capabilities at scale in a turnkey way.
Machine learning can deliver substantial performance lifts in preventing, detecting, and reporting fraud and compliance risks. Today, most models are small and rely on limited data sets, and the fraud squad is hampered by poor access to external data.
For supervised learning, the best-kept secret amongst machine learning practitioners is that gradient-boosted decision trees (GBDTs) are the workhorse model. GBDTs have always proven solid with structured data, where the history of trial and error is baked in. We also see CNN and GNN models in more advanced sectors like RTP and crypto fraud.
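For illustration, here’s a minimal GBDT risk-scoring sketch using scikit-learn on synthetic features. The feature names, toy label rule, and hyperparameters are placeholders, not a production configuration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in for engineered fraud features (amount z-score, tx velocity,
# device age in days, IP risk score); real labels come from confirmed fraud cases.
n = 20_000
X = np.column_stack([
    rng.normal(0, 1, n),        # amount_zscore
    rng.poisson(2, n),          # tx_count_24h
    rng.exponential(200, n),    # device_age_days
    rng.uniform(0, 1, n),       # ip_risk_score
])
# Toy label rule so the example runs end to end; not a real fraud pattern.
logit = 1.5 * X[:, 0] + 0.8 * X[:, 1] - 0.01 * X[:, 2] + 2.0 * X[:, 3] - 4.0
y = (rng.uniform(0, 1, n) < 1 / (1 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = GradientBoostingClassifier(n_estimators=200, max_depth=3, learning_rate=0.05)
model.fit(X_train, y_train)

scores = model.predict_proba(X_test)[:, 1]  # fraud risk score per transaction
print("AUC:", roc_auc_score(y_test, scores))
```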
If you want to build event-based models to flag anomalous event sequences and detect account takeovers, then sequence models such as LSTMs are usually sufficient.
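Here’s a minimal sketch of what such a sequence model can look like in PyTorch, treating a session as a sequence of event IDs. The event vocabulary, dimensions, and random batch are illustrative; a real model would be trained on labeled account-takeover sessions.

```python
import torch
import torch.nn as nn

# Minimal sequence classifier over user event streams (login, password_change,
# add_payee, transfer, ...), with each event type encoded as an integer ID.
class EventSequenceModel(nn.Module):
    def __init__(self, n_event_types: int, embed_dim: int = 32, hidden: int = 64):
        super().__init__()
        self.embed = nn.Embedding(n_event_types, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # probability the session is an account takeover

    def forward(self, event_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(event_ids)      # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(x)     # h_n: (1, batch, hidden)
        return torch.sigmoid(self.head(h_n[-1])).squeeze(-1)

model = EventSequenceModel(n_event_types=50)
# Fake batch: 4 sessions of 10 events each; real training needs labeled ATO sessions.
batch = torch.randint(0, 50, (4, 10))
print(model(batch))  # per-session ATO risk scores in [0, 1]
```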
Finally, unsupervised learning models like K-Nearest Neighbors (KNN) or Isolation Forests are sufficient for anomaly detection and clustering. On top of that first-order output, graph methods help find fraud rings.
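Here’s a small sketch of both halves of that idea: an Isolation Forest for unsupervised anomaly scoring, then a graph of shared identifiers (devices, cards, addresses) to surface candidate fraud rings. The data, edges, and thresholds are made up for illustration.

```python
import numpy as np
import networkx as nx
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Unsupervised anomaly detection on the same kind of feature matrix used above.
X = rng.normal(0, 1, size=(5_000, 4))
X[:25] += 6  # a small cluster of obviously unusual accounts

iso = IsolationForest(contamination=0.01, random_state=0).fit(X)
anomaly_flags = iso.predict(X) == -1  # True = account flagged as anomalous

# Graph step: link accounts that share identifiers (device, card, address, ...)
# and look for connected components with multiple flagged accounts.
shared_edges = [(0, 1), (1, 2), (2, 3), (100, 101)]  # illustrative shared-device links
g = nx.Graph(shared_edges)
rings = [c for c in nx.connected_components(g)
         if sum(anomaly_flags[node] for node in c) >= 2]
print("Candidate fraud rings:", rings)
```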
3. GenAI is a great co-pilot for fraud and compliance operations
I believe GenAI’s “low-hanging fruit” is the co-pilot for fraud and compliance. Organizations devote 10% to 30% of their headcount to compliance-related activities.
Today, the fraud and compliance teams are trying to keep up with ever-higher volumes of fraud and SARs without a bigger budget. Much of the work is manual, and these tasks are often repetitive data summaries or trawling through large data sets. This forces them to focus on only the worst cases.
A large language model can perform up to 90% of the dull administrative work, allowing teams to detect and prevent much more fraud and file many more SARs without additional headcount.
Generative AI is ready-made for this, and at Sardine we already have several use cases live.
4. Data governance and quality are key to making GenAI effective
I can imagine numerous areas where LLMs could deliver significant performance uplifts as “more than” a co-pilot.
One example where LLMs can potentially have more of an impact is in account takeover detection via event sequence modeling. But often, the problem is not that we don't have a good modeling technique available but rather the lack of good labeled data – no company in my experience ever collects great labels for which logins were truly account takeovers vs. not.
Let's first solve for that, then the rest will follow quite easily.
Machine Learning vs. GenAI for fraud prevention
In summary, it’s clear that both machine learning and GenAI are going to play crucial roles in fraud prevention. It’s AND, not OR. Here’s a good chart from the folks at Techopedia that summarizes the differences:
In time, I do think GenAI tools will surprise us by doing a lot of what the fraud squad currently does: creativity, problem-solving, and generating new rules and applying them.
But I don’t think we’ll have to worry about being replaced by AI. These copilots will simply make us more productive, freeing up our time and resources to focus on higher-value tasks.
The ideal combination is human + rules + ML + Generative AI.
I would love to hear from folks more informed than me on use cases for GenAI in fraud modeling and what has worked vs. what has not.