Data-Engineering Led Fraud & Compliance Platform

Kazuki Nishiura


April 11, 2024

Machine learning and AI are now critical tools for detecting fraud and financial crime. However, as companies increasingly rely on machine learning to improve their fraud and compliance performance, they face a key trade-off.

For data engineers working in fraud or compliance operations, the trade-off is between implementing a generic feature store and building machine-learned features in-house.

Neither option is ideal.

What if you could have the best of both worlds?

Sardine builds domain-specific machine-learned features that you have complete control over.

Fraud & compliance tools and in-house build tradeoffs are sub-optimal

Today fraud and compliance teams are becoming increasingly engineering-led, developing their in-house machine learning capabilities and consuming multiple data providers to solve challenges for their business.

The traditional solution provider approach doesn’t make this easy.

Traditional anti-fraud solutions often deliver a score for a risk event. For example, if a customer is trying to make a payment, the solution might return a high-risk or a low-risk score. This simple but effective tool allows financial institutions, e-commerce companies, and marketplaces to quickly approve or decline a transaction.

However, this black box approach isn’t a fit for teams that want to train their own machine learning (ML) models or solve a specific use case or threat.

This leads to three common but sub-optimal choices:

Accept the black boxes but lose business performance ❌
Build in-house by connecting underlying data providers, creating significant opex and overhead ⚠
Use a generic feature store to craft the solution ⚠

The ideal solution combines the best of an in-house build with the best of a feature store.

So how do we get that?

First, let us look at what feature stores should do.

What we want from a feature engine

No Training/Serving Skew: Feature skew occurs when the data used to train machine learning models differs from the data encountered in production. The risk arises in the gap between training, which runs on Online Analytical Processing (OLAP) systems, and live scoring, which runs on Online Transaction Processing (OLTP) systems. Sardine's platform eliminates this issue, ensuring consistent and reliable feature behavior.
Point-in-Time Correctness: Accurate backtesting and model training require data that reflects the state of the system at specific points in time. Sardine's platform guarantees point-in-time correctness, enabling businesses to reliably evaluate historical data and train models accordingly.
Observability: Visibility into features and machine learning scores in production environments is crucial for monitoring, debugging, and optimizing fraud and compliance systems. Sardine's platform provides comprehensive observability, allowing businesses to track and understand the behavior of their fraud detection models in real-time.
Data Warehouse Connectivity: Modern businesses rely on data warehouses to store and analyze large volumes of data. Sardine's platform provides access to its data warehouse, and it can also be integrated to send data into yours.
Low Latency: In the world of online transactions, every millisecond counts. Sardine's platform is designed to operate with low latency, ensuring that fraud detection and prevention measures can be applied in near real-time, minimizing the risk of successful fraudulent activities.
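
To make point-in-time correctness concrete, here is a minimal sketch of an "as of" lookup: when building a training row for a historical event, only feature values that were already known at the event time may be used. The function name `value_as_of` and the sample history are hypothetical and chosen for illustration; this is not Sardine's implementation.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical history for one feature of one entity, sorted by time:
# (time the value became known, value).
feature_history = [
    (datetime(2024, 1, 1), 0),
    (datetime(2024, 1, 10), 2),
    (datetime(2024, 2, 1), 5),
]

def value_as_of(history, event_time):
    """Return the latest value known at or before event_time, or None.

    Assumes `history` is sorted by timestamp. Values recorded after
    event_time are never returned, which is what prevents leakage of
    future information into backtests and training data.
    """
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, event_time)  # entries with ts <= event_time
    if idx == 0:
        return None  # nothing was known yet at event_time
    return history[idx - 1][1]

# Feature value to attach to a transaction that happened on 15 January 2024.
training_value = value_as_of(feature_history, datetime(2024, 1, 15))
```

Whether a value recorded at exactly the event time should count as "known" is a design choice; this sketch includes it. Using the same lookup logic for both offline training and online serving is one common way to reduce training/serving skew.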

The problems with generic feature stores

Feature selection is time-consuming. Finding a domain-specific feature that works for your context takes significant time and effort. While a feature store makes prototyping various features less difficult, you still need to set up the pipeline to backtest your features and discover features that are applicable in your domain. Ideally, the operational experts would be able to do this with low-code or no-code.
The features have no obvious fit inside an ML model, workflow or rules engine. Fraud and compliance teams live by rules, and their workflows and models exist to support them. Even after investing considerable resources into developing domain-specific features, organizations often encounter challenges in seamlessly integrating these features into their existing machine learning models, workflows, or rules engines.
Generic feature stores require engineering resources to implement, rather than letting analysts do it themselves. Implementing and maintaining these feature stores typically requires significant engineering resources, which can be a limiting factor for organizations with lean engineering teams or those seeking to empower analysts and subject matter experts to take a more hands-on role in feature development and deployment.

Best of both: Features engineered for the risk context

Sardine provides over 4,000 features that have been pre-engineered. These features include:

Proprietary device and intelligence signals like 'TrueLocation.Country'. Understanding the true country of a user can be complex (if they’re using proxies or a VPN, for example).
Sardine network data signals, such as whether a particular email was previously associated with a fraudulent chargeback within Sardine's network (‘email.isFraudulent’).
Third-party enriched data signals from telcos, email providers, open banking, bank consortia, global sanctions lists and more than 35 data providers.
Velocity and aggregation features, such as the number of transactions from a given email address, or the number of chargebacks observed for a given card BIN in the last 30 days.
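
As an illustration of a velocity feature such as "chargebacks for a card BIN in the last 30 days", the following is a minimal in-memory sketch of a trailing-window counter. The class name `WindowedCounter` and the sample BIN are hypothetical, and the sketch assumes events arrive in non-decreasing time order and that queries are made at or after the latest event; a production system would need persistent, low-latency storage.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

class WindowedCounter:
    """Counts events per key inside a trailing time window."""

    def __init__(self, window):
        self.window = window
        self._events = defaultdict(deque)  # key -> timestamps, oldest first

    def add(self, key, ts):
        # Assumption: timestamps for a key arrive in non-decreasing order.
        self._events[key].append(ts)

    def count(self, key, now):
        q = self._events.get(key)
        if q is None:
            return 0
        cutoff = now - self.window
        # Evict events that fell out of the window (ts <= cutoff).
        while q and q[0] <= cutoff:
            q.popleft()
        return len(q)

chargebacks = WindowedCounter(timedelta(days=30))
chargebacks.add("411111", datetime(2024, 1, 1))
chargebacks.add("411111", datetime(2024, 1, 20))
chargebacks.add("411111", datetime(2024, 2, 5))

# Chargebacks for this BIN in the 30 days before 10 February 2024.
recent = chargebacks.count("411111", datetime(2024, 2, 10))
```

The deque keeps both insertion and eviction cheap, which matters when the count has to be read inside the latency budget of a live transaction.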

For any pair of dimensions drawn from 50+ dimensions, we have created counts and statistical features.

The 50+ dimensions include things like: email address, merchant id, card hash number, card BIN, bank routing number, device id, and more. Then we pre-create counts like:

Number of emails associated with a device id -- a velocity check to find onboarding fraud
Number of devices associated with a card BIN -- useful to find card enumeration attacks
Average transaction value at a merchant id -- useful to find whether a particular card transaction at a merchant is abnormally high, which can indicate stolen card fraud

The calculated features are available in real-time to both our rule engine and ML engine, so we can stop payment fraud or transaction fraud within hundreds of milliseconds.
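
The pairwise counts above can be sketched as follows. This is a simplified, in-memory illustration under stated assumptions: the class `PairwiseFeatures`, the four dimension names, and the event fields are hypothetical, and it does not describe how Sardine computes these features.

```python
from collections import defaultdict

# Hypothetical subset of dimensions; the article mentions 50+.
DIMENSIONS = ("email", "device_id", "card_bin", "merchant_id")

class PairwiseFeatures:
    """Distinct counts for every ordered pair of dimensions, plus a mean amount."""

    def __init__(self):
        # (key_dim, key_value, counted_dim) -> set of distinct counted values
        self._distinct = defaultdict(set)
        self._amount_sum = defaultdict(float)  # merchant_id -> total amount
        self._amount_n = defaultdict(int)      # merchant_id -> transaction count

    def observe(self, event):
        present = [d for d in DIMENSIONS if d in event]
        for key_dim in present:
            for counted_dim in present:
                if key_dim != counted_dim:
                    key = (key_dim, event[key_dim], counted_dim)
                    self._distinct[key].add(event[counted_dim])
        if "merchant_id" in event and "amount" in event:
            self._amount_sum[event["merchant_id"]] += event["amount"]
            self._amount_n[event["merchant_id"]] += 1

    def distinct_count(self, key_dim, key_value, counted_dim):
        return len(self._distinct.get((key_dim, key_value, counted_dim), ()))

    def mean_amount(self, merchant_id):
        n = self._amount_n.get(merchant_id, 0)
        return self._amount_sum[merchant_id] / n if n else None

features = PairwiseFeatures()
features.observe({"email": "a@example.com", "device_id": "d1",
                  "card_bin": "411111", "merchant_id": "m1", "amount": 20.0})
features.observe({"email": "b@example.com", "device_id": "d1",
                  "card_bin": "411111", "merchant_id": "m1", "amount": 40.0})
features.observe({"email": "b@example.com", "device_id": "d2",
                  "card_bin": "411111", "merchant_id": "m1", "amount": 30.0})

emails_per_device = features.distinct_count("device_id", "d1", "email")
devices_per_bin = features.distinct_count("card_bin", "411111", "device_id")
avg_at_merchant = features.mean_amount("m1")
```

Storing exact sets is only practical for small volumes; at scale, approximate distinct counters and time-windowed aggregates are the usual substitutes.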

All features are pre-engineered to:

Work with your own proprietary data when you consume them, so you can leverage your unique data assets while benefiting from Sardine's capabilities.
Work with your existing rules engine or within the Sardine rules engine, so analysts can build new capabilities without engineering support. It also means you can integrate with your current infrastructure and workflows.
Be observable through analytics in the Sardine dashboard, so you can monitor and gain insights into the performance of your fraud detection models in real-time.
Support analysis and backtesting with underlying data from the Sardine data warehouse, so you can conduct in-depth historical analysis, model validation, and continuous improvement.
Avoid feature drift between OLAP and OLTP (they are built from the ground up for this, and rigorously observed and tested), so you can rely on consistent and accurate detection and alerting across your online analytical processing and online transaction processing systems.

You can use these features (and our underlying data signals) as you see fit.

If you’re building an engineering capability or already have a sophisticated in-house team, get in touch to get access today and see how we can improve your model performance.

The Future of Fraud and Compliance Strategies

The demand for domain-specificity, speed, and accuracy in a data-driven world calls for a platform that offers both advanced machine learning features and the flexibility and control of in-house development. By prioritizing point-in-time correctness, low latency, and observability, and by providing over 4,000 pre-engineered features that integrate with your existing systems, we’re giving you the tools you need in the fight against fraud and financial crime. With comprehensive observability and rigorous testing that eliminates feature drift, businesses can address their unique challenges confidently and precisely. If your goal is to build a superior in-house engineering team or to augment your existing one, this platform could be the step forward you need to turbocharge your model performance and operational effectiveness.


Tagged topics

Fraud

Product

About the author

Kazuki Nishiura

Head of Engineering