AI and Machine Learning are fantastic, but benefit from human input
- 20 technical questions around fraud detection and models
- Complicated fraud scenarios must be well defined
- AI and machine learning are effective, but multi-model approaches will outperform single models
If your existing fraud platform hasn’t migrated to AI and machine learning yet, I’m sure someone in your organization is talking about it! Here’s why.
IBM RegTech Innovations recently conducted an informative webinar series entitled “AI Fraud Detection — Beyond the Textbooks.” The outcome was a list of questions titled “Ask the expert: 20 questions fraud fighters want to know.”
The comprehensive questions had a recurring theme: while AI and machine learning are valuable tools in the fight against fraud, data collection and deployment methodology will make or break your process.
Several of the top questions included:
3. “What’s the best way to prevent fraud on new channels where you don’t have history data to train with or when there’s no fraud-flagged transactions in your data set?”
4. “Are supervised-learning algorithms enough for detecting fraud cases? What can you do to identify or detect a new fraud pattern not in the historical data?”
8. “Ted, do you believe that it would be a good fraud-detection strategy to use different models targeting different angles as opposed to using one generic model to detect all fraud attempts?”
10. “Hi Ted, what is your opinion of rules-based models vs machine-learning models or will we always have hybrid models to reduce the false-positive rate? Thanks.”
Other highlights addressed the dangers of “overfitting”:
If you keep optimizing the parameters in a complex machine-learning model long enough, it will tend to memorize the data used to train it. The problem is that it starts to lose the forest for the trees and may narrow in on unimportant details in the data rather than broader, more important factors.
A common way to prevent over-training leading to overfitting is to train a system on one set of data and hold out another set of data for testing that is not used in training. As the model is trained, it will perform better and better on both the training examples and the held-out examples. But, eventually, it will continue to do better mimicking the training data, while performing worse on the hold-out sample of examples. That’s when it’s overfitting and one rolls it back to where it achieved peak performance on the hold-out examples.
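To make the hold-out idea concrete, here is a minimal Python sketch of the "roll back to peak hold-out performance" step. It is an illustration under our own assumptions, not the panel's implementation: the function name `train_with_early_stopping`, the `patience` parameter, and the toy U-shaped loss curve are all hypothetical stand-ins for a real model and a real hold-out set.

```python
import copy

def train_with_early_stopping(model, train_one_epoch, holdout_loss,
                              max_epochs=100, patience=5):
    """Train until the hold-out loss stops improving, then return the
    snapshot of the model taken at its best hold-out loss."""
    best_loss = float("inf")
    best_epoch = -1
    best_model = copy.deepcopy(model)
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        train_one_epoch(model)        # fit on the training set only
        loss = holdout_loss(model)    # score on data never used for training
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            best_model = copy.deepcopy(model)   # remember the peak
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                 # hold-out loss keeps getting worse
    return best_model, best_epoch, best_loss

# Toy stand-in (hypothetical): the "model" only counts how long it has been
# trained, and the hold-out loss is U-shaped with its minimum at 10 epochs.
def toy_train_one_epoch(model):
    model["epochs_trained"] += 1

def toy_holdout_loss(model):
    return (model["epochs_trained"] - 10) ** 2 / 100 + 0.2

model = {"epochs_trained": 0}
best_model, best_epoch, best_loss = train_with_early_stopping(
    model, toy_train_one_epoch, toy_holdout_loss)
print(best_model["epochs_trained"], model["epochs_trained"])
```

In this toy run the live model keeps training for a few epochs past the minimum, but the returned snapshot is the one taken at the lowest hold-out loss. In practice, the hold-out set should contain accounts and time periods the training set never saw.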
In terms of transactional fraud, the panel pointed out a key weakness in “synths” (synthetic accounts): A pronounced lack of diversity.
Criminals don’t seem to work very hard at making the behavior of their synths diverse. The account origination data may be diverse due to diversity in the population of stolen identity information, but the behavior of synths after origination is usually pretty stereotypical. (Sometimes you get lucky and even the origination data has some unusual consistency due to where or how the personal data was stolen.)
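One way to act on that observation is to look for groups of accounts whose post-origination behavior is nearly identical. The Python sketch below is a simplified illustration and not something the panel presented: the three behavior features, the account IDs, the `eps` distance threshold, and the `min_neighbors` cutoff are hypothetical, and the features are assumed to be pre-scaled to the 0 to 1 range.

```python
import math

def stereotyped_accounts(behavior, eps=0.05, min_neighbors=2):
    """Return IDs of accounts that have at least `min_neighbors` other
    accounts within Euclidean distance `eps` in behavior-feature space."""
    flagged = []
    for account_id, features in behavior.items():
        neighbors = sum(
            1
            for other_id, other in behavior.items()
            if other_id != account_id and math.dist(features, other) <= eps
        )
        if neighbors >= min_neighbors:
            flagged.append(account_id)
    return sorted(flagged)

# Hypothetical, pre-scaled features per account:
# (transactions per day, share of spend at one merchant type, night-time share)
behavior = {
    "A1": (0.10, 0.80, 0.30),
    "A2": (0.55, 0.20, 0.90),
    "A3": (0.90, 0.45, 0.10),
    "S1": (0.30, 0.95, 0.05),
    "S2": (0.31, 0.94, 0.05),
    "S3": (0.30, 0.96, 0.06),
    "S4": (0.29, 0.95, 0.04),
}

suspects = stereotyped_accounts(behavior)
print(suspects)
```

The pairwise comparison is quadratic in the number of accounts, so a real deployment would need clustering or an approximate nearest-neighbor index. A tight cluster is also only a lead for investigation, since legitimate customers can behave alike.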
Another question, which applies directly to check fraud detection, asked: “Self-learning is ‘weaker’ for fraud detection because the model targets are evading detection. Why do you say that?” The answer was of particular interest to OrboGraph, as we utilize self-learning in our image analysis platform for account profiling. We agree that if you rely on self-learning alone to feed the final detection score, you may face limitations. However, when factoring in the attributes of check images, self-learning at the account profile level is imperative to developing a strong statistical and image representation for each account.
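As a rough illustration of what self-learning at the account profile level can mean on the statistical side, the Python sketch below keeps a running mean and variance of check amounts for one account and scores a new item against that history. This is a generic textbook technique (Welford's online algorithm), not a description of OrboGraph's platform; the class name `AccountProfile`, the sample amounts, and the use of a z-score are our assumptions for the example, and image attributes are not modeled here.

```python
import math

class AccountProfile:
    """Running statistics for one account, updated one check at a time
    with Welford's online algorithm (no need to store past items)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0   # running sum of squared deviations from the mean

    def update(self, amount):
        self.count += 1
        delta = amount - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (amount - self.mean)

    def std(self):
        # Sample standard deviation; undefined with fewer than two items.
        if self.count < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.count - 1))

    def zscore(self, amount):
        """How many standard deviations `amount` sits from this account's
        own history; 0.0 if there is not enough history to judge."""
        spread = self.std()
        if spread == 0.0:
            return 0.0
        return abs(amount - self.mean) / spread

# Hypothetical history of cleared check amounts for one account.
profile = AccountProfile()
for amount in [120.0, 135.0, 110.0, 128.0, 142.0, 125.0]:
    profile.update(amount)

typical_score = profile.zscore(131.0)    # in line with past behavior
unusual_score = profile.zscore(2500.0)   # far outside past behavior
print(round(typical_score, 2), round(unusual_score, 2))
```

A profile like this only describes what is normal for the account. As noted above, it works best as one input to a final score rather than as the detector itself.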
Given the nature of check payments, machine-learning-based systems will do a terrific job analyzing data fields, but they are unable to pick up the unique visual cues that checks offer in abundance. Scoring the physical and visual clues found through image analysis, and incorporating those scores into fraud detection, will combat the “four horsemen of check fraud” (Counterfeits, Forgeries, and Alterations) as a part of on-us/deposit fraud.
Next week we’ll explore a variety of fraud perpetrator use cases – you won’t want to miss it.