How to detect suspicious banking activities?
The limitations of existing fraud detection systems
As online payments have increased in recent years, so has the number of fraud cases. Fraudsters adapt to the deterministic rules initially set up and find and exploit loopholes in the system. Most of the time, the system takes the form of a tool that raises an alert when predefined thresholds are exceeded. For example, if a user makes a bank transfer of an abnormally high amount to an unknown account, the risk analysis team is alerted to investigate further and decide whether or not the transfer is fraudulent.
However, a fraudster who is aware of these threshold rules will make numerous small transactions in order to fly under the radar. Once this new type of fraud is detected, the rules are adjusted to deal with it.
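The weakness of a fixed threshold can be illustrated with a minimal sketch. The rule, the threshold value and the `Transfer` fields below are hypothetical and chosen only for illustration; they do not come from any real banking system.

```python
from dataclasses import dataclass

# Hypothetical threshold, chosen only for illustration.
AMOUNT_THRESHOLD = 5_000.0


@dataclass
class Transfer:
    amount: float
    known_beneficiary: bool


def is_flagged(transfer: Transfer, threshold: float = AMOUNT_THRESHOLD) -> bool:
    """Deterministic rule: alert on a large transfer to an unknown account."""
    return transfer.amount > threshold and not transfer.known_beneficiary


# One large transfer to an unknown account trips the rule...
single = [Transfer(12_000.0, known_beneficiary=False)]
# ...but the same total split into small transfers does not.
split = [Transfer(1_000.0, known_beneficiary=False) for _ in range(12)]

flagged_single = sum(is_flagged(t) for t in single)
flagged_split = sum(is_flagged(t) for t in split)
print(flagged_single, flagged_split)
```

Both scenarios move the same total amount, yet only the first one raises an alert, which is exactly the loophole described above.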
A major problem is that the detection system can only react to the loopholes that fraudsters have already found, so it always lags behind. In addition, the system does not make it possible to evaluate the level of risk: a transaction either exceeds a threshold or it does not. If the thresholds are set too high, many fraudulent transactions will not be detected. But if they are set too low, the risk analysis team will be overwhelmed by the transactions to be investigated.
Sense4data worked closely with a team of experts to translate these business rules into an artificial intelligence approach. One of the major challenges was to build a model that overcomes the limitations of thresholds and ranks transactions by decreasing level of risk, so that analysis teams can focus on the most suspicious activities.
Unlike most existing AI-based solutions, the approach developed by sense4data is "unsupervised": no history of known fraud cases is required. There are two reasons for this choice. First, the annotation work of the risk analysis team is long and tedious. Second, fraud techniques evolve over time, and the decision rules learned by a supervised model have difficulty generalizing to future techniques that do not appear in the annotated data set.
The core of the solution developed by sense4data lies in the construction of an average behavior for each banking user. The algorithm relies on the user's activity history to determine, for example, their usual connection times, their frequency of online transactions, etc. Once this average behavior is built, the algorithm can measure in real time whether the current session is out of the ordinary and how unusual this activity is for this particular user. The advantage of this so-called multivariate approach is that it evaluates exceptionality based on hundreds of combined indicators (called features) rather than checking them one by one against predetermined, non-customized threshold values.
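As a rough illustration of the idea of a per-user baseline, the sketch below builds a profile from a user's past sessions and measures how far a new session deviates from it. The feature names (`login_hour`, `transfers_per_session`) and the data are hypothetical, and the per-feature z-score is a deliberate simplification: the approach described above combines hundreds of features in a multivariate model rather than scoring them one at a time.

```python
from statistics import mean, pstdev


def build_profile(history):
    """Per-feature mean and standard deviation over a user's past sessions."""
    profile = {}
    for name in history[0]:
        values = [session[name] for session in history]
        profile[name] = (mean(values), pstdev(values))
    return profile


def deviations(profile, session, eps=1e-9):
    """Absolute z-score of each feature of the current session."""
    return {
        name: abs(session[name] - mu) / (sigma + eps)
        for name, (mu, sigma) in profile.items()
    }


# Hypothetical activity history for one user.
history = [
    {"login_hour": 9, "transfers_per_session": 1},
    {"login_hour": 10, "transfers_per_session": 2},
    {"login_hour": 9, "transfers_per_session": 1},
    {"login_hour": 11, "transfers_per_session": 2},
]
profile = build_profile(history)

usual = deviations(profile, {"login_hour": 10, "transfers_per_session": 1})
unusual = deviations(profile, {"login_hour": 3, "transfers_per_session": 9})
print(usual)
print(unusual)
```

Because the profile is built per user, a login at 3 a.m. is unusual for this user even if it would be perfectly normal for another one.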
The model used to calculate a risk score is the Isolation Forest [1]. This model has shown good performance on a variety of anomaly detection problems. In our case, the large majority of transactions are benign, so frauds can be considered anomalies. The principle of the Isolation Forest is that if we cut the feature space at random, an anomaly will probably be isolated faster than the other points.
Let us consider an example in two dimensions, where each transaction X is represented by two features. In the following figure (source [1]), randomly slicing the point space requires more cuts to isolate the blue point (11) than the red point (4).
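The two-dimensional example can be reproduced with a small sketch that counts how many random axis-aligned cuts are needed before a given point is alone. The data (a Gaussian cluster plus one far-away point) and the function name `cuts_to_isolate` are assumptions made for illustration. The sketch also omits the subsampling and depth limit that the Isolation Forest of [1] uses.

```python
import random
from statistics import mean


def cuts_to_isolate(points, target, rng):
    """Number of random axis-aligned cuts needed to leave `target` alone."""
    subset = list(points)
    cuts = 0
    while len(subset) > 1:
        # Only dimensions along which the remaining points differ can be cut.
        dims = [
            d for d in range(len(target))
            if min(p[d] for p in subset) < max(p[d] for p in subset)
        ]
        if not dims:
            break  # the remaining points are identical
        dim = rng.choice(dims)
        lo = min(p[dim] for p in subset)
        hi = max(p[dim] for p in subset)
        split = rng.uniform(lo, hi)
        # Keep only the side of the cut that contains the target.
        keep_left = target[dim] < split
        subset = [p for p in subset if (p[dim] < split) == keep_left]
        cuts += 1
    return cuts


rng = random.Random(0)
# Hypothetical data: a dense cluster of benign points and one distant point.
cluster = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
outlier = (8.0, 8.0)
points = cluster + [outlier]
# Use the point closest to the cluster center as the "normal" reference.
inlier = min(cluster, key=lambda p: p[0] ** 2 + p[1] ** 2)

runs = 200
avg_outlier = mean(cuts_to_isolate(points, outlier, rng) for _ in range(runs))
avg_inlier = mean(cuts_to_isolate(points, inlier, rng) for _ in range(runs))
print(avg_outlier, avg_inlier)
```

On average, the distant point is separated from the rest after far fewer cuts than the point in the middle of the cluster.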
The interpretation is that the red dot probably represents an anomaly. Because the cuts are random, the number of iterations varies from one run to the next. This is why the algorithm performs several independent series of random cuts in parallel (each one builds a tree) and then averages the results, as shown in the figure below (source [1]).
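The averaging step can be sketched as follows. In [1], the path lengths obtained over all trees are averaged and normalized into a score s = 2^(-E[h(x)] / c(n)), where c(n) is the average path length of an unsuccessful search in a binary search tree built on n points. The per-tree path lengths below are made-up values centered on the 4 and 11 cuts of the example, and n = 256 is an assumed sample size.

```python
import math
from statistics import mean

EULER_GAMMA = 0.5772156649


def c(n):
    """Normalization term from [1]: c(n) = 2*H(n-1) - 2*(n-1)/n."""
    if n < 2:
        raise ValueError("n must be at least 2")
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(path_lengths, n):
    """Score in (0, 1): the shorter the average path, the higher the score."""
    return 2.0 ** (-mean(path_lengths) / c(n))


# Hypothetical path lengths over four trees for the two dots of the example.
red_paths = [4, 3, 5, 4]
blue_paths = [11, 12, 10, 11]

n = 256  # assumed number of points used to build each tree
red_score = anomaly_score(red_paths, n)
blue_score = anomaly_score(blue_paths, n)
print(red_score, blue_score)
```

A point that is isolated quickly on average gets a score closer to 1, while a point that needs about c(n) cuts gets a score close to 0.5.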
Averaged over all the trees, the blue dot takes more cuts to isolate than the red dot, so the red dot receives the higher risk score. In practice, the number of dimensions (features) is several hundred. The score is calculated for all the points (transactions), and the algorithm gives the risk analysis team the highest scores to investigate first. In addition, the solution developed by the Sense4data team draws on its expertise in model interpretability to automatically provide lines of investigation.
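The prioritization step can be sketched in a few lines. The transaction identifiers, the scores and the helper name `top_k_riskiest` are hypothetical; the only point is that analysts receive a ranked queue instead of a yes/no alert.

```python
import heapq


def top_k_riskiest(scores, k):
    """Return the k (transaction, score) pairs with the highest risk score."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Hypothetical risk scores produced by the model.
scores = {"tx-001": 0.42, "tx-002": 0.81, "tx-003": 0.47, "tx-004": 0.66}

queue = top_k_riskiest(scores, k=2)
print(queue)
```

The size of the queue can then be matched to the capacity of the risk analysis team, which is not possible with a fixed threshold.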
This approach allows us to detect all kinds of unusual activities in a personalized manner. The collaboration with a team of bank fraud experts, combined with the AI skills at sense4data, made it possible to build an algorithm that is robust to future fraud techniques. Unless the fraudster reproduces the exact behavior of the user, any risky session will be detected.
More generally, sense4data is able to address complex "unsupervised" issues (clustering, anomaly detection...).
References
[1] Fei Tony Liu, Kai Ming Ting and Zhi-Hua Zhou. Isolation Forest. Eighth IEEE International Conference on Data Mining (ICDM), 2008.