As a testament to how valuable data is becoming for companies, La Poste – Colissimo has teamed up with researchers from IMT schools to fight fraud. Through the big data platform TeraLab, launched in 2014, this public-private partnership has made it possible to explore algorithmic solutions for optimizing fraud detection. The project demonstrates how important modernizing data tools has become for organizations.
Will data centers replace Sherlock Holmes as the stereotype of the detective? The question may sound absurd, but it is a legitimate one in light of La Poste – Colissimo’s decision to turn to a big data platform to fight fraud. Over the eighteen months between January 2014 and June 2015, the company paid out tens of thousands of euros on claims it had identified as suspected cases of fraud. Hence its desire to modernize its tools and technical expertise in fraud detection.
As such, in late 2015 the company decided to work with the TeraLab platform at Institut Mines-Télécom (IMT). La Poste – Colissimo saw this as an opportunity to kill two birds with one stone: “We were seeking both collaboration to help us overcome our difficulties handling very large volumes of data and the possibility of a rapid return on investment,” explains Philippe Aligon, who is in charge of data analysis at La Poste – Colissimo. Detecting fraudulent claims based on false declarations that a package was dropped off serves both objectives.
Teaching algorithms to recognize fraud
TeraLab first worked on “securing the datasets, to assure La Poste – Colissimo that it was a safe work environment,” says Anne-Sophie Taillandier, director of the platform. After this technical and legal step, all files related to proven fraud were sent to TeraLab, followed by a phase of statistical learning (machine learning) based on this data. “We proposed a system that takes the characteristics of the claim as input: the amount of the claim, the weight of the package, the stated cause of non-delivery, and so on,” explains Jérémie Jakubowicz, head of the data science center at TeraLab. Given this model and the characteristics of any claim, it is possible to deduce the associated probability of fraud.
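The idea of mapping a claim’s characteristics to a fraud probability can be sketched as follows. The feature names, weights and logistic form here are purely illustrative assumptions for the sake of the example — the article does not disclose La Poste – Colissimo’s actual model or features.

```python
import math

# Hypothetical feature weights -- illustrative only, not the actual model.
WEIGHTS = {
    "claim_amount": 0.004,          # higher claimed amounts raise suspicion slightly
    "package_weight": -0.1,         # heavier packages are less often falsely declared
    "declared_not_delivered": 1.2,  # "never dropped off" claims score higher
}
BIAS = -3.0

def fraud_probability(claim: dict) -> float:
    """Map a claim's characteristics to a probability of fraud
    via a logistic function (a simplified stand-in for the model)."""
    score = BIAS + sum(WEIGHTS[k] * claim.get(k, 0.0) for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-score))

# Example: a 100-euro claim on a light package declared never delivered.
p = fraud_probability({
    "claim_amount": 100.0,
    "package_weight": 0.5,
    "declared_not_delivered": 1.0,
})
```

In practice such weights would be learned from the labeled historical cases rather than set by hand, but the principle is the same: each new claim is reduced to a vector of characteristics and scored between 0 and 1.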
To support this learning phase, La Poste – Colissimo provided TeraLab with a sample of archived data on suspicious packages from January 2014 to June 2015. The company’s anti-fraud officers had already ranked each case on a scale from 0 to 4, ranging from fraud proven by internal services to a very low risk of fraud. TeraLab’s role was to reproduce this ranking with the model it had developed.
After the sample was analyzed, the 500 claims the algorithms considered most suspicious were sent to the experts at La Poste – Colissimo. “We didn’t use the same approach as them at all,” says Jérémie Jakubowicz. “The experts work mainly on a geographical area, whereas we use parameters like the weight of the package or the postcode.” Despite this, the results of the experts and the algorithms correlated at 99.8%. Within the sample provided, 282 new cases the company had considered non-suspicious were flagged as fraudulent by the TeraLab team, and were retrospectively confirmed as such by La Poste – Colissimo.
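The selection step — scoring every claim and handing the most suspicious ones to human experts — can be sketched like this. The claim IDs and random scores are invented for illustration; only the top-500 cutoff comes from the article.

```python
import heapq
import random

random.seed(0)

# Hypothetical scored claims: (claim_id, fraud_probability) pairs.
# In the real pipeline the scores would come from the trained model.
scored_claims = [(f"claim-{i}", random.random()) for i in range(10_000)]

# Keep only the N claims the model considers most suspicious,
# to be reviewed by the anti-fraud experts.
TOP_N = 500
most_suspicious = heapq.nlargest(TOP_N, scored_claims, key=lambda c: c[1])
```

Using `heapq.nlargest` avoids sorting the full list when only a small shortlist is needed, which matters once the volume of claims grows.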
Towards integration in the company
The proof of concept was therefore a success. The algorithmic method works, and beyond faster detection, its automation reduces the cost of investigating fraud case by case. “There are very high expectations from the customer, security and IT services,” says Philippe Aligon. The fraud detection tool will be integrated into the plan to modernize La Poste – Colissimo’s IT services and into the acquisition of new big data technologies for real-time processing, making it possible to assess claims instantaneously.
Due to the complexity of integrating big data tools, the algorithm cannot yet be deployed on a large scale. This is not unique to La Poste – Colissimo, however, but a pattern found in many organizations. As Jérémie Jakubowicz explains: “Even when it’s all green lights on our side, that doesn’t mean the work is finished. Using the results and putting them into production are also problems that have to be solved on the company’s side.” This limitation illustrates that using big data technologies is not just a scientific issue but an organizational one as well.