TeraLab and La Poste join forces to combat parcel fraud
July 5, 2016 - Big Data & AI - Cybersecurity

La Poste - Colissimo has joined forces with researchers from IMT schools to combat fraud. Through the TeraLab big data platform, launched in 2014, this public-private partnership has explored algorithmic solutions to optimize fraud detection. This work illustrates the importance of modernizing organizations.
Will the data center replace Sherlock Holmes as the stereotypical detective? The question may seem far-fetched, but it's a legitimate one in light of La Poste - Colissimo's decision to turn to a big data platform to combat fraud. Over 18 months, between January 2014 and June 2015, tens of thousands of euros were paid out in compensation for cases the company identified as suspected fraud. Hence its desire to modernize its tools and business expertise in fraud detection.
To this end, at the end of 2015, it decided to collaborate with the TeraLab platform at the Institut Mines-Télécom (IMT). La Poste - Colissimo thus saw the opportunity to kill two birds with one stone: "We were looking for both a collaboration that would enable us to overcome our difficulties in handling large volumes of data, and the possibility of a rapid return on investment," explains Philippe Aligon, head of data analysis at La Poste - Colissimo. "Working on the detection of compensation fraud based on false declarations of parcel delivery enables us to combine these two objectives."
Teaching algorithms to recognize fraud
TeraLab first works on "securing data batches, to demonstrate to La Poste - Colissimo that the working environment is safe", explains Anne-Sophie Taillandier, the platform's director. After this technical and legal stage, all files relating to proven fraud are sent to TeraLab. This is followed by a statistical learning (machine learning) phase based on this data: "We proposed a system that takes as input the characteristics of the claim: its amount, the weight of the parcel, the reason for non-delivery, etc.," explains Jérémie Jakubowicz, head of TeraLab's data science division. From this model and the characteristics of any claim, it is possible to deduce an associated probability of fraud.
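To make the idea concrete, here is a minimal sketch of a model that maps claim characteristics to a probability of fraud. It is not TeraLab's actual system, whose algorithm the article does not specify: it assumes a plain logistic regression trained by stochastic gradient ascent, and the field names (amount_eur, weight_kg, reason), the scaling factors and the four toy claims are all invented for illustration.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def featurize(claim):
    # Turn a claim into a numeric vector. Field names and scaling
    # are hypothetical, chosen only for this illustration.
    return [
        1.0,                                  # bias term
        claim["amount_eur"] / 100.0,          # claimed amount
        claim["weight_kg"] / 10.0,            # parcel weight
        1.0 if claim["reason"] == "not_received" else 0.0,
    ]

def train(claims, labels, epochs=2000, lr=0.1):
    # Logistic regression fitted by stochastic gradient ascent
    # on the log-likelihood (label 1 = proven fraud, 0 = legitimate).
    w = [0.0] * len(featurize(claims[0]))
    for _ in range(epochs):
        for claim, y in zip(claims, labels):
            x = featurize(claim)
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i, xi in enumerate(x):
                w[i] += lr * (y - p) * xi
    return w

def fraud_probability(w, claim):
    # Probability of fraud associated with one claim.
    x = featurize(claim)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

# Invented toy data, not La Poste - Colissimo records.
TOY_CLAIMS = [
    {"amount_eur": 450.0, "weight_kg": 0.3, "reason": "not_received"},
    {"amount_eur": 380.0, "weight_kg": 0.5, "reason": "not_received"},
    {"amount_eur": 40.0, "weight_kg": 4.0, "reason": "damaged"},
    {"amount_eur": 25.0, "weight_kg": 2.5, "reason": "damaged"},
]
TOY_LABELS = [1, 1, 0, 0]

WEIGHTS = train(TOY_CLAIMS, TOY_LABELS)
```

Calling fraud_probability(WEIGHTS, claim) on a new claim returns a value between 0 and 1. A real system would be trained on far more files and validated on data held out from training.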
To support this learning phase, La Poste - Colissimo provided TeraLab with a sample of archived data from suspicious parcels between January 2014 and June 2015. The company's anti-fraud managers first ranked each file on a scale of 0 to 4, ranging from fraud confirmed by internal services to a very low risk of fraud. TeraLab's role: to reproduce the same classification based on the model developed.
After analyzing the sample, the 500 requests deemed most suspicious by the algorithms are sent to experts at La Poste - Colissimo. "We didn't have the same approach as them at all," continues Jérémie Jakubowicz. "The experts work more on the geographical zone, whereas we use parameters such as the weight of the parcel or the zip code." Nevertheless, the correlation of results between the experts and the algorithms is 99.8%. Of the sample provided, 282 new files deemed non-suspicious by the company were even identified as fraudulent by the TeraLab team, and validated as such a posteriori by La Poste - Colissimo.
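The comparison step can be sketched as follows: rank claims by their model score, keep the top k for expert review, then measure how often experts and algorithm agree. This is only an illustration. The article does not say how the 99.8% figure was computed, so the simple agreement rate used here is an assumption, and the claim identifiers, scores and expert flags are invented.

```python
def top_k_suspicious(scores, k):
    # Ids of the k claims with the highest fraud score, most suspicious first.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [claim_id for claim_id, _ in ranked[:k]]

def agreement_rate(expert_flags, model_flags):
    # Share of claims on which experts and model give the same verdict.
    ids = expert_flags.keys() & model_flags.keys()
    if not ids:
        raise ValueError("no claims in common")
    matches = sum(1 for i in ids if expert_flags[i] == model_flags[i])
    return matches / len(ids)

# Invented scores and expert verdicts for four hypothetical claims.
SCORES = {"A": 0.91, "B": 0.12, "C": 0.77, "D": 0.05}
EXPERT_FLAGS = {"A": True, "B": False, "C": False, "D": False}

# The model flags the two highest-scoring claims.
flagged = set(top_k_suspicious(SCORES, 2))
MODEL_FLAGS = {claim_id: claim_id in flagged for claim_id in SCORES}

# Claims flagged by the model but not by the experts are candidates
# for a second look, like the 282 files mentioned above.
NEW_CANDIDATES = sorted(
    i for i in SCORES if MODEL_FLAGS[i] and not EXPERT_FLAGS[i]
)
```

In this toy case the two approaches agree on three claims out of four, and claim C is the one the model would send back to the experts for validation.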
Towards corporate integration
The company has successfully completed the proof-of-concept phase. The algorithmic method works, and in addition to providing faster detection, its automation would reduce the costs of case-by-case detection. Philippe Aligon confides: "There are very high expectations from the customer, safety and IT departments." The industrialization of this fraud detection tool will be integrated into La Poste - Colissimo's IT modernization program, along with the acquisition of new Big Data technologies for real-time data processing, enabling instant scoring of claims.
The complexity of integrating Big Data tools has so far kept the algorithms from being industrialized. This is not unique to La Poste - Colissimo, however, but is part of a pattern found in many organizations. Jérémie Jakubowicz explains: "Even if all the lights are green on our side, that doesn't mean the job is finished. Exploiting the results and putting them into production are also problems that have to be solved on the company's side." This limitation clearly demonstrates that the use of Big Data technologies is not just a scientific issue. It's also a question of organization.
TeraLab, a catalyst for big data projects
TeraLab is a big data platform for research, innovation and teaching. It is run by the Institut Mines-Télécom and the national economics and statistics school group (GENES). It was through the "Cloud computing and big data" call for projects under the Investissements d'avenir (PIA) program that TeraLab came into being in 2014. Its aim is to federate demand for software and infrastructure.
In addition to providing commodity servers and a teramemory machine, TeraLab offers a team of researchers capable of identifying demand and supporting project leaders throughout their work. The skills on offer range from infrastructure configuration to consulting, including algorithms, machine learning, data visualization, etc. The TeraLab team also helps the various players to choose the type of project best suited to their needs (PIA-type collaborative project, European project, joint laboratory, challenge, proof of concept, etc.).