Receipt Testing Platform
Back in early 2016 we began developing our receipt OCR technology. One of the first things we developed was our receipt testing platform (RTP). This enabled us to continually benchmark against various batches of receipts as we improved our accuracy.
The RTP allowed us not only to measure our own improvements, but also to compare our results against other receipt OCR APIs and see how our technology stacked up against other providers.
The first and most important data field that we worked on was our receipt “total” extraction. At this early stage we were able to gather a limited number of POS receipts from around the world and use these as the starting batch for testing. We grouped these receipts into 3 classes.
Class 1: High-resolution receipts in clear, logical formats with good lighting.
Class 2: Good-resolution receipts with some anomalies, such as folds, crumples, bad lighting or irregular formats.
Class 3: Highly irregular and possibly low-resolution receipts with multiple anomalies, such as folds, crumples, bad lighting or irregular formats.
Like receipt OCR itself, this classification system could never be an exact science. It did, however, give us a good gauge of the strength of our improvements and of the areas in which we were making them.
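As a rough illustration of what per-class benchmarking of the "total" field can look like, here is a minimal Python sketch. It is not the RTP itself: the sample records, the function name accuracy_by_class and the exact-match scoring rule are all assumptions made for the example.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical sample records: (receipt class, ground-truth total, extracted total).
# None means the OCR engine returned no total for that receipt.
SAMPLES = [
    (1, "12.50", "12.50"),
    (1, "8.99", "8.99"),
    (2, "104.00", "104.00"),
    (2, "23.10", "28.10"),
    (3, "5.75", None),
    (3, "40.00", "40.00"),
]


def accuracy_by_class(samples):
    """Return {class: fraction of receipts whose extracted total matches exactly}."""
    correct = defaultdict(int)
    seen = defaultdict(int)
    for receipt_class, expected, extracted in samples:
        seen[receipt_class] += 1
        # Assumed scoring rule: a result counts only if the amounts are equal.
        if extracted is not None and Decimal(extracted) == Decimal(expected):
            correct[receipt_class] += 1
    return {c: correct[c] / seen[c] for c in sorted(seen)}


if __name__ == "__main__":
    for receipt_class, accuracy in accuracy_by_class(SAMPLES).items():
        print(f"class {receipt_class}: {accuracy:.0%}")
```

Running the same function over the output of a second provider for the same receipts would give a like-for-like comparison in each class.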
Global Test Batch
Our initial batch consisted of just over 1000 receipts gathered from various sources online. We also gathered receipts through a range of companies from different regions who agreed to share samples from their expense reports to assist with our research and development.
We took great care to ensure that the samples covered as wide a range of formats and regions as possible and that they were all produced by point of sale (POS) printing systems. Over the next 18 months, we grew this batch as we received more receipts from our early Alpha and Beta testers. By the end of that period, we had around 4000 receipts in our main test batch.
The main goal of our early development was to improve on the accuracy of the one receipt OCR service that our research had identified as the world leader in this area.
After around 18 months of development, we had not only matched its results but improved on them in every class we tested. At this stage we were confident that we had the beginnings of an intelligent receipt data extraction technology.
We still had the huge task ahead of us of building in all the other data fields, not least a multi-language receipt OCR that could extract all the key data from any POS receipt in the world.
Over 3 years on, our global testing batch has grown to over 50,000 receipts. We now also run tests on all the larger POC batches that come in from our various partners. We are proud to have achieved much more than our starting vision had hoped for. On our current global testing batch, run through our RTP, our receipt “total” extraction is 15% more accurate than that of our closest competitor. On various other regional and POC batches, our results currently range from 12% to 35% more accurate than those of any other receipt OCR provider in the world.
During our development, and in large part due to our highly accurate extraction of other key data fields, we released a “validated total” field. This field effectively supplies a confidence score of 0.999, which we have found to be more accurate than humans when it comes to the manual data entry of totals from POS paper receipts.
The World’s Most Advanced Receipt OCR
Although this study was based purely on our receipt “total” extraction results, we have numerous case studies on line items, establishments and many of our other data fields. These studies have been tracked for over 18 months and show impressive results that we are proud of and eager to publish.
We look forward to releasing these studies in the coming months to show how our dedication to our technology is producing world class results.
Our methodical testing of all of our data fields, and the consistent customer validation of the quality of these results, is why we can firmly state that Tabscanner is the world’s most advanced receipt OCR.