Receipt Template System
A large UK retail specialist was working on a project to extract detailed data from receipts issued by a number of UK retail stores. The team had been using Google Vision and a templating system to extract the data. The project had been in development for six months when the team began work on line item extraction for the first set of stores.
This slowed progress considerably, as the team struggled to determine where line items started and ended. To compound the issue, they found that each store issued multiple receipt formats depending on the region, which threw the templating system off at different locations. Google Vision also missed text at times, so the establishment name sometimes went undetected and the correct template for the store was never triggered.
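The failure mode described above can be illustrated with a minimal sketch. It assumes a template registry keyed by store name and region; the names `TEMPLATES`, `select_template`, and the store "ACME FOODS" are hypothetical and are not taken from the team's actual system.

```python
from typing import List, Optional

# Hypothetical registry: store name -> region -> template id.
TEMPLATES = {
    "ACME FOODS": {"north": "acme_north_v1", "south": "acme_south_v1"},
}

def select_template(ocr_lines: List[str], region: str) -> Optional[str]:
    """Return a template id if a known store name appears in the OCR text."""
    text = " ".join(ocr_lines).upper()
    for store, by_region in TEMPLATES.items():
        if store in text:
            # A known store can still have no template for this region.
            return by_region.get(region)
    # No store name found, so no template can be chosen.
    return None

complete = ["Acme Foods", "Milk 1.20", "TOTAL 1.20"]
header_missed = ["Milk 1.20", "TOTAL 1.20"]  # OCR dropped the store name

print(select_template(complete, "north"))       # acme_north_v1
print(select_template(header_missed, "north"))  # None
print(select_template(complete, "east"))        # None
```

In this sketch every new region or format needs its own registry entry, and a single missed header line leaves the receipt with no template at all.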
The problem got worse when one of the stores they had spent considerable time on changed its POS system and started issuing receipts in a completely different format. At this point the team recognised that they needed much more than a templating system to stay flexible in the face of such changes. Given the number of stores and formats still left to solve, it was clear that only a more intelligent system would make the solution scalable.
Format Scalability
A UK technology consultant was brought in and began researching alternatives for the project. Google Vision had appeared to be a great solution at the start. The OCR looked accurate and the team was getting generally good text readings in their tests. As the project unfolded and the team looked deeper into the data, they realised that the problem was not an OCR one, but rather one of format scalability. The true problem that needed solving was the identification of the fields, not the accuracy of the OCR text reading.
The consultant began running tests on the Tabscanner open test platform and was immediately impressed by the results. He started by testing particular problem cases: line items that were bunched close together or spread across multiple lines, and receipts where the word “total” was missing or was placed on a different line from the actual total. These and many other issues had been causing problems, but the receipt fields were all being accurately identified by Tabscanner.
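To make one of these cases concrete, here is a minimal sketch of a rule-based total finder. It is an illustrative heuristic only, not Tabscanner's method; the function name `find_total` and the sample receipts are assumptions. It handles a "total" label on the same line as the amount or on the line before it, and returns nothing when the label is absent.

```python
import re
from typing import List, Optional

# Matches amounts such as 2.15 (two decimal places assumed).
AMOUNT = re.compile(r"\d+\.\d{2}")

def find_total(lines: List[str]) -> Optional[float]:
    """Find the amount on, or directly after, a line labelled 'total'."""
    for i, line in enumerate(lines):
        if re.search(r"\btotal\b", line, re.IGNORECASE):
            # Check the label line first, then the line that follows it.
            for candidate in lines[i:i + 2]:
                match = AMOUNT.search(candidate)
                if match:
                    return float(match.group())
    # No 'total' label found: this simple rule cannot decide.
    return None

same_line = ["Milk 1.20", "Bread 0.95", "TOTAL 2.15"]
split_line = ["Milk 1.20", "Bread 0.95", "TOTAL", "2.15"]
no_label = ["Milk 1.20", "Bread 0.95", "2.15"]

print(find_total(same_line))   # 2.15
print(find_total(split_line))  # 2.15
print(find_total(no_label))    # None
```

The unlabelled receipt shows the limit of keyword rules: each new layout needs another special case, which is the scalability problem described above.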
After several discussions with our team, a set of requirements was shared and a clearly defined POC was developed to assess the cost of continuing with the project versus switching to Tabscanner. The POC was performed on approximately 5000 receipts from over 500 stores in the UK. The results were then analysed and a decision was made.
The project was effectively abandoned and replaced by Tabscanner’s cloud-based API. The program is now processing over 10,000 receipts per day and providing accurate results right down to line item detail for over 2000 UK stores.