How Receipt OCR Accuracy Is Measured: Field-Level vs Character-Level

If a vendor tells you their receipt OCR is “99% accurate,” your first question should be: 99% of what? Accuracy is not a single number. It depends on whether you measure individual characters or whole fields, and the gap between those two figures is large enough to change your build decisions.

This post explains the two main ways receipt OCR accuracy is measured, why the distinction matters for expense and fintech applications, and the specific questions to put to a vendor before you commit.

What is the difference between character-level and field-level accuracy?

Character-level accuracy measures how many individual characters the engine reads correctly. If a receipt total reads $142.50 and the OCR returns $142.30, with one digit wrong, that is 6 of 7 characters correct (counting the currency symbol and the decimal point), or roughly 86% at the character level.

Field-level accuracy measures whether a complete, structured field is correct as a unit. In the example above, the total field is wrong, full stop. One incorrect character makes the field a failure, regardless of how clean the rest of the characters were.

The same batch of receipts can score 98% character accuracy and only 90% field accuracy. Both numbers are true. They answer different questions.
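To make the two definitions concrete, here is a minimal Python sketch that scores one receipt both ways. It assumes character accuracy is defined as one minus the edit distance divided by the number of expected characters, which is one common convention and not necessarily the formula a given vendor uses. The field names and sample values are made up for illustration.

```python
def edit_distance(expected: str, actual: str) -> int:
    """Levenshtein distance: substitutions, insertions, and deletions."""
    prev = list(range(len(actual) + 1))
    for i, e in enumerate(expected, start=1):
        curr = [i]
        for j, a in enumerate(actual, start=1):
            cost = 0 if e == a else 1
            curr.append(min(prev[j] + 1,          # character missing from output
                            curr[j - 1] + 1,      # extra character in output
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def char_accuracy(expected: dict, actual: dict) -> float:
    """1 - (total character errors / total expected characters), floored at 0."""
    errors = sum(edit_distance(v, actual.get(k, "")) for k, v in expected.items())
    total = sum(len(v) for v in expected.values())
    return max(0.0, 1 - errors / total)


def field_accuracy(expected: dict, actual: dict) -> float:
    """Share of fields whose value matches the ground truth exactly."""
    correct = sum(1 for k, v in expected.items() if actual.get(k) == v)
    return correct / len(expected)


# Hypothetical ground truth and OCR output for one receipt.
expected = {"merchant": "Corner Cafe", "date": "2024-03-05", "total": "$142.50"}
actual = {"merchant": "Corner Cafe", "date": "2024-03-05", "total": "$142.30"}

print(round(char_accuracy(expected, actual), 3))   # 0.964
print(round(field_accuracy(expected, actual), 3))  # 0.667
```

One wrong digit out of 28 characters leaves the character score above 96%, while one wrong field out of three drops the field score to 67%.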

Why character-level numbers look better

Character-level metrics average across thousands of characters, so isolated errors get diluted. A long receipt has hundreds of characters, and most of them are easy: printed product names, dates, store headers. The hard parts, the decimal in a total or a single transposed digit in a tax line, are a tiny fraction of the character count but a large fraction of the fields that matter.
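A rough way to see how a strong character score translates into a weaker field score is to assume that every character in a field is read correctly with the same probability, independently of the others. That assumption is a simplification: as noted above, real errors cluster in the hard parts of a receipt. Under it, the chance that a whole field is right is the character accuracy raised to the field length.

```python
def expected_field_accuracy(char_accuracy: float, field_length: int) -> float:
    """Probability that all characters in a field are correct,
    assuming independent and identically likely character errors."""
    return char_accuracy ** field_length


# A 99% character accuracy applied to fields of 7, 10, and 20 characters.
for length in (7, 10, 20):
    print(length, round(expected_field_accuracy(0.99, length), 3))
# 7 0.932
# 10 0.904
# 20 0.818
```

Even under this simplified model, a 99% character score implies that roughly one in fifteen 7-character totals would contain an error.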

Why does field-level accuracy matter more for expense and fintech apps?

Your application does not consume characters. It consumes fields: merchant name, date, subtotal, tax, total, line items, payment method. A reimbursement workflow, an expense policy check, or a loyalty accrual runs on those structured values.

A single wrong digit in the total field can:

  • Push an expense over or under a policy threshold and route it incorrectly.
  • Create a reconciliation mismatch against a card transaction feed.
  • Mis-credit loyalty points or cashback.
  • Force a human review that erases the cost savings of automation.

None of that is captured by a character-level score. Field-level accuracy is the metric that maps to your actual error rate in production.

Which fields should be measured separately?

An aggregate field-accuracy number is still too coarse. Different fields carry different business risk, and they fail for different reasons. Ask for accuracy broken out by field, at minimum:

  • Total: the highest-stakes field for reconciliation and policy.
  • Date: drives reporting periods and policy windows.
  • Merchant name: needed for categorization and matching.
  • Tax and subtotal: often required for accounting, frequently misread on faded or thermal receipts.
  • Line items: the hardest structured extraction, and the most variable across formats.
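A per-field breakdown is straightforward to compute yourself once you have labeled receipts. The sketch below assumes each test sample is a pair of dictionaries (ground truth and OCR output) keyed by field name, and that a field counts as correct only on an exact match. The field names and values are illustrative.

```python
from collections import defaultdict


def per_field_accuracy(samples):
    """samples: iterable of (expected, actual) dict pairs.
    Returns {field_name: fraction of samples where that field matched exactly}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for expected, actual in samples:
        for field, value in expected.items():
            total[field] += 1
            if actual.get(field) == value:
                correct[field] += 1
    return {field: correct[field] / total[field] for field in total}


# Two hypothetical labeled receipts.
samples = [
    ({"total": "142.50", "date": "2024-03-05"},
     {"total": "142.30", "date": "2024-03-05"}),
    ({"total": "18.99", "date": "2024-03-07"},
     {"total": "18.99", "date": "2024-03-07"}),
]

print(per_field_accuracy(samples))  # {'total': 0.5, 'date': 1.0}
```

An aggregate over these four fields would report 75%, which hides the fact that every error is in the total.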

Line-item accuracy in particular deserves its own scrutiny. Extracting a full table of descriptions, quantities, and prices is a different problem from reading a single total, and the accuracy gap between the two is usually significant.
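Line items need a different scoring approach because the engine can miss rows or invent them, not just misread values. One possible scheme, sketched here as an assumption and not as a standard, treats each row as a (description, quantity, price) tuple that must match exactly, then reports precision (how many extracted rows are right) and recall (how many true rows were found).

```python
from collections import Counter


def line_item_scores(expected_items, actual_items):
    """Return (precision, recall) over exactly matching rows.
    Rows are hashable tuples; duplicate rows are counted as many times as they occur."""
    expected_counts = Counter(expected_items)
    actual_counts = Counter(actual_items)
    # Multiset intersection: a row matches at most as often as it appears in both.
    matched = sum((expected_counts & actual_counts).values())
    precision = matched / len(actual_items) if actual_items else 0.0
    recall = matched / len(expected_items) if expected_items else 0.0
    return precision, recall


# Hypothetical receipt: one price misread and one row missed entirely.
expected_items = [("Latte", 2, "4.50"), ("Bagel", 1, "3.25"), ("Cookie", 1, "2.00")]
actual_items = [("Latte", 2, "4.50"), ("Bagel", 1, "3.75")]

precision, recall = line_item_scores(expected_items, actual_items)
print(round(precision, 2), round(recall, 2))  # 0.5 0.33
```

Exact row matching is deliberately strict. A more lenient evaluation could score description, quantity, and price separately, and it is worth asking which approach sits behind any line-item figure you are quoted.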

How should accuracy be reported so it is verifiable?

Numbers without context are not evidence. A credible accuracy claim should specify the conditions it was measured under. Look for the following:

  1. The metric definition: character-level, field-level, or both, with the formula stated.
  2. The test set: how many receipts, from which countries, in what languages and currencies.
  3. Image quality mix: clean scans versus phone photos, crumpled receipts, faded thermal paper, and low light.
  4. Ground truth method: who labeled the correct values and how disagreements were resolved.
  5. How partial matches are scored: is a date in the wrong format counted as correct, partially correct, or wrong?

If a vendor reports one global percentage with none of this context, treat it as marketing, not measurement.

What about formatting and normalization?

There is a subtle category of error that pure character matching misses. An engine may read every character of a date correctly but return it in a format your system cannot parse, or it may read a total with the wrong decimal separator for the locale.

This is why normalized field accuracy matters: the question is whether the field arrives in a usable, correctly structured form, not just whether the raw characters matched. For global receipts with mixed date formats, currency symbols, and decimal conventions, normalization is where a lot of real-world errors live.
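A minimal sketch of normalized comparison, using only the Python standard library, might look like the following. The list of date formats and the way the decimal separator is supplied are assumptions for illustration. A real pipeline has to decide per locale whether 05/03/2024 means 5 March or 3 May, and this sketch simply treats slash-separated dates as day first.

```python
from datetime import date, datetime
from decimal import Decimal, InvalidOperation
from typing import Optional

# Assumed set of accepted formats; day-first for slash and dot dates.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")


def normalize_date(raw: str, formats=DATE_FORMATS) -> Optional[date]:
    """Parse a date string with the first format that fits, else None."""
    for fmt in formats:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    return None


def normalize_amount(raw: str, decimal_sep: str = ".") -> Optional[Decimal]:
    """Strip currency symbols and thousands separators, then parse the amount.
    decimal_sep is the decimal separator expected for the receipt's locale."""
    thousands_sep = "," if decimal_sep == "." else "."
    cleaned = "".join(ch for ch in raw if ch in "0123456789.,-")
    cleaned = cleaned.replace(thousands_sep, "").replace(decimal_sep, ".")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


# The same values written under different conventions compare equal after normalization.
print(normalize_date("05/03/2024") == normalize_date("2024-03-05"))                   # True
print(normalize_amount("€1.234,56", decimal_sep=",") == normalize_amount("$1,234.56"))  # True
```

Scoring against normalized values rather than raw strings means a correctly read date in an unexpected format fails only if it cannot be parsed, and a total read with the wrong separator for its locale shows up as the wrong amount instead of passing silently.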

What should you ask a receipt OCR vendor?

Bring these questions to any evaluation. The quality of the answers tells you as much as the numbers.

  • Do you report field-level accuracy, and can you break it out by field?
  • What is your line-item extraction accuracy specifically?
  • What does your test set look like by country, currency, and image quality?
  • How do you handle and score thermal, faded, and phone-photographed receipts?
  • Can I run my own representative sample through the API and measure accuracy myself?
  • How are dates, currencies, and decimals normalized across locales?
  • Do you return a confidence score per field, and how is it calibrated?

The last point is practical: per-field confidence scores let you build review queues that only flag the fields likely to be wrong, instead of routing entire receipts to humans. That is often the difference between automation that pays for itself and automation that does not.
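As a sketch of how such a review queue could work, the function below flags only the fields whose confidence falls under a per-field threshold. The response shape (a dictionary mapping field name to a value and a confidence between 0 and 1), the function name, and the threshold values are all hypothetical and do not describe any particular vendor's API.

```python
def fields_needing_review(fields, thresholds, default_threshold=0.9):
    """fields: {name: (value, confidence)} in an assumed response shape.
    thresholds: {name: minimum confidence} for fields that need a stricter bar.
    Returns the names of fields that should go to a human reviewer."""
    return [
        name
        for name, (_value, confidence) in fields.items()
        if confidence < thresholds.get(name, default_threshold)
    ]


# Hypothetical extraction result with a low-confidence total.
fields = {
    "merchant": ("Corner Cafe", 0.98),
    "date": ("2024-03-05", 0.95),
    "total": ("142.30", 0.71),
}
thresholds = {"total": 0.97}  # stricter bar for the highest-stakes field

print(fields_needing_review(fields, thresholds))  # ['total']
```

Thresholds like these only work if the scores are calibrated, meaning that fields reported at 0.95 confidence are in fact correct about 95% of the time on your own sample.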

The bottom line

Character-level accuracy describes how well an engine reads text. Field-level accuracy describes how well it produces the structured data your application actually uses. For expense and fintech products, the second number is the one that predicts your production error rate, your reconciliation rate, and your manual review load.

When you evaluate Tabscanner, we encourage you to test against your own receipts and measure field-level accuracy on the fields you care about. Bring the questions above. The right vendor will welcome them.
