Tabscanner simplifies extracting data from receipts and invoices with its cutting-edge OCR technology. This blog demonstrates how to integrate Tabscanner’s API into your Python backend to process receipts and retrieve structured data in JSON format.
Leveraging short polling, Tabscanner efficiently handles the processing of receipt images, with results typically available in about 5 seconds. Let’s explore how to implement this.
Why Use Tabscanner?
Tabscanner supports the following features:
- Uploads: Process images in JPG or PNG format, including smartphone photos or screenshots.
- Language Support: Handle multiple languages and character sets.
- Data Extraction: Retrieve totals, tax breakdowns, line items, merchant details, and more.
Prerequisites
- Tabscanner API Key: Obtain this from your Tabscanner account.
- Python Environment: Install Python and the requests library (pip install requests).
- Backend Integration: Tabscanner is designed for server-side use, not direct integration with mobile apps.
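Because the API key lives on the server, one common option is to read it from an environment variable rather than hard-coding it. The sketch below assumes a variable named TABSCANNER_API_KEY; that name and the load_api_key helper are hypothetical, not something Tabscanner defines.

```python
import os


def load_api_key(env_var="TABSCANNER_API_KEY"):
    """
    Read the API key from an environment variable.
    The variable name is an assumption chosen for this example.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

You could then write API_KEY = load_api_key() in place of a literal string in the examples that follow.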
Step 1: Upload a Receipt for Processing
The first step is submitting a receipt image to the /process endpoint. This returns a token that you’ll use to poll for results.
Code Example
import requests

# API Configuration
API_KEY = "your_api_key_here"
PROCESS_ENDPOINT = "https://api.tabscanner.com/api/2/process"


def upload_receipt(file_path):
    """
    Upload a receipt image to Tabscanner for processing.
    Returns a token to poll for results.
    """
    with open(file_path, 'rb') as file:
        response = requests.post(
            PROCESS_ENDPOINT,
            headers={"apikey": API_KEY},
            files={"file": file}
        )
    if response.status_code == 200:
        token = response.json().get("token")
        print(f"Token: {token}")
        return token
    else:
        print(f"Error uploading receipt: {response.status_code}, {response.text}")
        return None
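Since Tabscanner processes JPG and PNG images, a quick extension check before uploading can avoid a wasted request. This is only a sketch: is_supported_image and SUPPORTED_EXTENSIONS are hypothetical names, and checking the extension does not confirm the file's actual contents.

```python
from pathlib import Path

# Extensions assumed to correspond to the JPG and PNG formats
# mentioned earlier; this set is illustrative, not an official list.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def is_supported_image(file_path):
    """Return True if the file extension looks like JPG or PNG."""
    return Path(file_path).suffix.lower() in SUPPORTED_EXTENSIONS
```

Calling is_supported_image(file_path) before upload_receipt(file_path) lets you reject other file types locally.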
Step 2: Poll for Results Using the Token
Once you have the token, poll the /result endpoint to check if the receipt processing is complete. Polling every second is recommended after an initial delay of about 5 seconds.
Code Example
import time

RESULT_ENDPOINT_BASE = "https://api.tabscanner.com/api/result/"


def poll_for_result(token):
    """
    Poll Tabscanner's result endpoint using the token until processing is complete.
    Returns the extracted data as a JSON object.
    """
    polling_url = f"{RESULT_ENDPOINT_BASE}{token}"
    while True:
        response = requests.get(polling_url, headers={"apikey": API_KEY})
        if response.status_code == 200:
            result_data = response.json()
            status = result_data.get("status")
            if status == "done":
                print("Processing complete!")
                return result_data.get("result")
            elif status == "pending":
                print("Processing... retrying in 1 second.")
                time.sleep(1)
            else:
                print(f"Unexpected status: {status}")
                return None
        else:
            print(f"Error polling for result: {response.status_code}, {response.text}")
            return None
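The loop above keeps polling for as long as the status stays "pending". A bounded variant that also applies the initial delay of about 5 seconds might look like the sketch below. The name poll_with_limit, its parameters, and the default of 30 attempts are assumptions for illustration. The fetch_status argument is any callable that returns the parsed JSON from the result endpoint.

```python
import time


def poll_with_limit(fetch_status, initial_delay=5.0, interval=1.0,
                    max_attempts=30, sleep=time.sleep):
    """
    Wait initial_delay seconds, then call fetch_status() up to
    max_attempts times, sleeping interval seconds between attempts.
    Returns the "result" field once the status is "done".
    """
    sleep(initial_delay)
    for _ in range(max_attempts):
        data = fetch_status()
        status = data.get("status")
        if status == "done":
            return data.get("result")
        if status != "pending":
            raise RuntimeError(f"Unexpected status: {status}")
        sleep(interval)
    raise TimeoutError(f"No result after {max_attempts} attempts")
```

With the names used earlier, fetch_status could be lambda: requests.get(polling_url, headers={"apikey": API_KEY}).json(). Passing the fetcher and the sleep function in as arguments also makes the loop easy to test without network access.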
Step 3: Combine Upload and Polling
Integrate the upload and polling functions into a complete workflow.
Code Example
def process_receipt(file_path):
    """
    Upload a receipt and retrieve its processed data.
    Returns the extracted data, or None on failure.
    """
    print("Uploading receipt...")
    token = upload_receipt(file_path)
    if not token:
        print("Failed to start receipt processing.")
        return None
    print("Polling for results...")
    result = poll_for_result(token)
    if result:
        print("Receipt Data Retrieved:")
        print(result)
    else:
        print("Failed to retrieve receipt data.")
    # Return the data so callers can use it, not just see it printed.
    return result
Step 4: Run the Script
Provide the path to your receipt image and process it.
if __name__ == "__main__":
    receipt_file = "path/to/your/receipt.jpg"
    process_receipt(receipt_file)
Example Output
Once the receipt processing is complete, the API returns a JSON object with structured data like:
{
    "establishment": "SuperMart",
    "date": "2025-01-01 14:32:00",
    "total": 45.67,
    "subTotal": 41.23,
    "tax": 4.44,
    "lineItems": [
        {"desc": "Apple", "qty": 3, "price": 1.5, "lineTotal": 4.5},
        {"desc": "Milk", "qty": 1, "price": 2.5, "lineTotal": 2.5}
    ]
}
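To show how the returned data might be used, here is a small sketch that pulls a few fields out of a result shaped like the example above. The summarize_receipt helper and its output keys are hypothetical; the field names are taken from the sample JSON, and a real response may contain more or different fields.

```python
def summarize_receipt(result):
    """Build a compact summary from a result dict shaped like the sample."""
    items = result.get("lineItems", [])
    return {
        "merchant": result.get("establishment"),
        "total": result.get("total"),
        "item_count": len(items),
        # Sum of the line totals, rounded to cents.
        "items_total": round(sum(item.get("lineTotal", 0) for item in items), 2),
    }


# Sample data copied from the example output.
sample = {
    "establishment": "SuperMart",
    "date": "2025-01-01 14:32:00",
    "total": 45.67,
    "subTotal": 41.23,
    "tax": 4.44,
    "lineItems": [
        {"desc": "Apple", "qty": 3, "price": 1.5, "lineTotal": 4.5},
        {"desc": "Milk", "qty": 1, "price": 2.5, "lineTotal": 2.5},
    ],
}

summary = summarize_receipt(sample)
print(summary)
```

Using .get() with defaults keeps the helper from raising an exception if a receipt comes back without line items.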
Error Handling
The API provides detailed error codes. Here are some common ones:
- 400: API key not found.
- 403: No file detected.
- 405: Unsupported file type.
- 500: OCR Failure.
Use the response’s message and status_code attributes for debugging.
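One way to make these codes easier to read in logs is a small lookup table. The sketch below covers only the four codes listed above; KNOWN_ERRORS and describe_error are hypothetical names, and the API may return other codes not shown here.

```python
# Codes and descriptions taken from the list above; not an exhaustive set.
KNOWN_ERRORS = {
    400: "API key not found",
    403: "No file detected",
    405: "Unsupported file type",
    500: "OCR failure",
}


def describe_error(code):
    """Return a readable description for a known error code."""
    return KNOWN_ERRORS.get(code, f"Unrecognized error code {code}")
```

For example, describe_error(403) returns "No file detected", while an unlisted code falls back to a generic message.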
Tips for Improving Results
- Image Quality: Ensure the receipt is well-lit and in focus.
- Format Guidance: Use images with dimensions greater than 720×1280 for best results.
- Custom Configurations: Contact Tabscanner for advanced features like custom fields or line-item resolution.
Conclusion
Tabscanner’s receipt OCR API is a powerful tool for extracting data from receipts with minimal setup. By following this guide, you can integrate it into your Python backend and streamline your data processing workflows.
For more details, visit the Tabscanner Documentation. 🚀
Happy coding!