How to build Optical Character Recognition fast with Python

Alex Nadein
3 min read · Mar 12, 2024


What is OCR (Optical Character Recognition)?

Optical Character Recognition (OCR) is a technology that enables computers to recognize and interpret text within digital images or scanned documents. Essentially, OCR converts different types of documents, such as scanned paper documents, PDF files, or images captured by a digital camera, into editable and searchable data.

OCR systems typically work by analyzing the shapes, patterns, and structures of characters in an image and then translating them into machine-readable text. This process involves several steps, including image preprocessing, character segmentation, feature extraction, and classification.
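To make the first two steps concrete, here is a toy sketch in plain Python. It is not how Tesseract is implemented: it only binarizes a tiny grayscale "image" (a list of pixel rows) with a fixed threshold and then splits it wherever an ink-free column appears. The function names, the threshold value, and the sample image are all made up for illustration.

```python
def binarize(gray, threshold=128):
    """Preprocessing: map each grayscale pixel (0-255) to 1 (ink) or 0 (background)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def segment_columns(binary):
    """Segmentation: return (start, end) column ranges separated by ink-free columns.

    `end` is exclusive, like a Python slice.
    """
    width = len(binary[0])
    # A column "has ink" if any row contains a 1 at that position.
    has_ink = [any(row[x] for row in binary) for x in range(width)]

    segments, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # a new glyph begins here
        elif not ink and start is not None:
            segments.append((start, x))    # the glyph ended at the previous column
            start = None
    if start is not None:
        segments.append((start, width))    # glyph touching the right edge
    return segments


# Hypothetical 3x7 grayscale image: two dark blobs separated by a light gap.
image = [
    [10,  20, 250, 250, 30, 250, 250],
    [15, 240, 250, 250, 25,  40, 250],
    [12,  18, 250, 250, 35, 250, 250],
]
binary = binarize(image)
glyphs = segment_columns(binary)
print(glyphs)
```

In a real engine, feature extraction and classification would then run on each segment to decide which character it is.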

What is Tesseract?

Tesseract is an optical character recognition engine for various operating systems. It is free software, released under the Apache License. Originally developed by Hewlett-Packard as proprietary software in the 1980s, it was released as open source in 2005 and development was sponsored by Google in 2006.

In 2006, Tesseract was considered one of the most accurate open-source OCR engines available.
From Wikipedia

Pytesseract is a Python wrapper for Tesseract, the open-source OCR engine described above. It lets Python developers add Tesseract's functionality to their applications without writing low-level code.

With Pytesseract, you can extract text from images or scanned documents and then process or analyze that text in your Python scripts or applications. PDF files have to be converted to images first, because Tesseract reads image input.

How to install Tesseract and Pytesseract?

Using Pytesseract typically involves installing both Tesseract-OCR and the Pytesseract Python library. Once installed, you can use Pytesseract to pass images or image files to Tesseract-OCR, which then processes the images and returns the recognized text as output.

To install Tesseract on macOS, run:

brew install tesseract

It may take some time to complete.
For other operating systems, see the Tesseract installation instructions.

Pytesseract can be installed with pip:

pip3 install pytesseract
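Since Pytesseract only wraps the Tesseract executable, it is worth checking that the binary can actually be found. The snippet below is a small sketch of such a check; the helper name find_tesseract is made up for this example, while shutil.which and the pytesseract.pytesseract.tesseract_cmd setting are real.

```python
import shutil

try:
    import pytesseract  # optional here, so the check also runs before pip install
except ImportError:
    pytesseract = None


def find_tesseract(binary_name="tesseract"):
    """Return the full path of the Tesseract executable, or None if it is not on PATH."""
    return shutil.which(binary_name)


tesseract_path = find_tesseract()
if tesseract_path is None:
    print("Tesseract not found on PATH; install it or point pytesseract at the binary.")
elif pytesseract is not None:
    # pytesseract launches this executable as a subprocess for every OCR call.
    pytesseract.pytesseract.tesseract_cmd = tesseract_path
    print("Using Tesseract at", tesseract_path)
```

Setting tesseract_cmd by hand is mostly useful when Tesseract is installed in a location that is not on PATH.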

How to use Pytesseract?

The simplest way is to use the image_to_string function.

import pytesseract

img_path = 'test.png'
lang = 'eng'
text = pytesseract.image_to_string(img_path, lang=lang)

print(text)

Explore pytesseract.py for other useful functions.
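One of those functions is image_to_data, which returns per-word details such as a confidence score. As a sketch of how that output could be used, the helper below (confident_words, a name invented for this example) keeps only the words above a confidence threshold. It runs on a hand-written sample in the same shape as the real output, so it works without Tesseract installed; the threshold of 60 is an arbitrary assumption.

```python
def confident_words(data, min_conf=60.0):
    """Keep words whose confidence is at least min_conf.

    `data` is expected to look like the dict returned by
    pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT),
    i.e. parallel lists under keys such as "text" and "conf".
    """
    words = []
    for text, conf in zip(data["text"], data["conf"]):
        # conf may arrive as int, float, or str depending on the version;
        # rows that are not words carry empty text and a conf of -1.
        if text.strip() and float(conf) >= min_conf:
            words.append(text)
    return words


# Hand-written sample in the same shape as the real output.
sample = {
    "text": ["", "Hello", "wor1d", "OCR"],
    "conf": ["-1", 96.4, 41, "88"],
}
print(confident_words(sample))

# With Tesseract installed, the real call would be along these lines:
# import pytesseract
# data = pytesseract.image_to_data("test.png", lang="eng",
#                                  output_type=pytesseract.Output.DICT)
# print(confident_words(data))
```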

How to build API to OCR?

Let’s create a couple of helper functions first.

One for saving images:

import os
import shutil

import pytesseract


def _save_file_to_server(uploaded_file, path=".", save_as="default"):
    # Keep the extension of the uploaded file, e.g. ".png"
    extension = os.path.splitext(uploaded_file.filename)[-1]
    temp_file = os.path.join(path, save_as + extension)

    # Copy the upload's underlying file object to disk
    with open(temp_file, "wb") as buffer:
        shutil.copyfileobj(uploaded_file.file, buffer)

    return temp_file
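A file name sent by a client should not be trusted as a path on the server. As a possible hardening step, which is not part of the original helper, the sketch below derives a server-side name from the upload's extension only. The function name safe_upload_name and the list of allowed extensions are assumptions for this example.

```python
import os
import uuid


def safe_upload_name(client_filename,
                     allowed=(".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp")):
    """Build a server-side file name that ignores any directories sent by the client."""
    # basename() drops path components such as "../../" from the client-supplied name.
    base = os.path.basename(client_filename.replace("\\", "/"))
    extension = os.path.splitext(base)[1].lower()
    if extension not in allowed:
        raise ValueError("Unsupported file type: {0!r}".format(extension))
    # A random stem avoids collisions when two uploads share a name.
    return uuid.uuid4().hex + extension


name = safe_upload_name("../../etc/scan.PNG")
print(name)
```

The returned name could then be joined with the upload directory in place of save_as + extension.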

And one for extracting text from images:

async def read_image(img_path, lang='eng'):
    try:
        text = pytesseract.image_to_string(img_path, lang=lang)
        return text
    except Exception:
        # Return a readable message instead of failing the whole request
        return "[ERROR] Unable to process file: {0}".format(img_path)
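One caveat: a coroutine that calls a blocking function directly never yields to the event loop, so several such tasks still run one after another. A possible variation, sketched below, hands the blocking call to a worker thread with asyncio.to_thread (Python 3.9+). The slow_ocr function is a stand-in for pytesseract.image_to_string so that the example runs without Tesseract; all names here are illustrative.

```python
import asyncio
import time


def slow_ocr(img_path):
    """Stand-in for pytesseract.image_to_string: blocks the calling thread."""
    time.sleep(0.05)
    return "text from " + img_path


async def read_image_in_thread(img_path):
    # asyncio.to_thread runs the blocking call in a worker thread,
    # so the event loop stays free to start the other tasks.
    return await asyncio.to_thread(slow_ocr, img_path)


async def main(paths):
    # gather() returns results in the same order as the awaitables passed in.
    return await asyncio.gather(*(read_image_in_thread(p) for p in paths))


results = asyncio.run(main(["a.png", "b.png", "c.png"]))
print(results)
```

In the real helper, slow_ocr would be replaced by pytesseract.image_to_string and the lang argument passed through.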

Now we can build the API using FastAPI. The helper functions above go into image_utils.py, and the code below goes into main.py:

import asyncio
import os
from typing import List

from fastapi import FastAPI, File, UploadFile

import image_utils

app = FastAPI()


@app.get("/")
def home():
    return {"message": "Visit the endpoint: /api/v1/extract_text to perform OCR."}


@app.post("/api/v1/extract_text")
async def extract_text(Images: List[UploadFile] = File(...)):
    response = {}
    tasks = []

    for img in Images:
        print("Images Uploaded: ", img.filename)
        # Pass the name without its extension; the helper appends the extension itself
        save_as = os.path.splitext(img.filename)[0]
        temp_file = image_utils._save_file_to_server(img, path="./", save_as=save_as)
        tasks.append(asyncio.create_task(image_utils.read_image(temp_file)))

    # Wait for all OCR tasks; results come back in the same order as the uploads
    text = await asyncio.gather(*tasks)

    for i in range(len(text)):
        response[Images[i].filename] = text[i]

    return response

What is FastAPI?

FastAPI is a modern, fast (as the name implies) web framework for building APIs with Python. It is built on standard Python type hints (PEP 484) and on the async/await syntax introduced in Python 3.5. FastAPI is known for its high performance, ease of use, and automatically generated interactive API documentation.

Overall, FastAPI provides a modern and efficient framework for building APIs with Python, offering features that prioritize developer productivity, performance, and maintainability.

We also need a web server to launch our service. We will use Uvicorn, an ASGI web server implementation for Python.

You can install FastAPI and Uvicorn with pip. FastAPI also needs the python-multipart package to parse uploaded files:

pip3 install fastapi
pip3 install uvicorn
pip3 install python-multipart

To launch the app on localhost simply run:

uvicorn main:app --reload

It will be available on http://127.0.0.1:8000.

You can also visit http://127.0.0.1:8000/docs for documentation and testing of your API.
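To call the endpoint from a script rather than from the docs page, the images have to be sent as multipart/form-data under the field name Images, which matches the parameter name in extract_text. The sketch below does this with the standard library only; encode_multipart and post_images are helper names invented for this example, and a library such as requests would make it shorter.

```python
import os
import uuid
import urllib.request


def encode_multipart(field_name, files):
    """Encode (filename, bytes) pairs as a multipart/form-data body.

    Returns (body, content_type). Every file is sent under the same field name.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for filename, content in files:
        header = (
            "--{0}\r\n"
            'Content-Disposition: form-data; name="{1}"; filename="{2}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).format(boundary, field_name, filename)
        parts.append(header.encode("utf-8") + content + b"\r\n")
    # The closing boundary marks the end of the body.
    parts.append("--{0}--\r\n".format(boundary).encode("utf-8"))
    return b"".join(parts), "multipart/form-data; boundary=" + boundary


def post_images(url, paths):
    """POST the given image files to the OCR endpoint and return the raw JSON bytes."""
    files = []
    for path in paths:
        with open(path, "rb") as fh:
            files.append((os.path.basename(path), fh.read()))
    body, content_type = encode_multipart("Images", files)
    # Supplying data makes urllib send a POST request.
    request = urllib.request.Request(url, data=body,
                                     headers={"Content-Type": content_type})
    with urllib.request.urlopen(request) as response:
        return response.read()


# Usage, with the server from this article running locally:
# print(post_images("http://127.0.0.1:8000/api/v1/extract_text", ["test.png"]))
```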

What’s next?

Now you can deploy your API on a server or in the cloud.

You can find the source code here.

You can learn more about Python, OCR, and FastAPI here.
