Extracting text from images has long been a common problem in software engineering. Optical Character Recognition (OCR) is the established technology for solving it. By transforming images containing text into machine-readable data, OCR has changed how many industries work, from document processing automation to language translation.

While commercial OCR solutions exist, building your own OCR API in Python, a versatile and powerful programming language, offers several advantages, including customization, control over data privacy, and the potential for cost savings.

This guide will walk you through creating your own OCR API using Python. It explores the necessary libraries, techniques, and considerations for developing an effective OCR API, empowering you to harness the power of OCR for your applications.

Prerequisites

To follow along, you need a basic understanding of Python and Flask, and a local installation of Python on your system.

Creating the OCR API

In this guide, you will build a Flask application that lets users upload images through a POST endpoint. The application loads each image using Pillow, processes it using the PyTesseract wrapper (for the Tesseract OCR engine), and returns the extracted text as the response to the request.

You can further customize this API to provide options such as template-based classification (extracting line items from invoices, inputs in tax forms, etc.) or OCR engine choices (you can find more OCR engines here).

To start off, create a new directory for your project. Then, set up a new virtual environment in the folder by running the following commands:

python3 -m venv env
source env/bin/activate

Next, install Flask, PyTesseract, Gunicorn, and Pillow by running the following command:

pip3 install pytesseract flask pillow gunicorn

Once these are installed, you need to install the Tesseract OCR engine on your host machine. The installation instructions for Tesseract will vary according to your host operating system. You can find the appropriate instructions here.

For instance, on macOS, you can install Tesseract using Homebrew by running the following command:

brew install tesseract

Once this is done, the PyTesseract wrapper will be able to communicate with the OCR engine and process OCR requests.
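Before writing the app, it can help to confirm that the Tesseract binary is actually reachable, since PyTesseract works by invoking it. The following is a minimal sketch using only the standard library. The function name find_tesseract is illustrative and is not part of PyTesseract.

```python
import shutil
import subprocess


def find_tesseract():
    """Return (path, version banner line) for tesseract, or None if it is absent."""
    path = shutil.which("tesseract")
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Some builds print the banner to stderr, so check both streams.
    banner = (result.stdout or result.stderr).strip()
    first_line = banner.splitlines()[0] if banner else ""
    return path, first_line


if __name__ == "__main__":
    found = find_tesseract()
    if found is None:
        print("tesseract not found on PATH; install it before running the API")
    else:
        print(f"Found {found[0]}: {found[1]}")
```

If the binary is installed somewhere that is not on your PATH, PyTesseract lets you point to it explicitly by setting pytesseract.pytesseract.tesseract_cmd to the full path.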

Now, you are ready to write the Flask application. Inside your project directory, create a new directory named ocrapi and, within it, a new file named main.py. Save the following contents in it:

from flask import Flask, request, jsonify
from PIL import Image
import pytesseract

app = Flask(__name__)

@app.route('/ocr', methods=['POST'])
def ocr_process():
    # The route only accepts POST, so no extra method check is needed.
    # Read the uploaded file from the "image" form field and open it with Pillow.
    image_file = request.files['image']
    image_data = Image.open(image_file)

    # Perform OCR using PyTesseract
    text = pytesseract.image_to_string(image_data)

    response = {
        'status': 'success',
        'text': text
    }

    return jsonify(response)

The code above creates a basic Flask app that has one endpoint: /ocr. When you send a POST request to this endpoint with an image file, it extracts the file, uses the pytesseract wrapper to perform OCR using its image_to_string() method, and sends back the extracted text as part of the response.
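The endpoint above accepts whatever the client uploads. One possible hardening step is to check the filename extension before handing the file to Pillow. The helper below is a sketch: ALLOWED_EXTENSIONS and is_allowed_image are hypothetical names, and the extension list is an assumption to adjust, not a statement of what Tesseract supports.

```python
# Hypothetical allow-list of image extensions; adjust it to your needs.
ALLOWED_EXTENSIONS = {"png", "jpg", "jpeg", "tiff", "bmp", "gif"}


def is_allowed_image(filename):
    """Return True if the filename has an extension from the allow-list."""
    if not filename or "." not in filename:
        return False
    # Compare case-insensitively so "SCAN.PNG" is treated like "scan.png".
    extension = filename.rsplit(".", 1)[1].lower()
    return extension in ALLOWED_EXTENSIONS
```

Inside ocr_process(), you could call is_allowed_image(image_file.filename) and return a 400 response when it is False. An extension check does not prove that the file is a valid image, so treat it as a first filter only.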

Create a wsgi.py file in the same ocrapi directory and save the following contents in it:

from ocrapi.main import app as application

if __name__ == "__main__":
    application.run()

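You can now run the app from your project’s root directory (the parent of ocrapi) using the following command. By default, Gunicorn serves the app on port 8000: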

gunicorn ocrapi.wsgi
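Gunicorn’s defaults (a single worker and a 30-second worker timeout) may be tight for OCR on large images. As a sketch, a gunicorn.conf.py file in the project root, which Gunicorn loads automatically, could look like the following. The specific values are assumptions to tune for your workload and are not part of the original setup.

```python
# gunicorn.conf.py (hypothetical values; adjust them for your host)
import multiprocessing

# OCR is CPU-bound, so use one worker per core, capped at 4 in this sketch.
workers = min(multiprocessing.cpu_count(), 4)

# Give slow OCR requests up to 120 seconds before the worker is restarted.
timeout = 120
```

The bind address is deliberately left unset here so that Gunicorn keeps its default behavior both locally and on a hosting platform.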

Your basic OCR API is ready, and it’s time to test it!

Testing the OCR API Locally

You can use the cURL CLI to send requests to your API or switch to a more detailed API testing tool such as Postman. To test the API, you will need to download a sample image that contains some text. You can use this simple one, or this scribbled one for now.

Download either of these to the project directory and give it a simple name, such as simple-image.png or scribbled-image.png, depending on the image you choose.

Next, open your terminal and navigate to your project’s directory. Run the following command to test the API (Gunicorn listens on port 8000 by default):

curl -X POST -F "image=@scribbled-image.png" localhost:8000/ocr

This sends a request to your OCR API and returns a response similar to the following:

{
  "status": "success",
  "text": "This looks like it was written in a hucry\n\n"
}

This confirms that your OCR API has been set up correctly. You can also try with the simple image, and here’s what the response should look like:

{
  "status": "success",
  "text": "This looks like it was written with a steady hand\n\n"
}

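The simple image is transcribed exactly, while the scribbled one comes back with a single misread word ("hucry"), which gives a fair picture of the Tesseract OCR engine’s accuracy. You can now proceed to host your OCR API on Kinsta’s Application Hosting so it can be accessed online.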
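One detail worth noting before deployment: in both responses, the extracted text ends with trailing newlines. If clients of your API would prefer tidier output, a small normalization step is one option. The following is a sketch, with clean_ocr_text as a hypothetical helper name.

```python
import re


def clean_ocr_text(text):
    """Collapse repeated spaces or tabs per line and trim surrounding whitespace."""
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    return "\n".join(lines).strip()
```

You could apply it to the result of pytesseract.image_to_string() before building the response. Line breaks inside the text are kept, so multi-line documents retain their structure.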

Deploying Your OCR API

To deploy your app to Kinsta, you first need to push your project code to a Git provider (Bitbucket, GitHub, or GitLab).

Recall that you had to install Tesseract separately on your host system before the PyTesseract wrapper could use it. To use the wrapper on the Kinsta application platform (or in any other environment), you will need to install Tesseract there as well.

If you were working with remote compute instances (such as AWS EC2), you could SSH into the compute instance and run the appropriate command for installing the package on it.

However, application platforms don’t give you direct access to the host. Instead, you need a solution like Nixpacks, Buildpacks, or a Dockerfile to set up your application’s environment (including installing the Tesseract package in it) before the application itself is installed.

Add a nixpacks.toml file in your project’s directory with the following contents:

# nixpacks.toml

providers = ["python"]

[phases.setup]
nixPkgs = ["...", "tesseract"]

[phases.build]
cmds = ["echo building!", "pip install -r requirements.txt", "..."]

[start]
cmd = "gunicorn ocrapi.wsgi"

This will instruct the build platform to:

  1. Use the Python runtime to build and run your application.
  2. Set up the Tesseract package in your application’s container.
  3. Start the app using gunicorn.

Also, run the following command to generate a requirements.txt file that the application platform can use to install the required Python packages during the build:

pip3 freeze > requirements.txt

Once your Git repository is ready, follow these steps to deploy your OCR API to Kinsta:

  1. Log in to or create an account to view your MyKinsta dashboard.
  2. Authorize Kinsta with your Git provider.
  3. On the left sidebar, click Applications and then click Add Application.
  4. Select the repository and the branch you wish to deploy from.
  5. Select one of the available data center locations from the list of 37 options. Kinsta automatically detects the build settings for your application through your Nixpacks file, so leave the start command field blank.
  6. Choose your application resources, such as RAM and disk space.
  7. Click Create application.

Once the deployment is complete, copy the deployed app’s link and run the following command in your terminal:

curl -X POST -F "image=@simple-image.png" <your-deployed-app-link>/ocr

This should return the same response as you received locally:

{"status":"success","text":"This looks like it was written with a steady hand\n\n"}

You can also use Postman to test the API.

Figure: Trying out the app in Postman (a POST request sent to the app hosted on Kinsta, with its response).
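Beyond cURL and Postman, the same request can be sent from Python. The sketch below uses only the standard library and builds the multipart/form-data body by hand. The function names build_multipart and ocr_image are hypothetical, and the code assumes the API responds with the JSON shape shown above.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(field_name, filename, file_bytes,
                    content_type="application/octet-stream"):
    """Encode one file as multipart/form-data; return (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + file_bytes + tail, f"multipart/form-data; boundary={boundary}"


def ocr_image(path, url):
    """POST the image at `path` to the OCR endpoint at `url`; return the text."""
    with open(path, "rb") as handle:
        # "image" matches the form field name that the Flask endpoint reads.
        body, content_type = build_multipart(
            "image", os.path.basename(path), handle.read()
        )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["text"]
```

For example, ocr_image("simple-image.png", "<your-deployed-app-link>/ocr") would send the same request as the cURL command above.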

This completes the development of a basic OCR API. You can access the complete code for this project on GitHub.

Summary

You now have a working self-hosted OCR API that you can customize to your liking! This API can extract text from images, providing a valuable tool for data extraction, document digitization, and other applications.

As you continue to develop and refine your OCR API, consider exploring advanced features like multi-language support, image pre-processing techniques, and integrating with cloud storage services for storing and accessing images.
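As a starting point for two of these ideas, the sketch below shows a simple pre-processing step (grayscale conversion followed by thresholding) and a way to pass several languages to Tesseract. The helper names and the threshold value of 150 are assumptions. Each language code you request also needs its trained data installed alongside Tesseract.

```python
def tesseract_lang_arg(languages):
    """Join Tesseract language codes with "+", the separator Tesseract expects."""
    return "+".join(languages)


def preprocess(image, threshold=150):
    """Convert a Pillow image to grayscale, then to pure black and white."""
    grayscale = image.convert("L")
    # Map every pixel to white (255) if brighter than the threshold, else black (0).
    return grayscale.point(lambda pixel: 255 if pixel > threshold else 0)


def ocr_with_options(image, languages=("eng",)):
    """Run OCR on a pre-processed copy of the image in the given languages."""
    # Imported here so the helpers above can be used without PyTesseract installed.
    import pytesseract

    return pytesseract.image_to_string(
        preprocess(image), lang=tesseract_lang_arg(languages)
    )
```

Whether thresholding helps depends on the input. It tends to suit clean scans more than photos with uneven lighting, so it is worth comparing results with and without it on your own images.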

What feature do you think is indispensable for a self-hosted OCR API? Let us know in the comments below!

Kumar Harsh

Kumar is a software developer and a technical author based in India. He specializes in JavaScript and DevOps. You can learn more about his work on his website.