Visualizing Results

When debugging, it’s usually helpful to see what went wrong. Textractor offers a simple API for visualizing its output, which helps a lot when developing heuristics.

Installation

To begin, install the amazon-textract-textractor package using pip.

pip install amazon-textract-textractor

Several sets of optional dependencies are available to tailor your installation to your use case. The base package has sensible defaults, but if your workflow uses PDFs you may want to install the PDF extra dependencies with pip install "amazon-textract-textractor[pdf]". You can read more about the extra dependencies in the documentation.
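If you are unsure whether the PDF extra is present in your environment, a quick standard-library check can tell you before you start a PDF workflow. This is a minimal sketch: `has_module` is a hypothetical helper, and treating `pdf2image` as the module pulled in by the `[pdf]` extra is an assumption to verify against the Textractor documentation.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the top-level module `name` can be imported here."""
    return importlib.util.find_spec(name) is not None


# Assumption: "pdf2image" is the dependency installed by the [pdf] extra.
if not has_module("pdf2image"):
    print('PDF support missing; try: pip install "amazon-textract-textractor[pdf]"')
```

`find_spec` only locates the module without importing it, so the check is cheap and has no side effects.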

Calling Textract

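from PIL import Image
from textractor import Textractor
from textractor.data.constants import TextractFeatures

# Create a Textractor client that uses the "default" AWS credentials profile
extractor = Textractor(profile_name="default")

# Call Textract's AnalyzeDocument with the FORMS and TABLES features.
# save_image=True keeps the page image on the Document so results can be visualized.
document = extractor.analyze_document(
    file_source=Image.open("../../../tests/fixtures/form.png"),
    features=[TextractFeatures.FORMS, TextractFeatures.TABLES],
    save_image=True,
)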

Let’s look at the input image.

Image.open("../../../tests/fixtures/form.png")
Output: image of the input form (form.png).
document
Output:
This document holds the following data:
Pages - 1
Words - 494
Lines - 129
Key-values - 20
Checkboxes - 29
Tables - 1
Identity Documents - 0
Expense Documents - 0
document.checkboxes.visualize()
Output: image of the page with the detected checkboxes highlighted.
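A visualization is easiest to trust when you can cross-check it against raw counts. As a sketch, assume you have the raw Textract response as a dict, for example what boto3's `analyze_document` returns: its `Blocks` list contains `SELECTION_ELEMENT` blocks whose `SelectionStatus` is `SELECTED` or `NOT_SELECTED`. The hypothetical helper `count_selection_states` tallies them; the sample blocks are made up for illustration.

```python
from collections import Counter
from typing import Dict, List


def count_selection_states(blocks: List[Dict]) -> Counter:
    """Count SELECTION_ELEMENT blocks in a raw Textract response by status."""
    return Counter(
        block.get("SelectionStatus", "UNKNOWN")
        for block in blocks
        if block.get("BlockType") == "SELECTION_ELEMENT"
    )


# Made-up sample blocks; in practice pass response["Blocks"].
sample_blocks = [
    {"BlockType": "SELECTION_ELEMENT", "SelectionStatus": "SELECTED"},
    {"BlockType": "SELECTION_ELEMENT", "SelectionStatus": "NOT_SELECTED"},
    {"BlockType": "SELECTION_ELEMENT", "SelectionStatus": "NOT_SELECTED"},
    {"BlockType": "WORD", "Text": "Name"},  # ignored: not a selection element
]
counts = count_selection_states(sample_blocks)
print(dict(counts))
```

If the total differs from what the overlay shows, that discrepancy is a good place to start debugging.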
document.key_values.visualize()
Output: image of the page with the detected key-value pairs highlighted.

Visualizing Tables

Tables can be visualized as well; here, they are drawn in purple.

document.tables.visualize()
Output: image of the page with the detected table highlighted.
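An overlay like this has to map Textract's coordinates onto the image. Textract reports each `BoundingBox` (`Width`, `Height`, `Left`, `Top`) as ratios of the page dimensions, so drawing a box means scaling those ratios by the image size. The sketch below shows only that conversion; it is not Textractor's own implementation, and `to_pixel_rect` and the sample box values are hypothetical.

```python
from typing import Dict, Tuple


def to_pixel_rect(
    bbox: Dict[str, float], image_width: int, image_height: int
) -> Tuple[int, int, int, int]:
    """Convert a normalized Textract BoundingBox to (x0, y0, x1, y1) pixels."""
    x0 = round(bbox["Left"] * image_width)
    y0 = round(bbox["Top"] * image_height)
    x1 = round((bbox["Left"] + bbox["Width"]) * image_width)
    y1 = round((bbox["Top"] + bbox["Height"]) * image_height)
    return x0, y0, x1, y1


# Hypothetical table box on a 1000 x 800 pixel page image
table_bbox = {"Left": 0.1, "Top": 0.25, "Width": 0.8, "Height": 0.5}
rect = to_pixel_rect(table_bbox, 1000, 800)
print(rect)
```

With Pillow, the resulting tuple can be passed to `ImageDraw.Draw(image).rectangle(rect, outline="purple")` to draw the box on the page image.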

Conclusion

Textractor includes visualization utilities that help you understand the Textract output so you can implement better heuristics.