Exporting Form Data

We now move from Textract OCR to Textract Forms, the API that extracts key-value pairs. In this example, we export all key-value pairs extracted from an image to a .csv file.

Installation

To begin, install the amazon-textract-textractor package using pip.

pip install amazon-textract-textractor

There are several sets of optional dependencies available to tailor your installation to your use case. The base package has sensible defaults, but if your workflow uses PDFs you may want to install the PDF extras with pip install amazon-textract-textractor[pdf]. You can read more about extra dependencies in the documentation.

Calling Textract

We use the asynchronous API for this example but, as seen in the OCR example, the synchronous API exposes the same methods.

[1]:
import os
from PIL import Image
from textractor import Textractor
from textractor.data.constants import TextractFeatures

extractor = Textractor(profile_name="default")
document = extractor.start_document_analysis(
    # Here we pass a Pillow image instead of a path. This changes nothing,
    # as Textractor supports most input types.
    file_source=Image.open("../../../tests/fixtures/form.png"),
    # We specify the features that we want. Here we only want keys and values,
    # so we use TextractFeatures.FORMS.
    features=[TextractFeatures.FORMS],
    s3_upload_path="s3://textract-ocr/temp/",
    save_image=True
)

Retrieving key-values and exporting as CSV

Form data (key-values) is stored as a property at both the document and page level and can be accessed as shown below.

[2]:
# All key-values present in the document
document.key_values
[2]:
[Date : 04/23/2020,
 Phone : 615-373-6883,
 Address : BLVD,
 Cellular : 683-426-2200,
 Work : 726-448-6720,
 Time : P.M.,
 Phone : 626-200-4890,
 Cleaning Tech : LEWIS,
 Customer : CAMPBELL,
 Day : Wednesday,
 Name : CAMPBELL,
 City : YORK,
 E-Mail" : vilcomp@gmail.com,
 Special Instructions or Directions: : ,
 Sales Tax : 00,
 Late Fee : 00,
 TOTAL : 00]
[6]:
# Export the key-values as csv
document.export_kv_to_csv(
    include_kv=True,
    include_checkboxes=False,
    # A relative path is resolved against the current working directory
    filepath="kv.csv"
)
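
For readers who need a different layout than the built-in export, the following is a minimal sketch of writing key-value pairs to CSV with the standard library. It is not Textractor's implementation. The `pairs` list is a hypothetical, hand-copied sample of the output above, `pairs_to_csv` is an illustrative helper name, and the `Key`/`Value` header is assumed from the dataframe shown in the next section.

```python
import csv
import io

# Hypothetical sample: a few (key, value) pairs copied by hand from the
# output above. Getting these strings out of Textractor objects is not shown.
pairs = [
    ("Date", "04/23/2020"),
    ("Phone", "615-373-6883"),
    ("Phone", "626-200-4890"),
    ("Special Instructions or Directions:", ""),
]


def pairs_to_csv(pairs):
    """Return CSV text with a Key,Value header and one row per pair."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Key", "Value"])
    writer.writerows(pairs)
    return buffer.getvalue()


csv_text = pairs_to_csv(pairs)
print(csv_text)
```

Writing one row per pair, rather than one column per key, keeps repeated keys such as Phone from overwriting each other.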

View CSV as dataframe

To verify the contents of the exported file, we load it into a pandas DataFrame.

[7]:
import pandas as pd

df_key_values = pd.read_csv(os.path.join(os.getcwd(), "kv.csv"))
df_key_values
[7]:
    Key                                  Value
0   Date                                 04/23/2020
1   Phone                                615-373-6883
2   Address                              BLVD
3   Cellular                             683-426-2200
4   Work                                 726-448-6720
5   Time                                 P.M.
6   Phone                                626-200-4890
7   Cleaning Tech                        LEWIS
8   Customer                             CAMPBELL
9   Day                                  Wednesday
10  Name                                 CAMPBELL
11  City                                 YORK
12  E-Mail"                              vilcomp@gmail.com
13  Special Instructions or Directions:  NaN
14  Sales Tax                            00
15  Late Fee                             00
16  TOTAL                                00
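
The dataframe above contains the key Phone twice and one key with no value, so loading the rows into a plain dictionary would silently drop data. The sketch below shows one way to group values by key and flag empty fields using only the standard library. `SAMPLE_CSV` is a hand-copied subset of the rows above, assumed to be comma-separated with a `Key,Value` header, and `group_values_by_key` is a hypothetical helper, not part of Textractor.

```python
import csv
import io
from collections import defaultdict

# Assumed sample: a few rows in the Key,Value layout shown in the dataframe.
SAMPLE_CSV = """Key,Value
Date,04/23/2020
Phone,615-373-6883
Phone,626-200-4890
Special Instructions or Directions:,
"""


def group_values_by_key(csv_text):
    """Map each key to the list of all values found for it, in file order."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["Key"]].append(row["Value"])
    return dict(grouped)


grouped = group_values_by_key(SAMPLE_CSV)

# Keys that occur more than once in the form
duplicates = {key: values for key, values in grouped.items() if len(values) > 1}

# Keys whose values are all empty strings
empty = [key for key, values in grouped.items() if all(v == "" for v in values)]

print(duplicates)
print(empty)
```

Keeping a list per key preserves both phone numbers. Deciding which one is the customer's and which is the business's still requires layout information that the CSV does not carry.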

Conclusion

Textractor supports many more APIs and use cases. If this example did not address yours, we encourage you to look at the other examples.