Exporting Form Data

We now move from Textract OCR to Textract Forms, the API that extracts key-value pairs. In this example, we export every key-value pair extracted from an image to a .csv file.

Installation

To begin, install the amazon-textract-textractor package using pip.

pip install amazon-textract-textractor

Several sets of optional dependencies are available to tailor the installation to your use case. The base package has sensible defaults, but if your workflow uses PDFs you may want to install the PDF extra dependencies with pip install amazon-textract-textractor[pdf]. You can read more about extra dependencies in the documentation.

Calling Textract

We use the asynchronous API for this example but, as seen in the OCR example, the synchronous API exposes the same methods.

[1]:

import os
from PIL import Image
from textractor import Textractor
from textractor.data.constants import TextractFeatures

extractor = Textractor(profile_name="default")
document = extractor.start_document_analysis(
    # Here we pass a Pillow image instead of a path. This changes nothing, as
    # Textractor supports most input types.
    file_source=Image.open("../../../tests/fixtures/form.png"),
    # We specify the features that we want. Here we only want keys and values,
    # so we use TextractFeatures.FORMS.
    features=[TextractFeatures.FORMS],
    s3_upload_path="s3://textract-ocr/temp/",
    save_image=True
)

Retrieving key-values and exporting as CSV

Form data (key-value pairs) is stored as a property at both the document and page level and can be accessed as shown below.

[2]:

# All key-values present in the document
document.key_values

[2]:

[Date : 04/23/2020,
 Phone : 615-373-6883,
 Address : BLVD,
 Cellular : 683-426-2200,
 Work : 726-448-6720,
 Time : P.M.,
 Phone : 626-200-4890,
 Cleaning Tech : LEWIS,
 Customer : CAMPBELL,
 Day : Wednesday,
 Name : CAMPBELL,
 City : YORK,
 E-Mail" : vilcomp@gmail.com,
 Special Instructions or Directions: : ,
 Sales Tax : 00,
 Late Fee : 00,
 TOTAL : 00]
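Note that the output above contains the key "Phone" twice, so loading the pairs into a plain dictionary would silently keep only one of the two numbers. The sketch below shows one way to group values by key instead. It uses plain (key, value) tuples copied from the output as a stand-in for the extracted objects; the tuples and the group_by_key helper are illustrative assumptions, not part of Textractor.

```python
from collections import defaultdict

# Hypothetical stand-in for the extracted pairs: plain (key, value) tuples
# copied from the output above, not actual Textractor objects.
pairs = [
    ("Date", "04/23/2020"),
    ("Phone", "615-373-6883"),
    ("Cellular", "683-426-2200"),
    ("Phone", "626-200-4890"),
]


def group_by_key(pairs):
    """Collect every value seen for a key, preserving the original order."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


grouped = group_by_key(pairs)
# Both phone numbers survive because values are stored in a list per key.
print(grouped["Phone"])
```

Keeping a list per key costs little and avoids losing data when a form repeats a label.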

[6]:

# Export the key-values as csv
document.export_kv_to_csv(
    include_kv=True,
    include_checkboxes=False,
    filepath=os.path.join(os.getcwd(), "kv.csv")
)

View CSV as dataframe

To verify the contents of the exported file, we load it into a pandas DataFrame.

[7]:

import pandas as pd

df_key_values = pd.read_csv(os.path.join(os.getcwd(), "kv.csv"))
df_key_values

[7]:

	Key	Value
0	Date	04/23/2020
1	Phone	615-373-6883
2	Address	BLVD
3	Cellular	683-426-2200
4	Work	726-448-6720
5	Time	P.M.
6	Phone	626-200-4890
7	Cleaning Tech	LEWIS
8	Customer	CAMPBELL
9	Day	Wednesday
10	Name	CAMPBELL
11	City	YORK
12	E-Mail"	vilcomp@gmail.com
13	Special Instructions or Directions:	NaN
14	Sales Tax	00
15	Late Fee	00
16	TOTAL	00
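In the table above, the empty value for "Special Instructions or Directions:" is shown as NaN, because pandas treats empty fields as missing by default. If you would rather keep every value as text, the following sketch reads the data with dtype=str and keep_default_na=False. It assumes the file has the Key and Value header shown above, and it uses a small in-memory sample in place of kv.csv so that it can run on its own.

```python
import io

import pandas as pd

# Small in-memory sample standing in for kv.csv (assumed layout: a
# "Key,Value" header followed by one row per pair, as in the table above).
sample_csv = (
    "Key,Value\n"
    "Date,04/23/2020\n"
    "Special Instructions or Directions:,\n"
    "Sales Tax,00\n"
)

# Default behaviour: the empty field is read as a missing value (NaN).
df_default = pd.read_csv(io.StringIO(sample_csv))

# Read everything as strings and keep empty fields as empty strings.
df_text = pd.read_csv(io.StringIO(sample_csv), dtype=str, keep_default_na=False)

print(df_default["Value"].isna().sum())
print(df_text["Value"].tolist())
```

Reading form values as strings also protects values such as "00" from being converted to numbers if a column ever happens to contain only numeric-looking entries.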

Conclusion

Textractor supports many more APIs and use cases. If this example did not address yours, we encourage you to look at the other examples.