Using AnalyzeID

Textract AnalyzeID is an API dedicated to processing identity documents such as driver’s licenses and passports. It differs from other Amazon Textract APIs in that it has no asynchronous version and supports only single-page images of ID documents.

Installation

To begin, install the amazon-textract-textractor package using pip.

pip install amazon-textract-textractor

There are various sets of extra dependencies available to tailor your installation to your use case. The base package has sensible defaults, but if your workflow uses PDFs you may want to install the PDF extra dependencies with pip install amazon-textract-textractor[pdfium]. You can read more about extra dependencies in the documentation.

Calling Textract

[1]:

from textractor import Textractor

extractor = Textractor(profile_name="default")

document = extractor.analyze_id(
    file_source="../../../tests/fixtures/fake_id.png",
    save_image=True,
)

[3]:

document.images[0]

[3]:

../_images/notebooks_using_analyze_id_2_0.png

Parsing the output

AnalyzeID is a synchronous-only API that returns a fixed set of keys. Textractor predefines these keys as the AnalyzeIDFields enum in its constants.py file.

[ ]:

from textractor.data.constants import AnalyzeIDFields

[f.value for f in AnalyzeIDFields]
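The list comprehension above relies on standard Python Enum behavior: iterating an Enum yields its members, and each member's value is the key string. The following is a minimal sketch of that pattern using a hypothetical stand-in enum, SampleIDFields, which lists only three keys and is not the real AnalyzeIDFields. The is_known_key helper is also an illustration, not part of Textractor.

```python
from enum import Enum


class SampleIDFields(Enum):
    """Hypothetical stand-in for AnalyzeIDFields; only three keys are listed."""

    FIRST_NAME = "FIRST_NAME"
    LAST_NAME = "LAST_NAME"
    PLACE_OF_BIRTH = "PLACE_OF_BIRTH"


def is_known_key(name, fields=SampleIDFields):
    """Return True if `name` is the value of one of the enum's members."""
    try:
        fields(name)  # Enum lookup by value raises ValueError when absent
        return True
    except ValueError:
        return False


# Iterating an Enum yields its members in definition order.
keys = [f.value for f in SampleIDFields]
print(keys)
print(is_known_key("FIRST_NAME"), is_known_key("NICKNAME"))
```

Checking a key against the enum before indexing into a result is one way to catch typos in key names early.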

Note that some of these keys, such as PLACE_OF_BIRTH, are specific to passports. An IdentityDocument object can be used like a dictionary, making it very simple to use.

[ ]:

document.identity_documents[0]["FIRST_NAME"] + " " + document.identity_documents[0]["LAST_NAME"]

[ ]:

{f.value: document.identity_documents[0][f.value] for f in AnalyzeIDFields}
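Because some keys apply only to certain document types, the dictionary built above may contain values that carry no information. The sketch below keeps only the keys with a non-empty value. It assumes that fields not found on the document come back as empty strings rather than being omitted. The non_empty_fields helper is hypothetical, and a plain dict with made-up values stands in for an IdentityDocument so the example runs without calling Textract.

```python
def non_empty_fields(identity_document, keys):
    """Return {key: value} for the keys whose value is a non-empty string.

    Uses only [] indexing, so it works with any dict-like object.
    """
    result = {}
    for key in keys:
        value = identity_document[key]
        if value:  # skip empty strings (assumed to mean "not detected")
            result[key] = value
    return result


# Stand-in for an IdentityDocument; the values are invented for illustration.
sample_document = {"FIRST_NAME": "Jane", "LAST_NAME": "Doe", "PLACE_OF_BIRTH": ""}

found = non_empty_fields(
    sample_document, ["FIRST_NAME", "LAST_NAME", "PLACE_OF_BIRTH"]
)
print(found)
```

With a real result, the same helper could be called as non_empty_fields(document.identity_documents[0], [f.value for f in AnalyzeIDFields]).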

Conclusion

AnalyzeID is an easy API to use, and Textractor simplifies it further by providing an Enum of the available keys and a simple dict-like interface to the results.