Entity Parser

The library is intended to support parsing multiple formats into a single underlying object representation. For Textract users, the response_parser module handles API response parsing for DetectDocumentText, AnalyzeDocument, StartDocumentTextDetection and StartDocumentAnalysis.

response_parser

Consumes a Textract JSON response and converts it to a Document object. This module contains all the utilities needed to create entity objects from the JSON blocks within the response. Use response_parser's parse function to handle an API response and convert it to a Document object.

textractor.parsers.response_parser.create_expense_from_field(field: Dict, page: Page) → ExpenseField

textractor.parsers.response_parser.parse(response: dict) → Document

Ingests the response data and API call mode and calls the appropriate parse function for it. Presently supports only SYNC and ASYNC API calls. It will be extended to Analyze ID and Expense in the future.

Parameters: response (dict) – JSON response data in a format readable by the ResponseParser.
Returns: Document object returned after making the respective parse function calls.
Return type: Document
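
A minimal usage sketch, assuming a Textract response was previously saved to disk as JSON. The helper names load_response and parse_saved_response are hypothetical and not part of the library; only the import path and the parse(response) call come from the signature documented above.

```python
import json
from pathlib import Path


def load_response(path):
    """Read a saved Textract JSON response from disk into a dict (hypothetical helper)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def parse_saved_response(path):
    """Load a saved response and hand it to response_parser.parse (hypothetical helper)."""
    # Imported lazily so this sketch can be loaded even where textractor is not installed.
    from textractor.parsers import response_parser

    # parse() takes the response dict and returns a Document, per the signature above.
    return response_parser.parse(load_response(path))
```

Keeping the file loading separate from the parse call makes it easy to inspect the raw dict before it is converted into a Document.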

textractor.parsers.response_parser.parse_analyze_id_response(response)

textractor.parsers.response_parser.parse_document_api_response(response: dict) → Document

Parses a Textract JSON response and converts it into a Document object containing Page objects. A valid Page object must contain at least a unique name and physical dimensions.

Parameters: response (dict) – JSON response data in a format readable by the ResponseParser.
Returns: Document object containing the hierarchy of DocumentEntity descendants.
Return type: Document
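
To illustrate the kind of input this function consumes, the sketch below walks the PAGE blocks of a Textract-style response and counts their child blocks. It is not the library's own parsing logic. The summarize_pages helper and the sample data are hypothetical; the Blocks, BlockType, Id and Relationships keys follow the Textract response format.

```python
def summarize_pages(response):
    """Return a (page_id, child_count) tuple for every PAGE block (hypothetical helper)."""
    summary = []
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "PAGE":
            continue
        # CHILD relationships list the Ids of the blocks that belong to this page.
        child_ids = [
            child_id
            for rel in block.get("Relationships", [])
            if rel.get("Type") == "CHILD"
            for child_id in rel.get("Ids", [])
        ]
        summary.append((block.get("Id"), len(child_ids)))
    return summary


# Hand-written sample data for illustration only, not real Textract output.
sample_response = {
    "DocumentMetadata": {"Pages": 1},
    "Blocks": [
        {
            "BlockType": "PAGE",
            "Id": "page-1",
            "Relationships": [{"Type": "CHILD", "Ids": ["line-1", "line-2"]}],
        },
        {"BlockType": "LINE", "Id": "line-1", "Text": "Hello"},
        {"BlockType": "LINE", "Id": "line-2", "Text": "World"},
    ],
}

pages = summarize_pages(sample_response)
page_ids = [page_id for page_id, _ in pages]
# Each page needs a distinct identifier; here we assume the block Id plays that role.
ids_are_unique = len(page_ids) == len(set(page_ids))
```

A quick check like this can help spot a response with missing or duplicated PAGE blocks before it is passed to the parser.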

textractor.parsers.response_parser.parser_analyze_expense_response(response)