Beyond Basics: Einstein OCR vs Amazon Textract

Jul 20, 2021

Anyone who follows the field of Computer Vision/Machine Learning will be familiar with OCR capabilities such as AWS Textract or SFDC (Salesforce) Einstein OCR.

This article is not a rundown of the capabilities each service offers. Instead, it looks a little deeper at how each one behaves when applied beyond the basics.

To make it a little more interesting, I chose a use case I came across recently:

Extracting labeled information from documents that share some common conventions across many variations, such as a health insurance card.

Fig. 1: Sample Health Insurance Card

Day by day, digital platforms are opening up use cases that were previously unknown or less favored. For instance, the traditional digital B2C model in the life sciences industry ran from pharma companies to HCPs/HCOs, but specialty pharma companies have now started to reach patients directly and engage them along the patient journey.

So when I was asked whether there was a way to simplify entering insurance information in a patient portal for benefits investigation, I started with two questions. First, how well can OCR read text from a picture? Second, are there any standards for how companies print insurance cards, as there are for a driver's license or a corporate ID card?

The OCR problem becomes simple when you know what you are dealing with, such as a predefined form or a standard card. Insurance cards, however, come in many variations, even across plans offered by the same company.

To complicate things further, different insurers use different labels for the same information, for example Member, Group Number, GRP, etc.

For instance, HCOs know what a “Member Id” is, but each company prints it differently on the card. UHC prints it as “Member Id:”, whereas Aetna prints it as “ID”.
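One way to absorb these label variations in code is a small alias table that maps each printed label to a canonical field. The Python below is a minimal sketch under stated assumptions. The field names `member_id` and `group_number`, the helper `normalize_label`, and the alias lists are hypothetical. The aliases are taken only from the label variants mentioned above, and a real table would need many more entries per insurer.

```python
import re
from typing import Optional

# Hypothetical alias table: canonical field names and their printed variants
# (taken from the examples above; a real table would be much larger).
LABEL_ALIASES = {
    "member_id": ["member id", "member", "id"],
    "group_number": ["group number", "grp"],
}


def normalize_label(raw_label: str) -> Optional[str]:
    """Map a printed card label such as 'Member Id:' to a canonical field name."""
    # Lowercase, replace punctuation such as ':' or '#' with spaces, collapse whitespace.
    cleaned = re.sub(r"[^a-z0-9 ]", " ", raw_label.lower())
    cleaned = " ".join(cleaned.split())
    for field, aliases in LABEL_ALIASES.items():
        if cleaned in aliases:
            return field
    return None  # unknown label: leave it for manual review
```

With this table, "Member Id:" (UHC style) and "ID" (Aetna style) both resolve to `member_id`. A label the table has never seen returns `None`, so it can be flagged rather than guessed.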

Insurance cards are complex. Providers offering similar, if not identical, services follow different standards, and different plans from the same company sometimes use different card layouts and labels. So OCR alone cannot simply give you what you want, although it is the starting point. Taking a step-by-step approach to the problem:

Step 1: Capture an image of the card (either in real time or by upload)
Step 2: Extract the text and labels from the image using OCR
Step 3: Validate the extracted information

This post does not go into Steps 1 and 3, as they need capabilities other than OCR. Step 2, on the other hand, depends entirely on the OCR capability and the quality of its output.

OCR Capabilities:

Before comparing products and features, it's important to define the factors to compare and the output formats each product offers. Fundamentally, OCR combines image processing with pattern recognition. First, get a digital image of the item you want to read. Then apply an algorithm to extract the characters. Finally, label the output based on a few factors such as proximity and relevance.

Output Formats:

Raw (Lines/Words): provides a view for location-based extraction
Forms: scans the document as a form and provides label/value pairs
Tables: scans the document as a table and provides the output in tabular form
Cards: business cards/contact cards

When I thought about the problem at hand and tried to define high-level metrics for the effectiveness of OCR, I came up with the list below:

Comparison Metrics:

Accuracy Level (identifying characters across different fonts)
Relevance (grouping related text, such as words on the same line)
Proximity (where the text is located)

Comparing AWS Textract vs. Einstein OCR:

Though API-based access to the extracted data would be best for a practical use case, I used the UI tools each service offers to view the processed data, for “ease of use”.
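For the API route, a minimal sketch of calling Textract from Python might look like the following. It uses boto3's `analyze_document` with the `FORMS` feature and then pulls out the raw Lines and Words. It assumes AWS credentials and a region are already configured. The helper names `analyze_card` and `lines_and_words` are hypothetical, and only the `LINE` and `WORD` block types of the response are read here.

```python
from typing import Any, Dict, List


def analyze_card(image_path: str) -> Dict[str, Any]:
    """Send a card image to Amazon Textract's AnalyzeDocument API with the FORMS feature."""
    import boto3  # imported here so the parsing helper below works without boto3 installed

    client = boto3.client("textract")  # assumes credentials/region are configured
    with open(image_path, "rb") as f:
        return client.analyze_document(Document={"Bytes": f.read()}, FeatureTypes=["FORMS"])


def lines_and_words(response: Dict[str, Any]) -> Dict[str, List[str]]:
    """Collect the raw LINE and WORD text from a Textract response, in response order."""
    out: Dict[str, List[str]] = {"LINE": [], "WORD": []}
    for block in response.get("Blocks", []):
        if block.get("BlockType") in out:
            out[block["BlockType"]].append(block.get("Text", ""))
    return out
```

A call such as `lines_and_words(analyze_card("card.jpg"))` would return the two lists. The file name is a placeholder. Keeping the parser separate from the network call makes it easy to test against a saved response.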

SFDC: Einstein Playground → OCR

AWS Management Console: Amazon Textract → Analyze document

AWS Textract's UI is simple: you just upload the image and the Textract engine does all the work. With Einstein OCR, by contrast, you must specify the kind of data you are feeding it, such as Text or Contact. This might not seem important at the outset, since you will likely know in advance what OCR capability you are building. Still, it shows how mature AWS Textract's predictions are.

To compare the two products on the basic metrics defined above, I used the sample insurance card in Fig. 1 above.

Looking at Accuracy Level:

AWS and SFDC scored equally on accurately predicting characters. Note that the scope of testing was limited to the English character set and to clearly legible printed characters and fonts.
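The comparison above was done by inspection. If you want to put a number on accuracy, one common approach is character accuracy derived from the edit (Levenshtein) distance between the OCR output and a hand-typed ground truth. This is an assumption offered for illustration, not the method used in this comparison. The function names below are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def character_accuracy(ocr_text: str, ground_truth: str) -> float:
    """1 - (edit distance / ground-truth length), floored at 0."""
    if not ground_truth:
        return 1.0 if not ocr_text else 0.0
    return max(0.0, 1 - levenshtein(ocr_text, ground_truth) / len(ground_truth))
```

For example, reading "Member 1d" for "Member Id" is a single substitution out of nine characters, giving an accuracy of 8/9.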

Looking at Relevance Level:

When it comes to relevance, that is, grouping related words together, Textract clearly beats Einstein OCR. I would go as far as saying that none of the Einstein OCR methods offered by SFDC (Text, Contact or Table) comes close to what Textract can do.
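To make "grouping words together" concrete, here is one simple heuristic for assembling word boxes into lines. It sorts words by vertical center, starts a new line whenever a word's center is too far from the current line's, and then reads each line left to right. This is a hypothetical sketch for illustration, not how either Textract or Einstein OCR is implemented. The `Word` type, the normalized coordinates and the `tolerance` value are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    left: float    # x of the left edge, normalized 0-1 (assumed)
    top: float     # y of the top edge, normalized 0-1 (assumed)
    width: float
    height: float

    @property
    def center_y(self) -> float:
        return self.top + self.height / 2


def group_into_lines(words: List[Word], tolerance: float = 0.5) -> List[str]:
    """Group words whose vertical centers are close into lines, read left to right.

    A word joins the current line if its center is within `tolerance` times the
    height of that line's first word; otherwise it starts a new line.
    """
    lines: List[List[Word]] = []
    for word in sorted(words, key=lambda w: w.center_y):
        if lines and abs(word.center_y - lines[-1][0].center_y) < tolerance * lines[-1][0].height:
            lines[-1].append(word)
        else:
            lines.append([word])
    return [" ".join(w.text for w in sorted(line, key=lambda w: w.left)) for line in lines]
```

Words whose boxes sit at roughly the same height end up on one line, even if the OCR engine returned them out of order.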

Fig. 3: AWS Textract Output for Relevance

Fig. 4: Einstein OCR Output for Relevance

Looking at Proximity Level:

Proximity level is measured by how accurately the location of each word or line is predicted, expressed as X and Y coordinates in a two-dimensional plane. Both AWS Textract and Einstein OCR provided accurate coordinates for the labels they produced. However, AWS Textract beats Einstein OCR when its Lines and Words outputs are used together.
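Textract reports each box as a `BoundingBox` whose `Left`, `Top`, `Width` and `Height` are ratios of the page size. To place words on the same X/Y plane as the original image, a small helper can convert those ratios to pixel corners. The helper name `to_pixel_box` is hypothetical.

```python
from typing import Dict, Tuple


def to_pixel_box(bbox: Dict[str, float], image_width: int,
                 image_height: int) -> Tuple[int, int, int, int]:
    """Convert a Textract-style normalized BoundingBox (Left, Top, Width, Height as
    ratios of the page) to pixel corners (min_x, min_y, max_x, max_y)."""
    min_x = round(bbox["Left"] * image_width)
    min_y = round(bbox["Top"] * image_height)
    max_x = round((bbox["Left"] + bbox["Width"]) * image_width)
    max_y = round((bbox["Top"] + bbox["Height"]) * image_height)
    return min_x, min_y, max_x, max_y
```

On a 1000x600 image, a box with `Left=0.1`, `Top=0.25`, `Width=0.5` and `Height=0.1` maps to the pixel corners (100, 150, 600, 210).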

Using the Lines and Words outputs together, one can easily analyze where text is placed in the layout by applying a general principle: a label and its value are kept close together. For example, a card would print the “Member Id” label and its value on the same line or on the line directly below, rather than several lines apart. It's the same common-sense approach a human reader uses.
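The "same line or the line directly below" rule can be sketched in a few lines of Python. This is a minimal heuristic under assumptions. The `Line` type, the normalized coordinates, the helper `find_value` and the tolerance are all hypothetical. Real cards will need extra rules, for example when a value is printed far to the right of its label.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Line:
    text: str
    left: float    # normalized 0-1 coordinates (assumed)
    top: float
    width: float
    height: float


def find_value(label: Line, lines: List[Line], same_line_tol: float = 0.5) -> Optional[Line]:
    """Heuristic: the value sits to the right on the same line, else on the line just below."""
    label_cy = label.top + label.height / 2
    label_right = label.left + label.width
    # 1) Same line: vertical centers within tolerance, starting right of the label.
    same_line = [c for c in lines if c is not label
                 and abs((c.top + c.height / 2) - label_cy) < same_line_tol * label.height
                 and c.left >= label_right]
    if same_line:
        return min(same_line, key=lambda c: c.left - label_right)
    # 2) Below: starts under the label's bottom edge and overlaps it horizontally.
    below = [c for c in lines if c is not label
             and c.top >= label.top + label.height
             and c.left < label_right and c.left + c.width > label.left]
    if below:
        return min(below, key=lambda c: c.top)
    return None
```

Given the label's line and all lines on the card, `find_value` returns one of three results: the nearest candidate to the right on the same line, otherwise the closest line below that overlaps the label horizontally, otherwise `None`.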

Fig. 5: AWS Textract Output for Words View (refer to Fig. 3 for the AWS Textract Lines view)

Applying a Real Use Case:

Finally, I tried both services on my own insurance card as an average informed user to see how they perform. To my surprise, Einstein OCR ran into a problem and was unable to label the text. Textract, in contrast, predicted and extracted labels and values close to the actual ones.

Fig. 6: AWS Textract: Real Life Insurance Card

Fig. 7: Einstein OCR: Real Life Insurance Card (Expired)

Verdict:

Combining all these metrics and weighing the features available today, AWS Textract came out as the clear winner.

Final Thoughts:

Though OCR has features that help simplify the identification of labels, it cannot be relied upon to provide a complete solution on its own. Other advanced analytics, such as a secondary regression machine learning model, are required. To compete with other products, Einstein OCR needs to improve the way it interprets relevance and proximity in its predictive algorithms.