Improving Invoice Processing Accuracy with Nanonets & ChatGPT-4

Estimated read time: 10 min

I wouldn’t be exaggerating if I said an average person sends or receives at least 10 invoices per week. With growing digitalization, businesses deal with massive volumes of invoices every day. Traditionally, invoice processing has been a manual, time-consuming process that needs significant resources and is prone to errors.

With the advent of AI and Natural Language Processing, invoice processing can now be automated and streamlined, leading to improved efficiency and accuracy. GPT stands for “Generative Pre-trained Transformer” and refers to a family of powerful language processing models developed by OpenAI. The GPT models are pre-trained on large amounts of text data and can then be fine-tuned for specific tasks, including invoice processing.

Let’s take the case of invoice processing for the orders of a book store. A sample invoice is shown in the image below. It contains the shipping and billing details, the items, and their prices. Imagine having to collect data manually from thousands of invoices! Luckily, we have AI tools that speed up the process.

In this blog, I’ll walk you through the steps to process your invoice using GPT-4 and Nanonets. Grab a cup of coffee and gear up!

Step 1: Create a Nanonets Account and Upload the Image

The first step is to extract the text data from the image of our invoice. OCR (Optical Character Recognition) techniques use pattern recognition algorithms to identify characters in images or scanned documents and convert them into text. Nanonets, a cloud-based artificial intelligence (AI) platform, offers curated OCR tools for specific tasks, including Invoice OCR. You can simply sign up here and access their Invoice OCR tool for free.

Once you log in and click on the Invoice OCR, you can find an option “Upload files”. Nanonets is very user-friendly and allows you to upload files from across 6+ apps.

I uploaded the sample invoice from Agatha Book Store here. The extraction completes in a few minutes, and you get the extracted results as shown. Here, a pre-trained deep learning model is used for extracting the entities and their values.

All the text fields identified by Nanonets are bounded by separate boxes. The values extracted for these fields can be seen in the ‘FINAL RESULTS’ tab on the right. The entity extraction done by Nanonets can be enhanced by using GPT-4. Nanonets also provides options to add or modify the field names, which lets you customize the output to your needs.

Looking to automate your manual AP Processes? Book a 30-min live demo to see how Nanonets can help your team implement end-to-end AP automation.

Step 2: Download OCR text Data

The extracted OCR text data can be downloaded in multiple formats. The GIF below demonstrates downloading the invoice data as an Excel or CSV file. In the CSV file, the entity (data field) names are stored as columns, and their values are in the corresponding rows.

We copy and paste the data from the downloaded CSV and obtain the OCR-generated text. Here’s the text I downloaded from our sample invoice in Nanonets.
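If you would rather not copy and paste by hand, the CSV can also be flattened into text with a few lines of Python. This is a minimal sketch, assuming the export has one header row of field names and one row of values per invoice. The function name csv_to_ocr_text and the sample columns are hypothetical and not taken from the actual Nanonets export.

```python
import csv
import io


def csv_to_ocr_text(csv_source):
    """Flatten each CSV row into 'field: value' lines, skipping empty cells."""
    reader = csv.DictReader(csv_source)
    blocks = []
    for row in reader:
        lines = [f"{field}: {value}" for field, value in row.items() if value]
        blocks.append("\n".join(lines))
    # One blank line separates consecutive invoices (rows).
    return "\n\n".join(blocks)


# Hypothetical two-column export. A real file would be opened with
# open("invoice.csv", newline="", encoding="utf-8") instead of StringIO.
sample = io.StringIO("invoice_number,seller_name\n3522,AGATHA BOOK HOUSE\n")
ocr_generated_text = csv_to_ocr_text(sample)
print(ocr_generated_text)
```

The resulting string can then play the role of the OCR-generated text in the steps that follow.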

The OCR-generated text can be enhanced with a GPT model in the next steps.

Step 3: Extract Entities with GPT-4

Entity extraction can be extended to support different queries if we run a GPT model on top of the text processed by Nanonets. You can sign up for an OpenAI account here and get access to the large language models. Once your account is set up, you will receive a unique API key, which is used to authenticate and authorize the requests made to OpenAI’s servers. Import the OpenAI package and set the API key value.
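One way to handle the key is to keep it out of the source code and read it from an environment variable. This is a minimal sketch, assuming the key is stored in a variable named OPENAI_API_KEY. The helper load_api_key is hypothetical, and the commented lines assume the legacy (pre-1.0) openai package, which is the one that provides the openai.Completion.create() function used in this post.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Read the API key from the environment and fail loudly if it is missing."""
    key = os.environ.get(env_var, "")
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


# With the legacy openai package installed, the key would then be set like this:
# import openai
# openai.api_key = load_api_key()
```

Reading the key from the environment means the script can be shared without exposing your credentials.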

Designing a prompt in a clear, structured way is the secret to unlocking the power of large language models. To extract the data fields (entities) and their values, we can use the prompt below.

# define your prompt

prompt_text = "This is the OCR generated text of invoices for book shop orders:\n" + ocr_generated_text + "\nExtract entities and their values as key-value pairs from the provided OCR text and output in the format of key: value"

Once you have a prompt, you can pass it to a pre-trained OpenAI model and obtain a response through the “openai.Completion.create()” function. There are a few parameters you can tune to obtain the best output.

Parameters of GPT:

  • engine: This parameter lets you choose a specific pre-trained large language model (LLM) to use for generating the text. It can be set to a pre-trained model or a custom fine-tuned model. Text Davinci is a powerful and efficient choice.
  • prompt: The initial text given to the model to start generating from. In our case, this is the “prompt_text” variable we defined earlier.
  • max_tokens: The maximum number of tokens the model can generate for a given prompt. You can control the length of the generated text through this.
  • temperature: Controls the degree of randomness or creativity in the generated text. A low temperature produces more conservative and predictable output, while a high temperature leads to more creative and varied output. The API accepts values from 0 to 2, though values between 0 and 1 are typical, and a value near 0 suits extraction tasks where you want repeatable results.

Now that you are familiar with GPT parameters, let’s write the code to generate output by passing the prompt text along with other parameters.
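Here is a minimal sketch of that call, under a few assumptions: it targets the legacy (pre-1.0) openai package that exposes openai.Completion.create(), the model name text-davinci-003 is only an example of a “Text Davinci” engine, and the helper names build_request, complete and fake_create are hypothetical. The completion function is passed in as an argument so that the sketch can be tried with a stand-in before spending API credits.

```python
def build_request(prompt_text, engine="text-davinci-003", max_tokens=512, temperature=0.0):
    """Collect the four parameters described above into keyword arguments."""
    return {
        "engine": engine,
        "prompt": prompt_text,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(prompt_text, create_fn, **overrides):
    """Call create_fn (e.g. openai.Completion.create) and return the generated text."""
    response = create_fn(**build_request(prompt_text, **overrides))
    return response["choices"][0]["text"].strip()


def fake_create(**kwargs):
    """Stand-in for the API that mimics the shape of a completion response."""
    return {"choices": [{"text": " invoice_number: 3522\n"}]}


print(complete("Extract entities ...", fake_create))

# With the legacy package installed and an API key set, the real call would be:
# import openai
# print(complete(prompt_text, openai.Completion.create))
```

A temperature of 0 is used as the default here because entity extraction benefits from predictable output rather than creative variation.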

We got the output as:

The entities and their values have been quickly extracted in just a few steps!
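Since we asked for the output in the format of key: value, the completion can be turned into a Python dictionary for the later steps. This is a minimal sketch, assuming the model returns one pair per line. The function name parse_entities is hypothetical, and the email address in the sample text is invented for illustration.

```python
def parse_entities(completion_text):
    """Turn 'key: value' lines into a dictionary, ignoring lines without a colon."""
    entities = {}
    for line in completion_text.splitlines():
        if ":" not in line:
            continue
        # Split on the first colon only, so values such as times stay intact.
        key, value = line.split(":", 1)
        key = key.strip().strip('"')
        value = value.strip().strip('"')
        if key:
            entities[key] = value
    return entities


sample_output = 'invoice_number: 3522\nseller_email: orders@example.com\n\nThank you'
print(parse_entities(sample_output))
```

Model output does not always follow the requested format exactly, so it is worth checking the parsed dictionary before relying on it.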

Step 4: Improving Data Corrections

Among the thousands of invoices circulating in any business, inconsistencies and minor errors in customer data are unavoidable. For example, some customers might have given an email address or contact number in an invalid format, or dates may appear in different formats. With Nanonets and GPT-4, you can easily identify these issues and perform data corrections. We can implement rule-based validations to verify correctness and format, and also to check for inconsistencies.

I give a prompt to GPT to perform validation of the date and email for us.

prompt_text = "In the above-extracted entities data, validate whether the formats of the date (DD/MM/YYYY) and the email are correct."

The LLM provides Python code that uses regular expressions to check the format, as shown in the image below. A regular expression describes a pattern, and we test whether a string matches it. The extracted entities are stored in a dictionary, and separate functions are defined to validate the emails and dates on the invoice.

Once these functions are defined, you can pass any date (such as the ‘Invoice date’) or the seller or buyer email ID to them to get the result.
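The code GPT generated is only available as an image, so here is a comparable sketch of such validation functions rather than the exact output. The patterns, the function names and the sample email address are my own assumptions. The email pattern is deliberately simple and will not cover every valid address.

```python
import re
from datetime import datetime

DATE_PATTERN = re.compile(r"\d{2}/\d{2}/\d{4}")
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")


def validate_date(value):
    """True if value looks like DD/MM/YYYY and is a real calendar date."""
    if not DATE_PATTERN.fullmatch(value):
        return False
    try:
        datetime.strptime(value, "%d/%m/%Y")
    except ValueError:
        # The shape was right but the date does not exist (e.g. 31/02/2023).
        return False
    return True


def validate_email(value):
    """True if value has the basic name@domain.tld shape."""
    return EMAIL_PATTERN.fullmatch(value) is not None


# Hypothetical extracted entities stored in a dictionary.
entities = {"invoice_date": "02/05/2023", "seller_email": "orders@example.com"}
print(validate_date(entities["invoice_date"]))
print(validate_email(entities["seller_email"]))
```

The regular expression only checks the shape of the date, which is why the sketch also parses it with datetime to reject dates that do not exist.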

GPT also helps you make corrections and changes to the data in a fast and convenient way. Note that in our invoice, the date is ‘02/05/2023’. I gave the prompt below to convert the date to the “MM/DD/YY” format.

prompt = "Change the format of the date in the extracted entities to 'MM/DD/YY'. Keep only the last 2 digits of the year."

In the output, the date has been converted as desired. We can give similar prompts to check that the contact number has 10 digits, that the address is in the desired format, and that no data values are missing.
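These corrections are simple enough to express directly in Python as well. This is a minimal sketch, assuming the source date is in DD/MM/YYYY as in the validation prompt earlier. The function names and the field names in the sample dictionary are hypothetical.

```python
from datetime import datetime


def to_mm_dd_yy(value):
    """Convert a DD/MM/YYYY string to MM/DD/YY (two-digit year)."""
    return datetime.strptime(value, "%d/%m/%Y").strftime("%m/%d/%y")


def has_ten_digits(phone):
    """True if the phone number contains exactly 10 digits, ignoring separators."""
    return sum(ch.isdigit() for ch in phone) == 10


def missing_fields(entities, required):
    """List the required field names that are absent or empty."""
    return [name for name in required if not (entities.get(name) or "").strip()]


entities = {"invoice_date": "02/05/2023", "seller_phone": "6783456723", "ship_to_name": ""}
print(to_mm_dd_yy(entities["invoice_date"]))
print(has_ten_digits(entities["seller_phone"]))
print(missing_fields(entities, ["invoice_date", "ship_to_name", "buyer_email"]))
```

Under the DD/MM/YYYY assumption, ‘02/05/2023’ is read as 2 May 2023 and becomes ‘05/02/23’.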

Set up touchless AP workflows and streamline the Accounts Payable process in seconds. Book a 30-min live demo now.

Step 5: Check for Data Issues

Any inconsistency in the data can be identified easily with GPT-4. In our example, you can check whether the total amount due matches the sum of the individual item prices. Let’s provide a prompt for it.

prompt = "Check if the total balance due in the invoice is consistent with the quantity & item prices in the invoice"

GPT-4 outputs a Python function that multiplies the quantity by the individual item price for each order and sums the results. If this computed total is inconsistent with the balance written on the invoice, that invoice is flagged for investigation. This can help businesses catch errors and discrepancies and validate their financial data.
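A function of this kind might look like the sketch below. It is not the exact code GPT-4 produced, and it assumes the invoice has no taxes, discounts or shipping charges on top of the line items. The field names quantity and unit_price and the sample amounts are hypothetical.

```python
from decimal import Decimal


def line_items_total(items):
    """Sum quantity * unit price over all line items, using exact decimal arithmetic."""
    return sum(
        (Decimal(str(item["quantity"])) * Decimal(str(item["unit_price"])) for item in items),
        Decimal("0"),
    )


def is_total_consistent(items, total_due):
    """True if the stated total equals the computed line-item total."""
    return line_items_total(items) == Decimal(str(total_due))


# Hypothetical line items for illustration.
items = [
    {"quantity": 2, "unit_price": "250.00"},
    {"quantity": 1, "unit_price": "499.50"},
]
print(line_items_total(items))
print(is_total_consistent(items, "999.50"))
```

Decimal is used instead of float so that currency amounts compare exactly and rounding noise does not flag a correct invoice.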

If you have a large dataset of invoices, you can also check for consistency across multiple invoices. For example, you can compare the seller and buyer information across multiple invoices to identify any discrepancies or anomalies.
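As a rough illustration of such a cross-invoice check, the sketch below groups invoices by seller name and reports sellers that appear with more than one address. The function name inconsistent_sellers, the field names, and the sample records are all hypothetical.

```python
from collections import defaultdict


def inconsistent_sellers(invoices):
    """Map each seller name to its addresses when more than one distinct address appears."""
    addresses = defaultdict(set)
    for invoice in invoices:
        name = invoice["seller_name"].strip().upper()
        # Collapse repeated whitespace so trivial spacing differences are not flagged.
        addresses[name].add(" ".join(invoice["seller_address"].split()))
    return {name: sorted(found) for name, found in addresses.items() if len(found) > 1}


# Hypothetical invoices for illustration.
invoices = [
    {"seller_name": "AGATHA BOOK HOUSE", "seller_address": "Indiranagar, Bangalore"},
    {"seller_name": "Agatha Book House", "seller_address": "Chetpet, Chennai"},
    {"seller_name": "Other Seller", "seller_address": "Some Street, Mumbai"},
]
print(inconsistent_sellers(invoices))
```

A flagged seller is not necessarily an error, since a business can have several branches, but it is a useful starting point for a manual review.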

Step 6: Querying with GPT

Once you have extracted the entities, GPT can also be used to answer specific queries over the extracted information. For example, what if you want the shipping details for a particular invoice number?

Let’s make a prompt for it:

# define your prompt

prompt_text = "Extract the details on shipping from the entity key-value pairs"

The completion generated for this prompt was:

>> Sure! Based on the OCR data provided, we can extract the shipping information and billing information as two groups as follows:

Shipping Information:

“invoice_number”: “3522”

ship_to_name: Gayathri Natarajan

ship_to_address: 600053 No.22B , Chetpet , Chennai , Tamil Nadu , India: Tanaya Pakahale

A similar query can be performed for obtaining seller details also. Here’s the extracted information on sellers from the provided data:

  • seller_name: AGATHA BOOK HOUSE
  • seller_address: No.13 , 2nd avenue , Indiranagar, Bangalore , Karnataka , India , 721302
  • seller_phone: 6783456723
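Once the key-value pairs are held in a dictionary, the same grouping can also be reproduced locally without another model call. This is a minimal sketch, assuming the keys of each group share a prefix such as seller_ or ship_to_, as in the outputs above. The helper select_by_prefix is hypothetical.

```python
def select_by_prefix(entities, prefix):
    """Return only the entities whose key starts with the given prefix."""
    return {key: value for key, value in entities.items() if key.startswith(prefix)}


# Values taken from the sample outputs shown above.
entities = {
    "invoice_number": "3522",
    "ship_to_name": "Gayathri Natarajan",
    "seller_name": "AGATHA BOOK HOUSE",
    "seller_phone": "6783456723",
}
print(select_by_prefix(entities, "seller_"))
```

This only works when the field names follow a consistent naming scheme. For free-form questions, prompting the model remains the more flexible option.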

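When working with multiple documents, we can also search and filter for invoices with a total balance due of more than Rs. 5000 to analyze bulk orders. Since ChatGPT keeps earlier prompts and responses in the context of a conversation, follow-up queries like these do not require pasting the data again. When calling the API directly, each request is independent, so the extracted data has to be included in the prompt.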
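For a plain numeric filter like this one, the extracted totals can also be compared in code. This is a minimal sketch, assuming each invoice has already been parsed into a dictionary with a total_due field. The function name bulk_orders, the field names, and the sample amounts are hypothetical.

```python
from decimal import Decimal


def bulk_orders(invoices, threshold="5000"):
    """Return the invoices whose total balance due is strictly above the threshold."""
    limit = Decimal(threshold)
    return [inv for inv in invoices if Decimal(str(inv["total_due"])) > limit]


# Hypothetical totals for illustration.
invoices = [
    {"invoice_number": "3522", "total_due": "7200.00"},
    {"invoice_number": "3523", "total_due": "999.50"},
    {"invoice_number": "3524", "total_due": "5000"},
]
print([inv["invoice_number"] for inv in bulk_orders(invoices)])
```

The comparison is strict, so an invoice of exactly Rs. 5000 is not counted as a bulk order in this sketch.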


Why Choose Nanonets + ChatGPT for Invoice Processing?

  • GPT can analyze the text on invoices and accurately identify and extract relevant entities, even when they are written in different formats or have variations in spelling or wording. This can help reduce errors and increase accuracy.
  • The data pipeline can be automated and scaled up for businesses.
  • Large volumes of invoices can be processed efficiently, which significantly reduces the time needed for data entry and processing.
  • The tools offer flexibility and adaptability. They can be integrated into existing systems and customized to fit specific business needs.
  • One of the advantages of Nanonets’ invoice OCR solution is its ability to learn from its mistakes. The system uses machine learning to improve its accuracy over time, making it more precise with each new invoice processed. The platform also allows users to review and correct any errors manually, ensuring that the extracted data is accurate and reliable.

While there are a lot of advantages, we also need to understand the limitations of this method. Accuracy is poor when the image or PDF quality is low. AI-based tools are also subject to biases or errors that are inherent in the training data.

Overall, leveraging GPT for entity extraction in invoice processing can help businesses streamline their operations, reduce manual work, and improve accuracy, leading to better financial management and decision-making.


