Use Case Breakdown: Augmenting a Reporting Workflow from PDF Files with AI

How to use Azure Form Recognizer and the Azure OpenAI Service to unlock unstructured data at scale

Hi there,

Today, I'll walk you through an example of how we can use AI to get insights from a batch of PDF documents in an automated way. You will learn about some key tools you can use to do this and a reference architecture you can follow.

Unlocking unstructured data is critical because a lot of time is spent (or should I say wasted?) in companies trying to find, process, and wrangle information - too often a manual, time-consuming process that's error-prone and doesn't scale well.

If you're ready to learn how to do this with AI, then dive in!

Problem Statement

To make this example as concrete as possible, imagine the following scenario:

An international NGO receives numerous project reports each month with updates on project progress and budgets from various partners in different countries.

Right now, the task of reading these PDFs, extracting data, and creating a combined report takes two people a whole month. As the NGO expands, the number of reports will grow, making the process increasingly time-consuming and costly.

So, we need a solution – a quicker, cheaper, and simpler way to handle these reports.

Let's dive in and explore how to do this.

Solution Architecture

Our high-level solution involves three interconnected layers:

  1. Data Layer: This is our information source – a system of folders containing the monthly PDF project reports. We'll use Microsoft's Power Query in combination with an R or Python script as our data pump.

  2. Analysis Layer: Responsible for processing the data, this layer uses Microsoft Azure Form Recognizer and Azure's OpenAI services to extract and analyze the data from the PDFs.

  3. User Layer: The interface for our team to interact with the information. Initially, we'll use Excel for its simplicity, but we've designed our solution for a seamless switch to a more powerful tool like Power BI if needed.

This systematic approach will make report processing faster, cheaper, and easier while providing room for future upgrades.

Breaking Down the Solution

Let's discuss our solution layer by layer:

Data Layer

The Data Layer is where we initiate the solution, focusing on loading PDF files using Power Query.

To automate data extraction, we use either Python or R to read the PDFs one by one. Each file is then sent to the Analysis Layer for processing.

Here's a simplified view of the process:

  1. A Python or R script within Power Query reads a PDF file from a specified location.

  2. The file is sent to an API service in the Analysis Layer to extract desired information.

  3. The retrieved information is structured into a standardized dataframe.

  4. The dataframe is outputted to Power Query as a table, ready for further manipulation.

Power Query then brings this structured data to the User Layer (for example, Excel or Power BI).
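
For illustration, here's a minimal sketch of what such a script could look like in Power Query's "Run Python script" step. The folder path is an assumption, and extract_fields is a hypothetical helper that wraps the Analysis Layer calls described below:

# Hedged sketch of a Python script for Power Query's "Run Python script" step.
# The folder path is illustrative; extract_fields is a hypothetical helper
# that wraps the Analysis Layer calls described below.
import os
import pandas as pd

REPORTS_DIR = r"C:\reports\incoming"  # illustrative location of the PDFs

rows = []
for name in os.listdir(REPORTS_DIR):
    if name.lower().endswith(".pdf"):
        fields = extract_fields(os.path.join(REPORTS_DIR, name))  # hypothetical
        fields["source_file"] = name
        rows.append(fields)

# Power Query picks up any pandas DataFrame defined in the script as a table.
reports = pd.DataFrame(rows)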

As an example, the final table output to Power Query might look like this (illustrative, with values taken from the JSON samples shown below):
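
Source File     Report Date   Budget     Used       Remaining   Projected End Date
report_a.pdf    2023-06-01    $70,000    $50,000    $20,000     2023-10-01
...             ...           ...        ...        ...         ...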

Analysis Layer

At the Analysis Layer, we use a computer vision AI service (Microsoft Azure Form Recognizer) and a large language model (GPT-4 via the Azure OpenAI Service) to extract data from the PDFs. Both services come with pay-as-you-go pricing.

Azure Form Recognizer: This tool analyzes a given document page by page to identify and extract key-value pairs and table data. The output is a structured JSON file, a universal data format that's easy to work with.
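
To give you an idea of what calling this service looks like, here's a minimal sketch using the azure-ai-formrecognizer Python SDK (v3.2+); the endpoint, key, and file name are placeholders:

# Hedged sketch: analyzing one PDF with Azure Form Recognizer.
# Endpoint, key, and file name are placeholders.
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

client = DocumentAnalysisClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

with open("project_report.pdf", "rb") as f:
    poller = client.begin_analyze_document("prebuilt-document", document=f)
result = poller.result()

# Key-value pairs and tables, ready to be serialized into the JSON below
for kv in result.key_value_pairs:
    if kv.key and kv.value:
        print(kv.key.content, "->", kv.value.content)
for table in result.tables:
    print(f"Table with {table.row_count} rows x {table.column_count} columns")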

Here's a simplified version of what a sample JSON output from Azure Form Recognizer might look like:

{
    "page1": {
        "tables": [
            {
                "rows": 5,
                "columns": 3,
                "data": [
                    ["Budget", "Used", "Remaining"],
                    ["$70,000", "$50,000", "$20,000"],
                    ...
                ]
            }
        ],
        "fields": {
            "Report Date": "2023-06-01",
            "Project End Date": "2023-09-01",
            ...
        }
    },
    ...
}

However, if we need information that's not readily extractable by the Form Recognizer (like details hidden in a paragraph), that's when we bring in the Azure OpenAI service.

Azure OpenAI Service: We feed the JSON output of Form Recognizer to an AI model like GPT-4 and ask it to extract specific information. We can instruct GPT-4 to output JSON as well, which would look something like this:

{
    "data_extraction": [
        {
            "field": "Budget Remaining",
            "value": "$20,000"
        },
        {
            "field": "Projected End Date",
            "value": "2023-10-01"
        }
    ]
}
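
For illustration, here's a minimal sketch of that extraction step, assuming the openai Python package (v1+) and an Azure deployment named "gpt-4"; the endpoint, key, and API version are placeholders:

# Hedged sketch: asking GPT-4 (via Azure OpenAI) to extract fields from the
# Form Recognizer output. Deployment name, endpoint, and key are placeholders.
import json
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",
    api_key="<your-key>",
    api_version="2024-02-01",
)

# The JSON from the Form Recognizer step (truncated to the sample above)
form_recognizer_output = {"page1": {"fields": {"Report Date": "2023-06-01"}}}

prompt = (
    "From the document JSON below, extract 'Budget Remaining' and "
    "'Projected End Date'. Respond only with JSON in the form "
    '{"data_extraction": [{"field": "...", "value": "..."}]}.\n\n'
    + json.dumps(form_recognizer_output)
)

response = client.chat.completions.create(
    model="gpt-4",  # your Azure deployment name
    messages=[{"role": "user", "content": prompt}],
)
extracted = json.loads(response.choices[0].message.content)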

Finally, we merge the information from both JSON files into one final JSON, which is then used to create a standardized dataframe. This process is repeated for each document in the Data Layer, culminating in a final dataframe that contains all the desired fields from all the documents.

All of these steps can take place within the R or Python script running in Power Query.
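
As a minimal sketch, the merging step could look like this (the key names follow the JSON samples shown above):

# Hedged sketch: flatten the two JSON outputs above into one row per document.
def merge_results(fr_json: dict, gpt_json: dict) -> dict:
    row = {}
    for page in fr_json.values():            # "page1", "page2", ...
        row.update(page.get("fields", {}))   # key-value pairs per page
    for item in gpt_json.get("data_extraction", []):
        row[item["field"]] = item["value"]   # GPT-4 extractions
    return row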

User Layer

At the User Layer, we use the pre-processed table from Power Query, which can be easily loaded into Excel.

Here, we design a simple dashboard to display the collected information from all the project reports. This could include things like:

  • KPI Summary Section

  • Project Overview Section

  • Trend Analysis Section

  • Geographical Overview

Excel is perfectly capable of displaying this dashboard, but if the organization needs more advanced features, it can easily switch to Power BI.

Both Excel and Power BI can work with Power Query as the data pump, allowing for a seamless transition if needed.

To recap the full architecture: Power Query (Data Layer) reads each PDF and sends it to Form Recognizer and Azure OpenAI (Analysis Layer); the merged results flow back through Power Query into Excel or Power BI (User Layer).

Benefits

Overall, this solution offers multiple benefits:

  1. Efficiency: Automating the extraction saves the two person-months currently spent gathering the information by hand.

  2. Scalability: This system can process as many PDFs as needed.

  3. Flexibility: You can easily swap individual components for more advanced (or simpler) choices. For example, we could swap in self-hosted AI services if needed.

  4. Multiple Languages: The AI services can handle multiple languages out of the box, which alone could be a huge benefit.

Considerations and Constraints

Let's take off our rose-colored glasses for a moment. What difficulties can we expect?

  1. Data Preprocessing: Before PDFs are fed into our AI services, it's important to ensure that the documents are in a suitable state. Simple sanity checks for file size, page count, or page orientation should be the bare minimum (see the sketch after this list)!

  2. Error Handling: There's always a possibility that the AI services fail to parse a document due to unexpected issues. A robust error-handling mechanism should be in place to catch these instances and log them for further analysis and troubleshooting (also covered in the sketch below).

  3. Augmentation vs. Automation: If we find that our PDFs are too diverse for the AI services to handle consistently, we may need to adjust our approach. Instead of aiming for a fully automated system, we might opt for an augmented system that keeps humans in the loop. This could involve developing a web tool that allows users to manually upload some PDF files and get auto-fill for the required fields, allowing the user to make adjustments as needed. While not fully automated, this approach could still save a significant amount of time and reduce the risk of data entry errors.

  4. Continuous Improvement: The AI models used in this solution will need to be tweaked based on feedback and errors. This iterative process will ensure the solution remains effective and accurate over time.
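
To make points 1 and 2 concrete, here's a minimal sketch of basic sanity checks plus error handling. The thresholds, the pypdf dependency, and the extract_fields helper from the Data Layer sketch are all assumptions:

# Hedged sketch: basic sanity checks (pypdf) plus error handling so one
# bad PDF doesn't stop the whole batch. Thresholds are illustrative.
import logging
import os
from pypdf import PdfReader

logging.basicConfig(filename="pdf_pipeline.log", level=logging.INFO)

MAX_SIZE_MB = 20   # illustrative threshold
MAX_PAGES = 50     # illustrative threshold

def check_pdf(path):
    """Return True if the file passes basic sanity checks."""
    if os.path.getsize(path) > MAX_SIZE_MB * 1024 * 1024:
        logging.warning("Skipping %s: file too large", path)
        return False
    if len(PdfReader(path).pages) > MAX_PAGES:
        logging.warning("Skipping %s: too many pages", path)
        return False
    return True

def safe_extract(path):
    """Run the extraction pipeline, logging failures instead of crashing."""
    try:
        if not check_pdf(path):
            return None
        return extract_fields(path)  # hypothetical helper from the Data Layer sketch
    except Exception:
        logging.exception("Failed to process %s", path)
        return None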

That's why it's always good to start with a small prototype to better understand not only the data, but also the needs and wants of your users.

Conclusion

I hope this example has shown how AI can unlock unstructured data sources, streamline operations, and drive insights.

Of course, this solution is not limited to NGOs - any organization dealing with complex reporting can adapt it to streamline their own processes.

While there are challenges ahead (most notably data quality and readiness), AI coupled with the right tools has incredible potential to transform data analysis, making it faster, cheaper, and easier.

That said, if you'd like to discuss building a prototype for your own organization, don't hesitate to book a coffee chat with me (use coupon code FREEFLOW to waive the fees).

Thanks for reading today's edition!

Feel free to reach out if you have any questions, or share the newsletter with your colleagues if you enjoyed it.

See you next Friday, and happy data analyzing!

Best,

Tobias

Further Resources:

Want to learn more? Here are 3 ways I could help:

  1. Read my book: Improve your AI/ML skills and apply them to real-world use cases with AI-Powered Business Intelligence (O'Reilly).

  2. Book a meeting: Let's get to know each other in a coffee chat.

  3. Follow me: I regularly share free content on LinkedIn and Twitter.
