Use Case Breakdown: AI-Powered E-Mail Triage

Save time by augmenting your email sorting workflow

Hey there,

Today, we're going to tackle an issue that many businesses still spend too much time on: emails.

While this sounds like a boring task, it's still an unsolved problem for many organizations that waste too much energy trying to figure out which email goes where.

That's where AI can help - and integrating it isn’t actually as complicated as it sounds - especially when we're talking about an augmented approach, where AI works hand in hand with us.

Ready to learn more? Let's go!

You don't even need to train an AI for this

In fact, to use AI to improve email triage or email segmentation, you don't even need to train a custom model on a lot of historical data.

This was something you had to do in the "old" days, before LLMs. You would have to collect a bunch of historical emails, tag them with the correct category, and train a machine learning model.

However, thanks to GPT-3.5 & co., we can dramatically shortcut this process.

Let's look at a concrete example.


Consider a typical workflow that happens in many organizations, from small to large.

Say we have a general purpose inbox such as info@… or support@… which customers can use to easily get in touch with your company.

It's great for the customer (they don't have to look up the specific email address), but it can be a hassle for your organization (especially if you receive a high volume of these).

So the current workflow typically looks something like this:

  1. Email lands in a general-purpose mailbox

  2. Support agent reviews the incoming emails

  3. Support agent forwards relevant emails to departments

  4. Support agent answers more general emails themselves

  5. Email gets answered


Let's say the biggest pain point is that support agents have to manually sift through all incoming emails and determine which ones should be forwarded to different internal departments.

Assuming an average reading time of 2 minutes per email, and 100 emails a day, the support agent would spend over 15 hours per week just sorting emails - not answering them.

(And you would typically have a small team of 2-3 people handling this.)
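The back-of-the-envelope math behind that estimate, spelled out (all inputs are the assumptions from the text, including a 5-day work week):

```python
# Weekly sorting effort, using the assumptions stated above.
emails_per_day = 100
minutes_per_email = 2
workdays_per_week = 5

hours_per_week = emails_per_day * minutes_per_email * workdays_per_week / 60
# roughly 16.7 hours - "over 15 hours per week" in round numbers
```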

Note: There could be more pain points in this process, such as knowing which person the email needs to be forwarded to, or which email has a high priority. But for now, let's focus on the recurring task of organizing emails into different groups.


How can we improve this process with AI? (Btw that's also something we would answer in an AI Design Sprint).

Consider this AI-infused example:

In this case, step 2 of the process (where the support agent manually reads and sorts the email) would change as follows:

  • An AI would automatically sort incoming emails into different subfolders according to relevant internal departments.

  • As a result, the support agent would not have to read through all the incoming emails chronologically one by one, switching contexts all the time, but would instead go through each subfolder and quickly verify that the emails were sorted correctly.

  • Ideally, they could batch forward all emails in a folder to another department.

This process is what we call AI augmentation. It's not fully automated, but it keeps a human in the loop. As a result, the human would be able to solve these tasks much faster.

What's the benefit?

If we assume that this approach would speed up the process by 80%, we would save 12 hours per week. On an annual basis - let's assume 45 work weeks - this adds up to 540 saved working hours, equivalent to approximately $16,200 per year in cost savings for the company (assuming a $30 avg. hourly wage).
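Spelled out as a quick calculation (every input here is one of the assumptions above, not a measured value):

```python
# Annual ROI estimate based on the stated assumptions.
hours_sorting_per_week = 15   # current weekly sorting effort
efficiency_gain = 0.80        # assumed speedup from AI pre-sorting
work_weeks_per_year = 45
hourly_wage_usd = 30

hours_saved_per_week = hours_sorting_per_week * efficiency_gain     # 12 hours
hours_saved_per_year = hours_saved_per_week * work_weeks_per_year   # 540 hours
annual_savings_usd = hours_saved_per_year * hourly_wage_usd         # $16,200
```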

While the savings are notable and add up over time, we need to be careful about ROI.

Looking at a low 5-figure return per year does not justify hiring a data scientist or building something fancy from scratch.

On the other hand, leveraging a Large Language Model as a service, like GPT-3.5-Turbo, could be a good fit. Let's run some math here:

  • GPT-3.5-Turbo on Azure costs $0.0015 per 1,000 prompt tokens (let's ignore completion tokens for now)

  • Let’s assume the average email length is 300 words

  • Then our average payload per email would be ca. 500 tokens (300 words ≈ 400 tokens, plus ca. 100 tokens for the prompt instructions)

  • Considering 100 emails per day: 100 x 500 x $0.0015 / 1,000 = $0.075 per day
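As a sanity check, the same cost math in a few lines of Python (prices and volumes as assumed in the bullets above):

```python
# Daily API cost estimate for AI-based email categorization.
price_per_1k_prompt_tokens_usd = 0.0015   # GPT-3.5-Turbo on Azure (prompt tokens only)
tokens_per_email = 500                    # ~400 body tokens + ~100 prompt tokens
emails_per_day = 100

daily_cost_usd = emails_per_day * tokens_per_email * price_per_1k_prompt_tokens_usd / 1000
# roughly $0.075 per day
```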

Now, this sounds like a promising ROI!

So how can we make this happen? Let's find out.

Technical overview

Let's break our solution into three core layers: user, analysis, and data.

The user layer is the email client that the support agent uses. In addition to their main inbox, they would also see sub-folders where emails are sorted automatically. These folders can represent different business functions like HR, marketing/PR, legal, etc. These sub-folders must be defined upfront!

The data layer is the incoming emails. Ideally, we would pre-process them on the email server or in a middle layer between the server and the email client - for example, using something like Microsoft Exchange. Here, the workflow connects to the analysis layer.

In the analysis layer, we would have an AI service (for example, a language model) that analyzes the content of the given email and returns a flag for a category, which is passed back to the data layer.
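As a minimal sketch of that hand-off (names and the tagging convention are illustrative assumptions, not from the source): the analysis layer returns a category flag, and the data layer stamps it onto the subject line so a plain mail-client rule can move the message into the matching subfolder.

```python
# Departments must be defined upfront - they become the subfolders.
DEPARTMENTS = ["HR", "Marketing/PR", "Legal", "Customer support", "Other"]

def tag_subject(subject: str, category: str) -> str:
    """Prefix the subject with the category flag returned by the AI service,
    e.g. '[Legal] Notice of claim...'. A mail rule can then match on this
    prefix and move the email into the corresponding subfolder."""
    if category not in DEPARTMENTS:
        category = "Other"  # fall back for unexpected labels
    return f"[{category}] {subject}"
```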

Prototyping the use case

How would we prototype this use case?

Remember, the goal of prototyping is to validate our assumptions.

In the given scenario our most uncertain assumptions are:

  • Does an out-of-the-box AI service perform well enough?

  • Does the pre-categorization actually lead to 80% efficiency gains?

We don't want to challenge more basic assumptions - for example, that we're able to put a label on an incoming email or append something to its subject line, or that we can set up a rule in our mail client that sorts these into different subfolders.

So let's focus on the LLM performance and the efficiency improvements in the prototyping phase.

Testing the LLM performance

The initial prototype could be as simple as a 20 line Python script. As input, we would need a few emails that we know are relevant to different departments.

The script would read in a batch of emails and call a third-party LLM that complies with the necessary privacy policies.

This could be a GPT-3.5 model hosted via Azure, or a pre-trained open source model like Llama 2 hosted internally.

In any case, the model comes as-is. The first thing we should try is to get to the desired output using prompting techniques.

For example, one possible prompt for our scenario could be this one:

Categorize the following email indicated in triple backticks (```) into EXACTLY **one** of the following classes:

- HR
- Marketing/PR
- Legal
- Customer support
- Other

Return ONLY the class name.

Subject: Notice of claim regarding damaged shipment
To whom it may concern,
I am writing with regards to a shipment that was entrusted to your company on March 15th, with an expected delivery date of March 18th. However, when the goods arrived there was noticeable damage to the packaging and contents. [...]



Tuning this prompt would be the core of our prototype. If the result is not accurate enough, we can try to provide examples in the prompt or make the prompt even more specific.
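A minimal version of that script could look like the sketch below. It's an illustration, not production code: the `complete` argument stands in for whatever LLM call you end up using (an Azure-hosted GPT-3.5 deployment, a locally hosted Llama 2, etc.), and the fallback for off-list answers is an added assumption, not part of the original prompt.

```python
CATEGORIES = ["HR", "Marketing/PR", "Legal", "Customer support", "Other"]

def build_prompt(email_text: str) -> str:
    """Build the categorization prompt shown above for a given email."""
    options = "\n".join(f"- {c}" for c in CATEGORIES)
    return (
        "Categorize the following email indicated in triple backticks (```) "
        f"into EXACTLY one of the following classes:\n\n{options}\n\n"
        "Return ONLY the class name.\n\n"
        f"```{email_text}```"
    )

def categorize(email_text: str, complete) -> str:
    """`complete` is any callable that sends a prompt string to an LLM and
    returns the raw completion text (e.g. a thin wrapper around the Azure
    OpenAI chat API or a local model)."""
    label = complete(build_prompt(email_text)).strip()
    # Guard against the model answering with something off-list.
    return label if label in CATEGORIES else "Other"
```

Looping this over a folder of sample emails and comparing the returned labels against your own judgment is essentially the whole prototype.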

Testing the efficiency gains

How do we find out if we actually improved something? Let's do a little experiment, using a simple two-phase testing approach:

Baseline Measurement (= establish a benchmark for the current process without AI assistance):

  • Step 1: Select a batch of "fresh" emails.

  • Step 2: Have a support agent manually sort these emails

  • Step 3: Note the time it takes for the entire batch.

AI-Assisted Measurement (= measure the efficiency of the AI-assisted process):

  • Step 1: Take another batch of "fresh" emails and apply the AI pre-categorization process.

  • Step 2: Have the same support agent sort these pre-categorized emails

  • Step 3: Note the time it takes for the entire batch.


Comparison (= evaluate the results):

  • Compare the times from both phases to determine the percentage efficiency gain.

  • Analyze any discrepancies or anomalies in the AI-assisted process, such as miscategorized emails, to understand potential areas of improvement.

It's essential to use the same support agent for both phases to ensure consistency in the testing process.
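The comparison at the end boils down to one formula; a tiny helper makes it explicit (the batch times in the example are hypothetical):

```python
def efficiency_gain_pct(baseline_minutes: float, assisted_minutes: float) -> float:
    """Percentage of time saved by the AI-assisted phase relative to the baseline."""
    return (baseline_minutes - assisted_minutes) / baseline_minutes * 100

# Hypothetical example: a batch took 50 minutes to sort manually and
# 10 minutes with AI pre-categorization -> an 80% efficiency gain.
```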


As a result, we now have a working prototype that we can use to evaluate the performance of the Language Model in categorizing emails and assess the efficiency gains it brings to the email sorting process.

As a potential next step in the development process, we can consider integrating the prototype into our email system and rolling it out to a few selected users (inboxes) for further testing and feedback.


  • Bear in mind that the accuracy of the email categorization will never be 100%. We're still using a pre-trained model which probably works well enough on the majority of emails (for most languages, by the way), but will fail on some edge cases. That's why we need the human in the loop.

  • Since every LLM can only handle a limited amount of text input (context), we would probably only process the first 300 words or so of an email. This is also a great way to cap costs per email.
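Capping the input could be as simple as cutting the body after the first 300 words (a rough sketch; word counts only approximate the model's actual token count):

```python
def truncate_body(body: str, max_words: int = 300) -> str:
    """Keep only the first `max_words` words of an email body to stay well
    within the model's context window and cap the API cost per email."""
    words = body.split()
    return " ".join(words[:max_words])
```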


By leveraging the efficiency of a pre-trained language model combined with the expertise of human agents, we can streamline most email sorting processes, saving both time and money.

Have you tried implementing AI in your business processes? 

Share your experiences, challenges, and successes.

Are there other mundane tasks you wish AI could do for you? 

Hit reply and let's brainstorm together on how AI could be useful to you.

Remember, AI isn't an exclusive tool for tech giants. With the right approach and understanding, any business can benefit.

If you need any help, just reply to this email.

See you next Friday!


Want to learn more?

  1. Book a meeting: Let's find out how I can help you over a coffee chat (use code FREEFLOW to book the call for free).

  2. Read my book: Improve your AI/ML skills and apply them to real-world use cases with AI-Powered Business Intelligence (O'Reilly).

  3. Follow me: I regularly share free content on LinkedIn and Twitter.