3 Lessons from AI That Worked but Was Never Used

And what I do differently now

Back in 2024, I helped build an AI prototype for a company in the finance industry.

The problem: a significant share of their work consisted of manually reviewing incoming proposals every week and figuring out whether these "opportunities" were worth pursuing. We're talking 50+ page PDFs here, packed with dense information, tables, and graphics. The initial scans were mainly done by senior people with salaries well into the six-figure range.

Our solution was pretty straightforward: an AI system that would automatically extract the key information, present it in a consistent format, surface any obvious red flags, and potentially save hours of precious work time.

We built the system. It worked. And it never made it to production. A total project budget of $20K went to waste.

Today's article reflects on what happened back then and what I do differently today.

1) It had to be perfect

This phrase came up in every review meeting.

The prototype did a solid job on most documents. It could reliably extract the key info and find obvious red flags (keep in mind this was still GPT-4). But every review surfaced another edge case. Another thing it didn't handle quite right.

I remember one specifically: a document where a country breakdown was indicated by flag icons instead of country names. We didn't have good multimodal models yet, so we relied completely on text processing. Parsing images was still in its infancy – so those sections came out completely garbled.

Fair enough. Edge case. We'll fix it.

But then there was another edge case. And another. Every fix revealed two more issues. The goalposts kept moving.

Which was the real problem: the goalposts were never defined. We didn't have a clear acceptance criterion like "if this summary gives us at least information X, Y, and Z from 9 out of 10 documents, we ship it."

Instead, we just operated under a vague sense that it needed to be better before anyone would trust it.

So it was never done.

Here's what I do differently now: define "good enough" before you write a single line of code (or prompt). What's the minimum it needs to do to be useful? What's your fallback when it fails?

The fastest way out of the 80% Fallacy is to accept the 80% for now. It's usually easier to redesign the workflow so it benefits from the 80% than to push the AI to 100% so it fits perfectly into the current workflow.

For example, instead of trying to replace the full review, the AI could have served as a first-pass filter: flag the obvious "no's" automatically – and give reasons (wrong sector, too small, missing key data, etc.). That would have been easy to verify and would have saved a lot of time, because it filters out the low-value, time-draining part of the work.
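
To make that concrete, here's a minimal sketch of what such a first-pass filter could look like. The field names, sectors, and thresholds are made up for illustration – not the prototype's actual schema or criteria:

```python
from dataclasses import dataclass, field

# Illustrative shape of the extracted data -- the fields and criteria
# below are assumptions for this sketch, not the client's real rules.
@dataclass
class Proposal:
    sector: str
    deal_size_usd: float
    has_financials: bool

@dataclass
class Triage:
    decision: str                          # "reject" or "human_review"
    reasons: list[str] = field(default_factory=list)

TARGET_SECTORS = {"infrastructure", "renewables"}   # example criteria
MIN_DEAL_SIZE_USD = 5_000_000

def first_pass_filter(p: Proposal) -> Triage:
    """Flag the obvious 'no's with reasons; everything else goes to a human."""
    reasons = []
    if p.sector.lower() not in TARGET_SECTORS:
        reasons.append(f"wrong sector: {p.sector}")
    if p.deal_size_usd < MIN_DEAL_SIZE_USD:
        reasons.append("deal size below threshold")
    if not p.has_financials:
        reasons.append("missing key financial data")
    return Triage("reject", reasons) if reasons else Triage("human_review")
```

The point isn't the code – it's the contract: a reject always comes with human-readable reasons, and anything that isn't an obvious "no" stays with the senior reviewers.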

In other words: AI doesn't need to be perfect. It needs to be good enough that the cost of handling exceptions is lower than the cost of doing everything manually.

Chase 100% and you'll never ship. Accept 80% with a human fallback and you might actually capture value.

Too often, when you want 100% you get 0%.

2) The pain wasn't painful enough

Yes, summarizing documents manually was tedious. Yes, senior people were doing junior work. Yes, everyone agreed it was a problem worth solving.

But it was also something people actually liked to do. The work was getting done, and nobody was losing sleep over it.

When I evaluate AI opportunities now, I use what I call the 10K rule: if the problem doesn't clear a minimum value threshold – say $10K per month in costs, or 1,000 hours per quarter – the opportunity isn't painful enough to prioritize.

Back then, I didn't do that math. We built a solution for a problem that sounded annoying but wasn't actually expensive. As soon as priorities shifted – and they quickly did – there was no compelling financial reason to push the AI project forward.

It was easy to deprioritize because nobody could say: "This is costing us $50K a year. Every month we delay, we're leaving $4K on the table."
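
The math behind a sentence like that is trivial. Here's what it looks like as a back-of-the-envelope check – the hours and rates are illustrative, not the client's actual figures:

```python
# Back-of-the-envelope "10K rule" check -- all numbers are illustrative.
hours_per_week = 6        # senior time spent on manual first-pass reviews
hourly_cost = 150         # fully loaded senior rate in USD
weeks_per_year = 48

annual_cost = hours_per_week * hourly_cost * weeks_per_year   # 43,200
monthly_cost = annual_cost / 12                               # 3,600

MONTHLY_THRESHOLD = 10_000  # the "10K rule"

verdict = "worth prioritizing" if monthly_cost >= MONTHLY_THRESHOLD else "nice-to-have"
print(f"Status quo costs ~${monthly_cost:,.0f}/month -> {verdict}")
```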

Without that number, AI projects become nice-to-haves. And nice-to-haves don't survive the next budget meeting.

3) We built a castle in the sky

The prototype lived in a demo environment: a simple but generic interface we'd built specifically for the project (vibe-coding wasn't a real thing yet). Still, the prototype looked good enough in presentations.

But nobody could picture how it would actually land in their workflow.

  • Where would it live?

  • How would documents get into the system?

  • Who would check the outputs?

  • What happens when it's wrong?

What I realized is that if you build AI in isolation from the actual workflow, you're building a magic trick, not a tool. Users can't evaluate something they can't imagine using. They'll nod along in demos and then never log in.

Now, understanding the workflow integration comes before building anything. Where does the input come from? Where does the output need to go? What systems are already in place? The best AI solutions don't create new workflows, but seamlessly slot into existing ones.

Our prototype required people to open a separate application, upload documents manually, wait for processing, then figure out what to do with the results. Too much friction.

In hindsight, a simple Outlook plugin – or even an AI workflow that pre-reads the email and inserts a few bullets directly into the message – would have been 10x more effective. And easier to grow into something bigger over time.
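
A minimal sketch of that idea, assuming the bullets come from whatever model you already use – the function names here are placeholders, not a real Outlook or mail-client API:

```python
# Sketch of the "pre-read the email, draft the bullets" idea.
# summarize_attachment() stands in for the actual LLM call; wiring this
# into Outlook (or any mail client) is deliberately left out.

def summarize_attachment(pdf_text: str) -> list[str]:
    """Placeholder: ask the model for the agreed-on key fields as 3-5 bullets."""
    raise NotImplementedError("call your LLM provider of choice here")

def draft_first_pass(original_email: str, pdf_text: str) -> str:
    bullets = summarize_attachment(pdf_text)
    summary = "\n".join(f"- {b}" for b in bullets)
    # The reviewer sees the bullets inside the tool they already use,
    # edits or discards them, and replies -- no separate app, no upload step.
    return (
        "Quick first pass on the attached proposal:\n"
        f"{summary}\n\n"
        "---\n"
        f"{original_email}"
    )
```

Less impressive in a demo than a dedicated interface, but it answers most of the questions above by construction: it lives in the inbox, the documents are already there, and a human checks (and can discard) every output.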

Takeaways

Moving prototypes into production is rarely an AI or tech problem.

It’s typically the gap between "this works" and "this is how we work".

Crossing that gap requires you to figure out a few things before you even write the first prompt:

  • Can this solution live without 100% perfect results? What would be "good enough" to be useful? Define that success criterion.

  • Did we validate whether the problem is really worth solving? A prototype is just the beginning of a longer (and generally more expensive) AI journey.

  • How would this actually affect people’s workflows, and are they ready for it? When in doubt, change the workflow as little as possible – just make it cheaper, better, faster, or more scalable.

Looking back, the prototype did exactly what it was supposed to do. What we didn't set up were the conditions for it to actually ship.

This project shaped how I approach every AI engagement now. Questions like the ones above aren't afterthoughts.

They're where we start.

See you next Saturday!
Tobias
