GPT-o1 Preview: Breaking Down OpenAI's New Model

And what it means for your business

Hi there,

I've got to be honest - this past week's been a bit rough on the sleep front. You know how OpenAI likes to drop big news unexpectedly, preferably late evenings CET? Well, that's exactly what happened.

Just as I was getting into the flow of working on a new project (more on that soon), they went ahead and released a major update: GPT-4o got a new cousin called o1. And as expected, it sent the whole AI community into a frenzy (again).

For every new OpenAI release, it's always fun (sort of) to watch the speculation flywheel on social media. For some folks, GPT-5 has arrived (spoiler alert: it hasn't). Others dismiss it as "more snake oil".

My take: ignore the hype and give it a try. Because a preview version of GPT-o1 is available now to all paid ChatGPT users.

Before we dive into what it means for your business and why it's so important, let's break down what GPT-o1 really is.

Let's go!

What is GPT-o1 (a.k.a. Strawberry)?

A while back, Sam Altman dropped a cryptic tweet that sent the AI world into a swirl of speculation. In his typical understated style, he posted a random picture with the caption "i love summer in the garden", which was viewed 6.7 million times. Cue the rumors.

Was this GPT-5 in disguise? A completely new breakthrough model? Or just another incremental upgrade?

As it turns out, we now have our answer: GPT-o1. It's not GPT-5, but it's certainly not just a minor tweak either. Think of it as a different version of GPT-4o, capable of handling more complex tasks with greater accuracy, but at a much higher cost (and slower speed).

What's most exciting is that OpenAI has already released a preview version of GPT-o1 to all ChatGPT Plus users, meaning you can start playing today if you have a subscription.

So, what's the deal with "Strawberry"? That's a fun little story. Back in October 2023, OpenAI was working on a new model leveraging a "Q*" (pronounced Q-star) algorithm, which quickly got nicknamed Strawberry. Since then, the Strawberry theme has taken off. One of the first tests OpenAI demoed with GPT-o1 was to see if the model could count the number of "Rs" in the word "strawberry". This was something previous models struggled with because language models process text as tokens (multi-character chunks) rather than individual letters, so the letters inside a word are never directly visible to them.
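To see why, here's a quick illustration using OpenAI's tiktoken library. It prints the chunks GPT-4o actually "sees" when you type "strawberry" (the exact split depends on the tokenizer version, but it will be multi-character chunks, not single letters):

```python
# pip install tiktoken
import tiktoken

# Load the tokenizer that GPT-4o uses.
enc = tiktoken.encoding_for_model("gpt-4o")

word = "strawberry"
token_ids = enc.encode(word)

# Print each token as text: these chunks, not individual letters,
# are what the model works with internally.
print([enc.decode([t]) for t in token_ids])
```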

GPT-o1 cracked this case, successfully counting the Rs, marking a new level of reasoning capability for the model. But it's not just about counting letters. With GPT-o1, the model's ability to perform math, logic, and other complex tasks has taken a giant leap forward.

As with any AI release, though, the real question isn't just what it can do on paper—it's what it can do for you. That's what we'll discuss next.

Why This Model Matters

So, why should we pay attention to GPT-o1? Simply put, it introduces a new paradigm for how AI models can get better over time, unlocking new frontiers in what AI is capable of.

To understand this, we need to revisit the concept of scaling in AI. Traditionally, the race to build more capable language models has been about making them bigger: adding more parameters, training on larger datasets, and increasing computational power. For years, this approach has worked. Models like GPT-4, with its estimated 1 trillion parameters and massive training compute, outperform smaller, specialized models across a range of tasks. Larger models have so far been the key to breakthroughs in AI performance.

A Paradigm Shift in Scaling: Introducing "Thinking" Time

However, GPT-o1 introduces a new paradigm for scaling—one that doesn’t just rely on bigger models or more data, but rather focuses on scaling during inference. This is where things get interesting. Instead of only improving performance by making the model bigger, OpenAI has found that allowing the model to spend more time "thinking" during inference can lead to more accurate and reliable responses.

What does "thinking" mean in the context of AI? It refers to the process of performing multiple internal reasoning steps before generating an output. The model takes more time to process and refine its response. This is akin to prompting the model to follow a chain of thought, breaking down a problem into steps—first gather data, then analyze it, then make a decision—before producing the final answer. Specifically, o1 has been trained in a special way that generates (hidden from the user) thinking tokens before arriving at a final answer, giving it the ability to perform more complex internal calculations before responding.

The surprising finding here is that the longer a model "thinks", the better its answers get. Right now, o1-preview typically spends between 5 and 20 seconds on a given problem. But OpenAI thinks bigger: they envision future models that could "think" for hours, days, or even weeks until they come up with an answer.

This inference-time scaling law mirrors the traditional scaling law we've seen with larger models: instead of adding more parameters, it increases the time the model spends reasoning.
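OpenAI hasn't published exactly how o1 spends its thinking time, but you can build an intuition for why more inference-time compute helps with a toy simulation. The idea below is self-consistency (sample several independent reasoning attempts and take a majority vote), which is not what o1 does internally, but it demonstrates the same lever: more compute per question, better answers.

```python
import random
from collections import Counter

random.seed(42)

def noisy_solver(correct: int, accuracy: float) -> int:
    """One reasoning attempt: right with probability `accuracy`, else a near miss."""
    if random.random() < accuracy:
        return correct
    return correct + random.choice([-2, -1, 1, 2])

def solve(correct: int, accuracy: float, attempts: int) -> int:
    """Spend more inference-time compute: sample several attempts, majority-vote."""
    votes = Counter(noisy_solver(correct, accuracy) for _ in range(attempts))
    return votes.most_common(1)[0][0]

TRIALS = 10_000
for attempts in (1, 5, 25):
    hits = sum(solve(42, accuracy=0.6, attempts=attempts) == 42 for _ in range(TRIALS))
    print(f"{attempts:>2} attempts per question -> {hits / TRIALS:.1%} accuracy")
```

A single attempt lands at roughly the solver's base accuracy; with 25 attempts, the majority vote is almost always right. The tradeoff is exactly the one o1 makes: every extra attempt costs compute and time.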

Implications for Performance

This new approach offers exciting possibilities. For tasks that require deep reasoning—like legal analysis, financial auditing, or complex problem-solving—GPT-o1 can spend more time processing and produce far more accurate outputs than previous models that prioritized speed.

While this approach holds a lot of promise, it also introduces new tradeoffs. Longer inference times mean higher costs and slower responses. Concretely, a single request to o1 can easily cost up to 100x more and take 10x as long as the same request to GPT-4o!
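A quick back-of-the-envelope calculation shows where that multiple comes from. The prices below are the list prices at the time of writing ($15/$60 per million input/output tokens for o1-preview, $5/$15 for GPT-4o; check OpenAI's pricing page for current numbers), and the reasoning-token count is a made-up illustration. The key point: o1's hidden reasoning tokens are billed as output tokens.

```python
# List prices in USD per 1M tokens (at the time of writing; check current pricing).
PRICES = {
    "gpt-4o":     {"input": 5.00,  "output": 15.00},
    "o1-preview": {"input": 15.00, "output": 60.00},
}

def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Same question to both models: 500 input tokens, 300 visible answer tokens.
# For o1, assume it also burns 8,000 hidden reasoning tokens (billed as output).
gpt4o = cost("gpt-4o", 500, 300)
o1 = cost("o1-preview", 500, 300 + 8_000)

print(f"GPT-4o:     ${gpt4o:.4f}")
print(f"o1-preview: ${o1:.4f}  (~{o1 / gpt4o:.0f}x more expensive)")
```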

To be clear: this isn’t a model you want to have a casual chat with. In fact, I believe most chatbot use cases won’t work well with o1.

Moreover, this "reasoning" doesn’t necessarily lead to better performance across the board. Recent benchmarks have shown that o1-preview sometimes performs worse than GPT-4o in areas like code completion (see below) or creative writing (based on my anecdotal experience).

As I'll be covering more in-depth in my upcoming workshop, this model still suffers from the same limitations as current LLMs, and it isn't here to be "better" than GPT-4o. It's different and designed for specific use cases.

Use Cases: So What?

Before you get too excited: everyone is still figuring things out. There isn't a clear pattern yet on which common use cases GPT-o1 will dramatically outperform GPT-4o, especially when GPT-4o is used with chain-of-thought prompting. It's early days, and we're still working with the preview version.

That said, four promising areas have already emerged for me.

Disclaimer: All of these use cases can also work with existing LLMs, especially with the right prompting techniques. However, with GPT-o1, I expect these tasks to generally perform better and, crucially, without the need for specific prompting strategies. GPT-o1's enhanced reasoning capabilities likely make it a more robust out-of-the-box solution for these types of problems:

1. Data Validation Tasks

As mentioned in this LinkedIn post, data validation is a core task for many businesses, and GPT-o1’s ability to reason through inconsistencies makes it well-suited for this. In scenarios where data looks correct at a surface level—such as an email address that passes all automatic checks—it can still catch subtle errors that traditional models might miss. In the example below, GPT-o1 was able to detect an inconsistent purchase date because, according to its internal knowledge, the product wasn't even released at that time.
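Here's a minimal sketch of what such a check could look like in code (the record, field names, and prompt are made up for illustration):

```python
from openai import OpenAI

client = OpenAI()

# A hypothetical CRM record that passes all standard format checks.
record = {
    "customer_email": "jane.doe@example.com",
    "product": "iPhone 15 Pro",     # released September 2023
    "purchase_date": "2023-02-11",  # ...so this date is impossible
}

prompt = (
    "Check the following purchase record for internal inconsistencies, "
    "including ones a format validator would miss (e.g., a purchase date "
    f"before the product's release):\n\n{record}"
)

response = client.chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```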

2. Strategic Planning & Organizational Tasks

Another promising area is use cases that require strategic planning or step-wise, holistic organization. Take, for example, building MECE issue trees: a structured framework for breaking down complex problems into smaller, manageable components, which I also cover in my ChatGPT for Data Analytics workshops. With traditional LLMs, you'd have to create these trees step by step, branch by branch, because the model wouldn't be able to generate new nodes and sub-nodes while simultaneously checking that the splits fulfill the MECE criteria (no branches overlap, and all branches together cover the whole solution space). GPT-o1 can create these pretty much in one go, without much prompting.

If you don't know what I mean, just compare this chat output (GPT-4o) vs. this chat output (GPT-o1). This is extremely valuable for consulting, project management, and overall problem-solving, where precise structure and sound logic are key.
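If you want to try this yourself, a single plain prompt is enough for o1; no branch-by-branch hand-holding required (the business question below is just a placeholder):

```python
from openai import OpenAI

client = OpenAI()

prompt = (
    "Build a MECE issue tree for the question: 'Why did our e-commerce "
    "revenue drop 15% last quarter?' Go three levels deep, make sure "
    "sibling branches don't overlap, and check that all branches together "
    "cover the whole solution space."
)

response = client.chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```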

3. Legal & Complex Document Analysis

Another promising area is legal and complex document analysis. Recently, I flew from Hanover to Washington with a stop in Amsterdam, all on the same airline, for business. I wondered: If my first flight is delayed and I miss my connecting flight, can I cancel the trip, get a refund, and book a different airline? Because if I can't get to my final destination on time, the whole trip would be pointless. GPT-o1 could reason through the various regulations (e.g., EU261/2004 flight compensation rules) and determine my eligibility based on the specific conditions of the delay and my itinerary.

This kind of step-by-step legal reasoning is something that requires deeper analysis, which is exactly where GPT-o1 shines. I can see many business use cases developing around this capability.

4. Financial Audits and Compliance Analysis

Another area where models like GPT-o1 will make a real difference is corporate auditing. Take the Wirecard scandal as an example. For years, auditors failed to notice that over €1.9 billion was missing from the company's balance sheet (oopsie!).

In the end, it was a human who did the grunt work to uncover the truth, crawling through tons of paperwork and going back and forth between countless dead ends. What took months of painstaking research could have been done much faster with models like GPT-o1. I guess that's what they call forensics, and in the financial world there's a whole industry devoted to it. Instead of just doing surface-level checks or building complex (more or less reliable) agentic systems out of different LLMs, GPT-o1 could open the door to automated, real-time compliance checks that uncover inconsistencies in deep, multi-layered issues.

In any case, if you have a use case similar to the ones mentioned above and it didn't work with previous models—go give it another try with GPT-o1.

Conclusion

So, GPT-o1 is different—not always better—but it has the potential to unlock a whole new frontier of AI capabilities and use cases that demand deeper reasoning.

The best way to figure out how this can impact your work?

Learn some essential basics and then get hands-on! You'll never know what you can uncover until you try. Shameless plug: my upcoming workshop “Understanding How LLMs Work (Without Being an AI Engineer)” is a great way to start. And yes, we’ll explore o1 in more detail.

Start exploring today!

See you next Friday,

Tobias
