Making ChatGPT Friends with Numbers

So you can trust its results and work *much* faster

Hi there,

I recently saw a data scientist getting absolutely frustrated with ChatGPT. He asked it to sum a column of numbers, and the result was completely wrong. When he tried again, it messed up the whole table. What the hell?

You've probably been there too. Maybe you tried using ChatGPT for some quick calculations, only to get questionable results. Or perhaps you've read somewhere that AI struggles with math (*ahem*) and decided to avoid anything number-related altogether – especially when it confidently tells you the wrong result (and then politely argues why it must be correct).

The problem isn't that Large Language Models can't work with numbers – it's that we're asking them to do it the wrong way.

Today, we'll explore the right strategies!

What Problem Do You Have With Math, ChatGPT?

To understand why LLMs struggle with math – and more importantly, how to fix it – we need to peek under the hood for a moment.

For an LLM like GPT-4o, a number is just another token, no different from any other word. The model sees "10" the same way it sees "car" or "blue" – just another piece of text to predict. There's no inherent mathematical meaning attached to it. (Check my LLM fundamentals workshop if you want to dive deeper into this.)

This is why asking an LLM to do calculations directly is like asking someone to write a story that happens to contain numbers. They might get it right occasionally, but they're essentially guessing based on patterns they've seen before, not actually doing math.

Think about it: When you see "2 + 2 = ", your brain automatically switches to "math mode" and performs the calculation. But an LLM doesn't have a "math mode" built-in. It's just predicting what text typically follows "2 + 2 = " based on its training data.

This might sound like bad news, but it actually points us toward the solution: We need to be explicit about how we want LLMs to handle numbers.

When you're using LLMs through chat interfaces like ChatGPT, you might notice they sometimes automatically switch to "math mode" (like running Code Interpreter) when they detect mathematical tasks. While convenient, this automatic switching can be tricky sometimes and mask what's really happening under the hood.

If you're calling these models through their APIs – or want to really understand what's happening in the web interface – you need to be explicit about how the model should handle numbers. There are three main approaches to this, and understanding them will help you work more effectively with numbers regardless of how you're accessing the model.

Let me show you these three strategies and when to use each one.

Strategy #1: Code Generation

The most reliable way to have LLMs work with numbers is to let them write code that performs the calculations, rather than doing the math themselves. While this might sound obvious to some of you, I still see many users trying to have ChatGPT calculate things directly in a conversation.

Let's look at a practical example.

Say you have a table with two columns, 'revenue' and 'cost', pasted directly as text into ChatGPT.
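For illustration, such a table (with made-up numbers) might look like:

```
Revenue ($)   Cost ($)
1200          700
950           400
1800          1100
```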

Now you want to calculate a new column "Profit".

A bad (ambiguous) way to do this would be to ask ChatGPT something like this:

❌ "Add a column profit to the table"

In many cases, ChatGPT would simply fill in the values via text prediction, and you would need to double-check every entry. You can tell by the absence of the code icon at the end of the output (more on this next).

A better (more explicit) way would be to have it write the code to do this:

✅ "Import the table to Python and calculate a profit column"

You can verify that the calculation was performed with code by checking the code icon ("View analysis") and then expanding and reviewing the code that was written.

This code gives you confidence that every value was actually calculated, not predicted.

# Calculating the Profit column
sales_df["Profit ($)"] = sales_df["Revenue ($)"] - sales_df["Cost ($)"]
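For reference, here's a self-contained version of that calculation (with made-up numbers standing in for the table above):

```python
import pandas as pd

# Hypothetical sales data -- stand-in for the table pasted into ChatGPT
sales_df = pd.DataFrame({
    "Revenue ($)": [1200, 950, 1800],
    "Cost ($)": [700, 400, 1100],
})

# Calculating the Profit column
sales_df["Profit ($)"] = sales_df["Revenue ($)"] - sales_df["Cost ($)"]
print(sales_df)
```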

Again, for simple tasks, web interfaces like ChatGPT will automatically default to writing code (and running it for you).

But if you're using the API, you need an endpoint that supports code execution, such as OpenAI's Assistants API with the Code Interpreter tool enabled. The API response then lets you verify that the completion was generated by running code.

Solving numeric problems by having LLMs write code isn't just more reliable – it's also way more scalable and easier to document or debug.
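The core pattern behind this strategy can be sketched in a few lines: ask the model for code, then execute that code yourself instead of trusting a predicted number. In this sketch the model call is stubbed with a fixed response so it runs standalone – in practice `generate_code` would call your LLM API of choice:

```python
# Sketch of the "let the LLM write code" pattern.
# The model call is stubbed; in practice it would hit an LLM API.

def generate_code(prompt: str) -> str:
    """Stand-in for an LLM call that returns Python code as text."""
    return (
        "profits = [r - c for r, c in zip(revenue, cost)]\n"
        "result = sum(profits)"
    )

def run_generated_code(code: str, data: dict) -> float:
    """Execute the generated code in a namespace seeded with the data."""
    namespace = dict(data)
    exec(code, {}, namespace)  # always review generated code before running it!
    return namespace["result"]

data = {"revenue": [1200, 950, 1800], "cost": [700, 400, 1100]}
code = generate_code("Calculate total profit from revenue and cost.")
total_profit = run_generated_code(code, data)
print(total_profit)  # 1750
```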

Strategy #2: Tool Use

Another effective way to handle numbers is to let LLMs delegate the actual calculations to specialized tools. Think of the LLM as a smart coordinator rather than a calculator.

A perfect example is the Wolfram Alpha GPT. Here's how it works:

  1. The LLM takes your query and formats it for the Wolfram Alpha API

  2. Wolfram Alpha performs the calculation

  3. The LLM interprets and presents the results

Let's say we want to convert a currency based on the recent exchange rate.

If you ask ChatGPT something like:

❌ "Convert 100 USD to EUR based on current rates"

then you will get varying results if you run this query multiple times – depending on which source ChatGPT happens to pull the exchange rate from.

A better way would be to keep the source of the currency conversion fixed. In our case this could be the Wolfram Alpha GPT (but it could literally be any service available via an API):

✅ "Convert 100 USD to EUR based on current rates" (run in the Wolfram GPT)

Even if you run this query multiple times you'll see consistent outputs because the answer is grounded in a single source of truth (Wolfram Alpha API) and the LLM only facilitates the interaction with it. That's how tool use speeds up the process of getting precise results, eliminating the need to verify multiple answers.

If you want to build custom applications like this, then function calling (OpenAI) or tool use (Anthropic) are your friends.
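The mechanics look roughly like this: you describe a tool to the model via a schema, the model emits a structured call instead of guessing an answer, and your code executes the call. Here both the exchange rate and the model's tool call are stubbed so the sketch runs standalone – in practice the rate would come from a fixed source of truth like the Wolfram Alpha API, and the model would decide when to call the tool:

```python
import json

# Sketch of the tool-use pattern with a hypothetical currency tool.

def convert_currency(amount: float, source: str, target: str) -> float:
    """Hypothetical tool: in practice this would query a fixed
    source of truth such as the Wolfram Alpha API."""
    rates = {("USD", "EUR"): 0.92}  # made-up rate, for illustration only
    return round(amount * rates[(source, target)], 2)

# Tool schema in the style of OpenAI function calling / Anthropic
# tool use: this description is all the model ever sees.
tool_schema = {
    "name": "convert_currency",
    "description": "Convert an amount between currencies using a fixed rate source.",
    "parameters": {
        "type": "object",
        "properties": {
            "amount": {"type": "number"},
            "source": {"type": "string"},
            "target": {"type": "string"},
        },
        "required": ["amount", "source", "target"],
    },
}

# Simulated model output: a structured call for our code to execute,
# rather than a guessed number.
model_tool_call = json.dumps({"amount": 100, "source": "USD", "target": "EUR"})
args = json.loads(model_tool_call)
print(convert_currency(**args))  # 92.0
```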

Strategy #3: Rankings

Sometimes, you don't actually need precise calculations - you just want to know what's highest, lowest, or most frequent. This is where LLMs can actually shine, if you frame the task correctly.

Let's look at another example. Say you have a list of colors mentioned in customer reviews, and you want to know which ones appear most often.

A poor way to get to the most frequent color would be to ask ChatGPT:

❌ "Count how many times each color appears in this text"

You cannot trust this result. It's based on text completion, and you would need to double-check it.

But in many cases, you don't need exact counts. You just need the top (most frequent) elements of something.

In that case, you can just go ahead and ask right away:

✅ "What is the most frequently mentioned color in this table?"

You can trust this result (that blue is the most frequent color) – but not the exact count of 4. Why?

In the first case, you're asking the LLM to count (which it's bad at). In the second case, you're asking it to recognize patterns (which it's good at). The LLM can spot the pattern that "Blue" appears most often.

If you need exact counts, you would need either to write some code (strategy 1) or call a "color counter" service (strategy 2).
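If you do take the code route for exact counts, it's a one-liner with Python's standard library – shown here with a made-up list of color mentions:

```python
from collections import Counter

# Hypothetical color mentions extracted from customer reviews
colors = ["blue", "red", "blue", "green", "blue", "red", "blue"]

counts = Counter(colors)
print(counts.most_common(1))  # [('blue', 4)]
```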

This ranking strategy works particularly well for tasks like sentiment analysis, topic extraction, or any other scenario where relative importance matters more than exact numbers.

Conclusion

LLMs like ChatGPT might not be natural-born mathematicians, but when used correctly, they can be powerful allies in handling numerical tasks. The key is to understand their limitations and work with them, not against them.

Remember: the goal isn’t to force LLMs to "do math" directly, but to leverage their capabilities in ways that make sense.

With the strategies above, you can turn ChatGPT into a reliable data analysis tool – one you can trust for accurate results, delivered faster. That means more time for the tasks that truly matter (or are just more enjoyable).

Now that you understand the fundamentals of how LLMs can work effectively with numbers, you’re ready to take your data analysis to the next level.

If you want to 10x your spreadsheet productivity using ChatGPT, join me for my upcoming workshop on Nov 6, where we'll put these principles into practice and unlock powerful new ways to crunch your numbers.

See you next Friday!
Tobias
