What Happens If You Put Computers In Front Of Computers

Why Anthropic's new computer use could be a new paradigm for AI interactions.

Hi there,

I don't usually dive into tech updates from the AI world in this newsletter (I would never stop writing if I did). But this time feels a little different.

Anthropic recently released a new feature – if you can even call it that. It definitely gave me a real "wow" moment. Not just because of what it can do, but because of how it was presented and the potential it holds to change the way how we interact with AI.

Yes, I'm talking about "computer use".

Let's dive in to find out more!

Check out this report from today's sponsor:

Industry-first report covering real-world attacks on GenAI. Based on telemetry data collected during 2024 from over 2,000 LLM applications.

What's Inside: 

  • Key findings on attack patterns 

  • Adversaries' objectives and motivations

  • Real-world attacks, including jailbreak methods and outcomes

  • Technical insights and 2025 forecasts

Computer Use: What It Is and Why It's (Probably) Important

Anthropic describes "computer use" as a way for "developers to build products where Claude 3.5 Sonnet can generate computer actions based entirely on what it sees on the screen". 

In simple terms, it's an AI optimized to receive screenshots along with a prompt, and then return a set of actions – coordinates, mouse clicks, keyboard strokes, etc. – that should be performed on that screen to achieve a desired outcome.

Picture an AI sitting in front of the computer, not inside it.

So Claude can now operate a computer by looking at it to perform tasks like filling out forms or browsing websites.

You could say, well that's nothing new! Theoretically, you can also just upload a screenshot to ChatGPT and ask it where it would click to let's say open a certain app.

The big difference is that ChatGPT can't actually do this by default and you would need to build this functionality out, let's say using OpenAI's function calling capability.

But what Anthropic did was to do the heavy lifting and abstract all the complexities – taking a screenshot, interpreting the prompt, returning structured output in the form of display coordinates, clicks, etc. – behind an easy-to-use API. Ready to plug and play even for less technical people or those not deeply familiar with AI.

And that's the real innovation.

Why This Is a Paradigm Shift

When I first saw "computer use", I was immediately reminded of something I heard from Rodger Werkhoven, a super cool fellow speaker I met recently at a conference in Croatia. Rodger has worked with OpenAI on DALL-E.

Rodger and me after sharing the ride to Zagreb Airport

In essence, Rodger said:

"The modern world we live in today is designed as a giant interface for the human body. Every knob, button, or door handle is designed to work flawlessly with human hands, legs, or senses. This is also one of the reasons why real-world robotics is so hard – we're essentially trying to teach machines to use human interfaces."

Now, the same thing is happening with the digital world and AI.

Anthropic is pushing their AI's capabilities beyond traditional machine-to-machine interfaces (function calling and tool use) to include human-centric graphical user interfaces (GUIs). Now, this concept isn't entirely new — Robotic Process Automation (RPA) has been around for a decade. But previous attempts often struggled with complex interfaces that couldn't be managed by simple if-then-else rules.

Modern AI could revolutionize this approach (or fail miserably, since at the core it's still an LLM, with all things that that AI can and can't do).

Anyway, the fact that Anthropic, as one of the top 3 AI labs, is releasing something like this to the public is a huge deal for democratized AI automation. Literally anyone, regardless of technical expertise, could now build powerful AI agents in plain English.

The main question though is: does it really work?

Well, there's no better way to find out than to try it for yourself!

How to Get Started with Anthropic's "Computer Use"

Even though it's still in early beta and primarily intended for developers, Anthropic has made it surprisingly easy to get started using computer use yourself. Follow these steps, and you'll have an AI interacting with a virtual computer at your command in no time!

Here's what you'll need:

  • A Mac or Linux computer (should work on Windows too, but I haven't tried).

  • Docker installed on your computer (it's free and easy to set up).

  • Access to the Anthropic API (requires a billed account).

Once you have those, let's get started:

Step 1: Install Docker

Docker is a tool that lets you run apps in containers (think of virtual mini computers) without needing to install everything manually. Here's how to get it:

  1. Download Docker from their official website.

  2. Choose the version that matches your system (Mac or Linux).

  3. Follow the on-screen instructions to complete the installation.

  4. Once installed, open Docker and make sure it's running.

Tip: You'll know Docker is running when you see the Docker icon in your menu bar (top right on Mac, or bottom right on Linux).

Step 2: Download the Project Files

Next, you'll need the project files from Anthropic.

2. Download the ZIP file by clicking on the "Code" button and selecting "Download ZIP" (if you have Git, you can also just clone the repo).

3. Unpack the ZIP file into a folder on your computer and open this folder. It should look something like this:

Great! We'll come back to this folder later.

Step 3: Set Up Your API Key

Now that you've downloaded the files, you need to get an API key from Anthropic.

  1. Log into your Anthropic account and navigate to the API keys section.

  2. Click "Create Key" and copy it.

  3. Open the folder where you unpacked the files.

  4. Create an empty text file called launch.txt in this folder.

  5. Paste your API key into this file and save it.

Note 1: Normally, your API key should be stored securely (like in a vault or environment variable), but for this demo, we'll keep it simple and store it here. Just remember to delete it after we're done.

Note 2: Since computer use will make multiple API calls per minute, the free tier likely won't suffice (unless you're just sending one screenshot). If you haven't already, go to the Billing section and top up your pre-paid credit. $5 is more than enough for this demo.

Step 4: Open the Terminal

Since you've already installed Docker, you now need to launch the container that Anthropic provided.

For this, you'll need the Terminal. If you're not familiar with it, don't panic! Just follow these steps:

  1. Open the folder where you unpacked the project files.

  2. Make sure you can see the Path Bar (go to View → Show Path Bar if it's not visible).

  3. Right-click on the computer-use-demo folder in the Path Bar and select "Open in Terminal"

If a new window pops up with a blinking cursor, awesome! You're almost there.

Step 5: Start the AI Sandbox

1. Open the launch.txt file in a text editor. It should contain your API key.

2. Paste the following code below the key:

export ANTHROPIC_API_KEY=%your_api_key%
docker run \
    -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
    -v $HOME/.anthropic:/home/computeruse/.anthropic \
    -p 5900:5900 \
    -p 8501:8501 \
    -p 6080:6080 \
    -p 8080:8080 \
    -it ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest

3. Now cut your API key and replace %your_api_key% with it. Your final launch.txt should look like this:

4. Save the file for future use.

5. Drag and drop the launch.txt file into the Terminal window that you opened in Step 4, then hit Enter.

You should see some downloads happening. If you run this the first time, this will take a couple minutes. Eventually, you should see a line that reads:

  • ➡ Open http://localhost:8080 in your browser to begin.

Congratulations! You're all set!

Step 6: Have Fun with Your AI

Once the sandbox is up and running, open your web browser and enter localhost:8080 into the URL bar.

Type "Hi" or something simple into the chat to make sure Claude is responding.

From here, you can start letting your AI interact with the little virtual computer you've just launched. It comes with a web browser, PDF reader, calculator, spreadsheet tool, and maybe even more. Try it out! For example, you could type: Open the calculator, calculate 50×50, and then put the result into a new spreadsheet – as I did in this video:

Watch as the AI operates (and fails) just like a human! Remember, this is an early beta version, and you're among the very first to try it out. Fun things you can do:

  • Let the AI fill out a PDF document you downloaded somewhere

  • Let the AI copy data from a website into a spreadsheet

  • Let the AI write some code in the Terminal to schedule a job

What's your impression?

Tip: Use the Toggle Screen Control button in the top right corner to take over if Claude gets stuck.

Wrapping up

To quit this application, just terminate the process by closing the Terminal window and Delete the Docker Container (Docker Desktop —> Dashboard —> Delete). To restart, just drag & drop launch.txt again into a Terminal window you launched from the computer use folder.

Conclusion

Congratulations! If you've followed these steps, you now have Anthropic's "Computer Use" running securely in a virtual machine on your computer.

Safe to say you're on the cutting edge of AI development, experimenting with technology that allows machines to interact with computers the way we do.

While it's still early days, this is a glimpse of a future where AI takes over routine tasks we'd rather not do ourselves. You don't want to build this into production yet, but let's revisit where this technology will be 6 months from now.

Remember, this is just the beginning. Soon, you'll be able to integrate this into your workflow and take your productivity to new levels.

Keep experimenting and stay tuned for more updates!

See you next Friday,
Tobias

Reply

or to participate.