Claude can now use a computer

Claude can control an internet-connected computer using the mouse and keyboard

Oct 23, 2024

Anthropic has just released a version of the Claude LLM with a capability called “Computer Use”, which lets it control a desktop computer by simulating mouse and keyboard actions on that computer. So now you can hook Claude up to a computer and give it a high-level task, like “create an Excel spreadsheet with the top 5 RPA companies in the world, basic financial information about them, links to their websites, and a chart of their stock performance for the last 5 years”, and it will use the computer’s browser to find the relevant information, download it, and create the spreadsheet.

How does it do this? It periodically takes screenshots of the desktop of the controlled computer, sees what’s on the screen, then decides which icon or button on the screen it needs to click, uses a programmatic interface to “click” the “mouse” of that computer, and can similarly use the keyboard to enter text.
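To make that loop concrete, here is a toy sketch in Python. It is not Anthropic's implementation and it calls no real API: `FakeDesktop`, `decide`, and `run_task` are hypothetical names invented for illustration, the "screenshot" is a small dictionary rather than pixels, and a few hard-coded rules stand in for the model. Only the shape of the loop (look at the screen, pick one action, perform it, look again) is taken from the description above.

```python
from dataclasses import dataclass

# Hypothetical screen coordinates for the two widgets on the fake desktop.
TEXT_BOX_POS = (100, 100)
SAVE_BUTTON_POS = (200, 300)


@dataclass
class Action:
    kind: str       # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeDesktop:
    """Stand-in for the controlled computer: one text box and one Save button."""
    text_box: str = ""
    focused: bool = False
    saved: bool = False

    def screenshot(self) -> dict:
        # A real system would capture pixels; this toy returns a description.
        return {"text_box": self.text_box, "focused": self.focused, "saved": self.saved}

    def click(self, x: int, y: int) -> None:
        if (x, y) == TEXT_BOX_POS:
            self.focused = True
        elif (x, y) == SAVE_BUTTON_POS:
            self.saved = True

    def type_text(self, text: str) -> None:
        # Keystrokes only land somewhere if the text box has focus.
        if self.focused:
            self.text_box += text


def decide(goal_text: str, screen: dict) -> Action:
    """Stub for the model: choose the next action from what is on screen."""
    if screen["saved"]:
        return Action("done")
    if screen["text_box"] == goal_text:
        return Action("click", *SAVE_BUTTON_POS)
    if not screen["focused"]:
        return Action("click", *TEXT_BOX_POS)
    return Action("type", text=goal_text)


def run_task(desktop: FakeDesktop, goal_text: str, max_steps: int = 10) -> list:
    """Observe, decide, act; repeat until the stub says it is done."""
    log = []
    for _ in range(max_steps):
        action = decide(goal_text, desktop.screenshot())
        log.append(action)
        if action.kind == "done":
            break
        if action.kind == "click":
            desktop.click(action.x, action.y)
        elif action.kind == "type":
            desktop.type_text(action.text)
    return log


if __name__ == "__main__":
    desktop = FakeDesktop()
    for step in run_task(desktop, "hello"):
        print(step)
```

The `max_steps` cap is a design choice worth noting: a loop driven by screenshots has no built-in guarantee that it finishes, so some limit on the number of steps seems prudent in any real version.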

Here is an example from Ethan Mollick illustrating the power of this approach. He got Claude to create an entire lesson plan for him:

As one example, I asked the AI to put together a lesson plan on the Great Gatsby for high school students, breaking it into readable chunks and then creating assignments and connections tied to the Common Core learning standard. I also asked it to put this all into a single spreadsheet for me. With a chatbot, I would have needed to direct the AI through each step, using it as a co-intelligence to develop a plan together. This was different. Once given the instructions, the AI went through the steps itself: it downloaded the book, it looked up lesson plans on the web, it opened a spreadsheet application and filled out an initial lesson plan, then it looked up Common Core standards, added revisions to the spreadsheet, and so on for multiple steps. The results are not bad (I checked and did not see obvious errors, but there may be some - more on reliability later in the post). Most importantly, I was presented finished drafts to comment on, not a process to manage. I simply delegated a complex task and walked away from my computer, checking back later to see what it did (the system is quite slow).

Actually, you probably can’t use this capability yet: it is currently in beta and can only be accessed via an API.

So why should you care? Because it gives you an idea of what LLMs will be used for very soon. How will the world change? Which tasks, in which jobs, will be taken over by this?

Ice hockey great Wayne Gretzky is often quoted as saying, “Skate to where the puck is going to be, not where it has been.”

As far as your career is concerned, this is a hint of where the puck is going to be.

AI IQ
