OpenAI and Agent Software

OpenAI’s first major product, ChatGPT, proved so popular that it sparked a generation of wannabes. But as rivals like Google catch up, OpenAI is hustling to release a product that could prove almost as revolutionary.

OpenAI is developing a form of agent software to automate complex tasks by effectively taking over a customer’s device. The customer could then ask the ChatGPT agent to transfer data from a document to a spreadsheet for analysis, for instance, or to automatically fill out expense reports and enter them in accounting software. Those kinds of requests would trigger the agent to perform the clicks, cursor movements, text typing and other actions humans take as they work with different apps, according to a person with knowledge of the effort.

Too Hot

The software in the works is one of two types of agents OpenAI is developing as it jumps into one of the hottest areas of artificial intelligence, which could soon include Google and Meta Platforms. Some longtime AI researchers have been leaving companies such as Google to start companies that develop agents.

OpenAI will have to temper fears about agents that access users’ computers, given that such a takeover may remind some people of malware, which is known to illicitly seize control of computers and steal their data.

Using The Web – Just Like We Do

OpenAI, a leader in generative AI that investors recently valued at $86 billion, is developing another class of AI agent that would handle web-based tasks such as gathering public data about a set of companies, creating itineraries under a certain budget or booking flight tickets, said a person with knowledge of that effort. Google and Meta have said they are developing similar types of agents, powered by LLMs.

Both forms of agents OpenAI is developing could help CEO Sam Altman turn ChatGPT into what he has privately called a “super smart personal assistant for work.” They may also bring his company into more-direct competition with Microsoft, which is also using OpenAI’s LLMs to automate features of its enterprise apps so they can help people quickly create new documents or draft email responses.

Wait … What?

LLMs underpin productivity chatbots such as ChatGPT, and some at OpenAI also view them as having the potential to be a kind of operating system, including for personal devices, because of their ability to write code, make sense of images and retrieve files. Agents could further buttress this potential.

The agent that takes over people’s computers will require the user’s permission to work. To operate in a personalized fashion and respond quickly the way Apple’s Siri does on the iPhone, the prospective OpenAI computer-using agent may need to be partly stored on users’ devices.

The company would also need to get permission from users to train the software on personal data, such as an individual’s emails and contacts, as well as information stored in business apps like Word and Google Docs.

If your privacy spider senses are lighting up, they should be.

In contrast, people today access ChatGPT through a website or mobile app, and all of its computations happen in the cloud—specifically through Microsoft’s Azure servers.

Zero to One

It isn’t clear when OpenAI plans to release its agent products, which have been in development for more than a year, but some of its employees have hinted at their importance.

Last month, an OpenAI employee who has worked on computer-using agents at his startup, according to a person familiar with his role, posted on X that he was hiring for his team and “building what (I think) could be an industry-defining zero to one product that leverages the latest and greatest from our upcoming models.” He didn’t elaborate.

One of OpenAI’s vice presidents of product remarked on X that the product he described “will change everything.”

The Pot is Starting to Boil

OpenAI has a good reason to expand ChatGPT’s capabilities quickly. Google is soon expected to unveil the flagship version of its most advanced LLM, Gemini, and launch a paid version to compete with the paid version of ChatGPT.

OpenAI, after leapfrogging Google’s technology by releasing ChatGPT in late 2022, this year could face the possibility that it won’t have the strongest LLM in the market. Agents and other new features could help make up for that.

ChatGPT is a key part of OpenAI’s fast-growing revenue stream and will help the company raise tens of billions of dollars as it seeks to develop AI that can handle most human labor.

Agents Everywhere

Clues about OpenAI’s agent aspirations have appeared in recent product announcements.

During its event for customers in November, OpenAI launched its Assistants API, which allows developers to build agent-like experiences within their applications that can generate graphs, keep track of conversation histories and retrieve information from outside documents.

Though this tool allows developers to connect their agents to the internet or take actions within specified external applications, it won’t give agents full control or understanding of a user’s computer.

Yet.

The agents taking over people’s computers require more technology than LLM-powered conversational AI. That’s because LLMs have a propensity to make up false information, otherwise known as “hallucinations,” which can be disastrous for workers. And while chatbots like ChatGPT can connect to other applications that provide application programming interfaces to take agent-like actions, many enterprise apps such as Google Slides lack APIs. Computer-using agents can fill that gap.

Smart LLMs

Folks are realizing that LLMs aren’t that useful by themselves within an enterprise setting, and with that comes the realization that a company like Adept mi

The two-year-old Adept, which has raised $415 million from investors including Greylock Partners and Nvidia, has created AI models that understand the text, images and webpages that appear on a person’s computer as they work, in real-time.

The models are based on transformers, which infer relationships between data and also power conversational AI. Adept’s models generate actions that computers can take rather than producing text the way conversational chatbots do.

It’s All Happening at the Zoo

Adept and OpenAI have trained their computer-using agent models on examples of humans using computers, including how they work on different document types like charts and PDFs.

Computer-using agents differ from robotic process automation (RPA) software, which handles menial tasks for workers. RPA software typically requires developers to manually code the steps needed to complete a task, while AI agents can handle more complex, unstructured tasks that require humanlike judgments with little guidance from users. But still not sentience, right?

Technologists have been discussing agents for years.

Six years ago, Google demonstrated agent-type software that could, for instance, make automated voice calls to local businesses to schedule appointments on behalf of a user. Its first on-stage preview was very different from something from Apple, and Google CEO Sundar Pichai was embarrassed.

Fear of Transparency

But just as Google hesitated to launch a ChatGPT-type chatbot years ahead of when OpenAI did, it also never launched the phone-call agent, in part due to fears of a public backlash against the technology. It also held back its vaunted LLM because of discovered biases, about which humans could do little. That prompted the two founders of Character.ai to leave and start their avatars-for-company line of business.

Pichai recently suggested such automation was coming because the latest technology “allows us to act more like an agent over time…and maybe go beyond answers and follow through for users even more.”

The Big Dude

Microsoft, whose $13 billion investment in OpenAI gives it free rein to use the startup’s technology, has taken baby steps toward agent-style features on devices that run the company’s Windows operating system.

Windows 11 Copilot, for instance, can carry out a number of tasks on people’s computers, such as turning down the volume, showing which windows they’re running or moving files into different folders. For now, the feature is far from autonomous; it presents users with a yes-or-no choice before carrying out their commands, and it can’t handle complex, multistep actions across different apps. But “now” is fleeting, and next week could bring something entirely new.

In fact, Microsoft’s research arm recently published open-source software intended to make it easier for people to build autonomous AI agents that can carry out multistep processes such as debugging code or moderating online forums.

If this doesn’t frighten you, it should.

Author

Steve King

Managing Director, CyberEd

King, an experienced cybersecurity professional, has served in senior leadership roles in technology development for the past 20 years. He began his career as a software engineer at IBM, served Memorex and Health Application Systems as CIO and became the West Coast managing partner of MarchFIRST, Inc., overseeing significant client projects. He subsequently founded Endymion Systems, a digital agency and network infrastructure company, and took it to $50m in revenue before it was acquired by Soluziona SA. Throughout his career, King has held leadership positions in startups such as VIT, SeeCommerce and Netswitch Technology Management, contributing to their growth and success in roles ranging from CMO and CRO to CTO and CEO.
