With computer and smartphone makers like Samsung spreading generative AI across every aspect of their products, OpenAI is attempting the same with an agentic app announced on Jan. 23. The app, called Operator, runs on the same underlying technology as ChatGPT but stays within a dedicated web browser. This lets it perform tasks such as ordering food or booking travel on its own.
OpenAI suggested in a blog post that Operator could “open up new engagement opportunities for businesses,” but did not elaborate.
What is OpenAI’s Operator?
Operator is an app that combines a web browser with the AI model GPT-4o. It is the result of an OpenAI project to train GPT-4o’s vision capabilities on the graphical user interfaces found on typical websites. Its ability to make multi-step plans and correct its own mistakes when needed sets it apart from other efforts to build agentic AI, OpenAI claimed. Operator’s Computer-Using Agent (CUA) model is trained specifically on the buttons, forms, and menus likely to be found on a web page.
Operator is available as a research preview. OpenAI said feedback from early users will be used to improve it.
ChatGPT Pro subscribers in the U.S. can sign up for Operator starting today.
OpenAI plans to offer Operator to Plus, Team, and Enterprise users eventually. The company also intends to integrate Operator’s capabilities into ChatGPT in the future, and it will add the CUA to its API “soon,” according to the blog post.
How does Operator work?
The company says the CUA’s reasoning process, which it calls an “inner monologue,” helps the model work through intermediate steps and react to unexpected input. Under the hood, CUA takes screenshots of web pages and uses a virtual mouse and keyboard to navigate them.
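OpenAI has not published Operator’s code, so the following is only a minimal Python sketch of the general screenshot, reason, act loop described above. Every name in it (`Action`, `FakeBrowser`, `run_agent`, `scripted_policy`) is hypothetical, and a scripted function stands in for the CUA model’s decisions; a real agent would capture actual screenshots and send real mouse and keyboard events.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Action:
    """One step chosen by the (hypothetical) model."""
    kind: str                                  # "click", "type", or "done"
    target: Optional[Tuple[int, int]] = None   # screen coordinates for a click
    text: str = ""                             # characters to type


@dataclass
class FakeBrowser:
    """Stand-in for a real browser: records every virtual input it receives."""
    log: List[str] = field(default_factory=list)

    def screenshot(self) -> bytes:
        # A real agent would capture the rendered page's pixels here.
        return b"fake-pixels"

    def click(self, x: int, y: int) -> None:
        self.log.append(f"click {x},{y}")

    def type_text(self, text: str) -> None:
        self.log.append(f"type {text}")


# A policy maps (task, screenshot, earlier thoughts) to (new thought, next action).
Policy = Callable[[str, bytes, List[str]], Tuple[str, Action]]


def run_agent(task: str, browser: FakeBrowser, policy: Policy,
              max_steps: int = 20) -> List[str]:
    """Screenshot -> reason -> act loop; stops on "done" or after max_steps."""
    thoughts: List[str] = []  # intermediate reasoning, fed back to later steps
    for _ in range(max_steps):
        image = browser.screenshot()
        thought, action = policy(task, image, thoughts)
        thoughts.append(thought)
        if action.kind == "done":
            break
        elif action.kind == "click":
            if action.target is None:
                raise ValueError("a click needs target coordinates")
            browser.click(*action.target)
        elif action.kind == "type":
            browser.type_text(action.text)
        else:
            raise ValueError(f"unsupported action: {action.kind}")
    return thoughts


def scripted_policy(task: str, image: bytes, thoughts: List[str]) -> Tuple[str, Action]:
    # Hypothetical fixed script standing in for the model's decisions.
    script = [
        ("Find the search box", Action("click", (120, 40))),
        ("Enter the dish name", Action("type", text="pizza")),
        ("Results are showing; task complete", Action("done")),
    ]
    return script[len(thoughts)]


browser = FakeBrowser()
thoughts = run_agent("order pizza", browser, scripted_policy)
print(browser.log)    # ['click 120,40', 'type pizza']
print(len(thoughts))  # 3
```

Passing the accumulated `thoughts` back into the policy on every step is one plausible way to approximate the “inner monologue” OpenAI describes, and the `max_steps` cap keeps a confused agent from looping forever.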
As with ChatGPT, users can add custom instructions that Operator will take into account, such as the user’s preferred airline.
Users can prompt Operator in natural language, the same way they prompt ChatGPT. Operator is trained not to log in to websites, enter payment information, or solve CAPTCHAs itself, so it hands control back to the user for those steps. It is also programmed to decline certain requests, such as banking transactions, and to avoid weighing in on high-stakes decisions, such as whether to hire an employee.
If Operator encounters an interface it can’t figure out how to use, it hands the task back to the user. OpenAI collaborated directly with the following companies to make sure Operator can interact with their sites:
- DoorDash.
- Instacart.
- OpenTable.
- Priceline.
- StubHub.
- Thumbtack.
- Uber.
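The hand-off rules described above can be pictured as a routing decision made for each step of a task. The sketch below is an illustrative assumption, not OpenAI’s implementation: the category labels (`login`, `payment`, `captcha`, `banking_transaction`, `hiring_decision`) and the `route_step` function are hypothetical, and the real system presumably relies on the model’s own judgment rather than a fixed lookup table.

```python
from enum import Enum


class Route(Enum):
    AGENT = "agent"    # Operator carries out the step itself
    USER = "user"      # control is handed back to the person
    REFUSE = "refuse"  # the request is declined outright


# Hypothetical category labels; OpenAI has not published how it classifies steps.
HANDOFF_STEPS = {"login", "payment", "captcha"}
REFUSED_TASKS = {"banking_transaction", "hiring_decision"}


def route_step(step_kind: str, interface_known: bool = True) -> Route:
    """Decide who should perform one step of a task."""
    if step_kind in REFUSED_TASKS:
        return Route.REFUSE
    if step_kind in HANDOFF_STEPS:
        return Route.USER
    if not interface_known:
        # Unfamiliar interface: hand the task back rather than guess.
        return Route.USER
    return Route.AGENT


plan = ["search", "login", "select_item", "payment"]
print([route_step(s).value for s in plan])
# ['agent', 'user', 'agent', 'user']
print(route_step("banking_transaction").value)               # refuse
print(route_step("fill_form", interface_known=False).value)  # user
```

Checking refusals before hand-offs means a disallowed request is declined even if it would also involve a login or payment step.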
OpenAI notes that the early iteration of Operator tends to struggle with “complex interfaces,” including creating slideshows or adding items to calendars.
Operator enters a crowded generative AI landscape
Some of Operator’s functionality overlaps with competitor tools, such as Google Gemini or Apple Intelligence.
Operator invites comparison with Microsoft’s much-maligned Recall feature, which takes periodic screenshots of activity on a PC. Operator also shares some capabilities with Google Lens in Chrome. However, its ability to navigate websites autonomously could set it apart. Agentic AI, in which generative AI models perform multi-step tasks on the user’s behalf, is either the hot new thing in tech or a new way to repackage still-limited products.