Google today unveiled Gemini 2, a new AI model that can talk like a person and make sense of the physical world, like a virtual butler. It has been trained to plan and carry out tasks on a user’s computer and the web.
“I’ve dreamed about a universal digital assistant for a long, long time, as a stepping stone on the path to artificial general intelligence,” Demis Hassabis, the CEO of Google DeepMind, told WIRED ahead of today’s announcement, alluding to the idea of AI that can eventually do anything a human brain can.
Gemini 2 represents a further increase in AI’s intelligence, as measured by the various benchmarks used to evaluate such models. It also has improved “multimodal” capabilities, meaning it is more adept at parsing video and audio and at conversing in speech. The model has additionally been trained to plan and execute actions on computers.
“Over the last year, we have been investing in developing more agentic models,” Google’s CEO, Sundar Pichai, said in a statement today. These models, Pichai added, “can understand more about the world around you, think multiple steps ahead, and take action on your behalf, with your supervision.”
Tech companies see so-called AI agents as the next big step for the technology, with chatbots increasingly taking on tasks for users. If successful, AI agents could revolutionize personal computing by routinely booking flights, arranging meetings, and analyzing and organizing documents. However, it remains challenging to get the technology to follow open-ended commands reliably, and errors could lead to costly mistakes that are difficult to undo.
Google is demonstrating Gemini 2’s agentic potential with two specialized AI agents: one for coding and one for data science. These agents can perform more complex tasks than current AI tools, such as combining data to enable analysis or contributing code to repositories.
The company is also showcasing Project Mariner, an experimental Chrome extension that can navigate the web on a user’s behalf to perform useful tasks. WIRED was given a live demo at Google DeepMind’s headquarters in London. The agent was asked to help plan a meal, which saw it navigate to the website of the supermarket chain Sainsbury’s, log in to a user’s account, and add relevant items to their shopping basket. When certain items were unavailable, the model made suitable substitutions based on its own knowledge of cooking. Google declined to demonstrate other tasks, an indication that the tool is still a work in progress.
“Mariner is our exploration, very much a research prototype at the moment, of how one reimagines the user interface with AI,” Hassabis says.
Google first released Gemini in December 2023 in an effort to catch up with OpenAI, the startup behind the wildly popular chatbot ChatGPT. Despite having invested heavily in AI and contributed important research breakthroughs, Google saw OpenAI hailed as the new leader in the field, with its chatbot even cited by some as perhaps a better way to search the web. With its Gemini models, Google now offers a chatbot as capable as ChatGPT, and it has added generative AI to products like search.
When Hassabis first revealed Gemini in December 2023, he said that its training to comprehend audio and video would eventually prove transformative.
Google today provided a glimpse of how this might play out with a new version of an experimental project called Astra. It enables Gemini 2 to interpret its surroundings, as perceived through a smartphone camera or other device, and to converse naturally in a humanlike voice about what it sees.
WIRED tested Gemini 2 at Google DeepMind’s offices and found it to be a remarkable new kind of personal assistant. In a room decorated to look like a bar, Gemini 2 quickly assessed several wine bottles in view, providing geographical information, details of taste characteristics, and pricing sourced from the web.
“I want Astra to be the best recommendation system,” Hassabis says. “It could be very exciting. There might be a connection between the food you enjoy eating and the books you enjoy reading. There probably are, and we just haven’t discovered them.”
Astra allows Gemini 2 to use Google Lens and Maps, in addition to searching the web, for information relevant to a user’s surroundings. It can also recall what it has seen and heard, which lets it learn a user’s tastes and interests, though Google says users can delete this data.
In a mocked-up gallery, Gemini 2 provided a wealth of historical information about the paintings on the walls. As WIRED flicked through the pages of several books, the model quickly translated poetry from Spanish to English and described recurring themes.
“There are obvious business model opportunities, for advertising or recommendations,” Hassabis says when asked whether companies might pay to have their products highlighted by Astra.
Although the demos were carefully chosen and Gemini 2 will undoubtedly make mistakes in real-world use, the model resisted WIRED’s attempts to trip it up. It adapted to changes in the phone’s perspective and improvised much as a person might.
At one point, your correspondent pointed Gemini 2 at an iPhone and claimed it was stolen. Gemini 2 said the phone should be returned because stealing is wrong. When pushed, however, it conceded that it would be okay to use the device to make an emergency call.
Hassabis acknowledges that bringing AI into the physical world might lead to unanticipated behavior. “I think we need to learn how people are going to use these systems, what they find it useful for, but also the privacy and security aspects,” he says. “We have to take that very seriously up front.”