Demis Hassabis has never been shy about proclaiming big leaps in artificial intelligence. Most notably, he became famous in 2016 after a bot called AlphaGo taught itself to play the complex and subtle board game Go with superhuman skill and ingenuity.
Today, Hassabis says his team at Google has made a bigger step forward—for him, the company, and hopefully the wider field of AI. Gemini, the AI model announced by Google today, he says, opens up an untrodden path in AI that could lead to major new breakthroughs.
“As a neuroscientist as well as a computer scientist, I’ve wanted for years to try and create a kind of new generation of AI models that are inspired by the way we interact and understand the world, through all our senses,” Hassabis told WIRED ahead of the announcement today. Gemini is “a big step towards that kind of model,” he says. Google describes Gemini as “multimodal” because it can process information in the form of text, audio, images, and video.
An initial version of Gemini will be available through Google’s chatbot Bard from today. The company says the most powerful version of the model, Gemini Ultra, will be released next year and outperforms GPT-4, the model behind ChatGPT, on several common benchmarks. Videos released by Google show Gemini solving tasks that involve complex reasoning, and also examples of the model combining information from text images, audio, and video.
“Until now, most models have sort of approximated multimodality by training separate modules and then stitching them together,” Hassabis says, in what appeared to be a veiled reference to OpenAI’s technology. “That’s OK for some tasks, but you can’t have this sort of deep complex reasoning in multimodal space.”
OpenAI launched an upgrade to ChatGPT in September that gave the chatbot the ability to take images and audio as input in addition to text. OpenAI has not disclosed technical details about how GPT-4 does this or the technical basis of its multimodal capabilities.
Google has developed and launched Gemini with striking speed compared to previous AI projects at the company, driven by recent concern about the threat that developments from OpenAI and others could pose to Google’s future.
At the end of 2022, Google was seen as the AI leader among large tech companies, with ranks of AI researchers making major contributions to the field. CEO Sundar Pichai had declared his strategy for the company as being “AI first,” and Google had successfully added AI to many of its products, from search to smartphones.
Soon after ChatGPT was launched by OpenAI, a quirky startup with fewer than 800 staff, Google was no longer seen as first in AI. ChatGPT’s ability to answer all manner of questions with cleverness that could seem superhuman raised the prospect of Google’s prized search engine being unseated—especially when Microsoft, an investor in OpenAI, pushed the underlying technology into its own Bing search engine .
Stunned into action, Google hustled to launch Bard, a competitor to ChatGPT, revamped its search engine, and rushed out a new model, PaLM 2, to compete with the one behind ChatGPT. Hassabis was promoted from leading the London-based AI lab created when Google acquired his startup DeepMind to heading a new AI division combining that team with Google’s primary AI research group, Google Brain. In May, at Google’s developer conference, I/O, Pichai announced that it was training a new, more powerful successor to PaLM called Gemini. He didn’t say so at the time, but the project was named to mark the twinning of Google’s two major AI labs, and in a nod to NASA’s Project Gemini, which paved the way to the Apollo moon landings.