Monday, December 23, 2024

Google launched Gemini 2.0, its new AI model for practically everything

Google’s latest AI model has a lot of work to do. Like every other company in the AI race, Google is frantically building AI into practically every product it owns, trying to build products other developers want to use, and racing to set up all the infrastructure to make those things possible without being so expensive it runs the company out of business. Meanwhile, Amazon, Microsoft, Anthropic, and OpenAI are pouring their own billions into pretty much the exact same set of problems. 

That may explain why Demis Hassabis, the CEO of Google DeepMind and the head of all the company’s AI efforts, is so excited about how all-encompassing the new Gemini 2.0 model is. Google is releasing Gemini 2.0 on Wednesday, about 10 months after the company first launched 1.5. It’s still in what Google calls an “experimental preview,” and only one version of the model — the smaller, lower-end 2.0 Flash — is being released. But Hassabis says it’s still a big day.

“Effectively,” Hassabis says, “it’s as good as the current Pro model is. So you can think of it as one whole tier better, for the same cost efficiency and performance efficiency and speed. We’re really happy with that.” And not only is it better at doing the old things Gemini could do, but it can also do new things. Gemini 2.0 can now natively generate audio and images, and it brings new multimodal capabilities that Hassabis says lay the groundwork for the next big thing in AI: agents.

Agentic AI, as everyone calls it, refers to AI bots that can actually go off and accomplish things on your behalf. Google has been demoing one, Project Astra, since this spring — it’s a visual system that can identify objects, help you navigate the world, and tell you where you left your glasses. Gemini 2.0 represents a huge improvement for Astra, Hassabis says. 

Google is also launching Project Mariner, an experimental new Chrome extension that can quite literally use your web browser for you. There’s also Jules, an agent specifically for helping developers find and fix bad code, and a new Gemini 2.0-based agent that can look at your screen and help you better play video games. Hassabis calls the game agent “an Easter egg” but also points to it as the sort of thing a truly multimodal, built-in model can do for you.

“We really see 2025 as the true start of the agent-based era,” Hassabis says, “and Gemini 2.0 is the foundation of that.” He’s careful to note that the performance isn’t the only upgrade here; as talk of an industrywide slowdown in model improvements continues, he says Google is still seeing gains as it trains new models, but he’s just as excited about the efficiency and speed improvements. 

This won’t shock you, but Google’s plan for Gemini 2.0 is to use it absolutely everywhere. It will power AI Overviews in Google Search, which Google says now reach 1 billion people and which the company says will now be more nuanced and complex thanks to Gemini 2.0. It’ll be in the Gemini bot and app, of course, and will eventually power the AI features in Workspace and elsewhere at Google. Google has worked to bring as many features as possible into the model itself, rather than run a bunch of individual and siloed products, in order to be able to do more with Gemini in more places. The multimodality, the different kinds of outputs, the features — the goal is to get all of it into the foundational Gemini model. “We’re trying to build the most general model possible,” Hassabis says. 

As the agentic era of AI begins, Hassabis says there are both new and old problems to solve. The old ones are eternal: performance, efficiency, and inference cost. The new ones are in many ways unknown. Just to name one: what safety risks will these agents pose when they are out in the world, operating of their own accord? Google is taking some precautions with Mariner and Astra, but Hassabis says there’s more research to be done. “We’re going to need new safety solutions,” he says, “like testing in hardened sandboxes. I think that’s going to be quite important for testing agents, rather than out in the wild… they’ll be more useful, but there will also be more risks.”

Gemini 2.0 may be in an experimental stage for now, but you can already use it by choosing the new model in the Gemini web app. (No word yet on when you’ll get to try the non-Flash models.) And early next year, Hassabis says, it’s coming for other Gemini platforms, everything else Google makes, and the whole internet.