Wednesday, October 2, 2024

Microsoft gives Copilot a voice and vision in its biggest redesign yet

Microsoft is unveiling a big overhaul of its Copilot experience today, adding voice and vision capabilities to transform it into a more personalized AI assistant. As I exclusively revealed in my Notepad newsletter last week, Copilot’s new capabilities include a virtual news presenter mode to read you the headlines, the ability for Copilot to see what you’re looking at, and a voice feature that lets you talk to Copilot in a natural way, much like OpenAI’s Advanced Voice Mode.

Copilot is being redesigned across mobile, web, and the dedicated Windows app into a user experience that’s more card-based and looks very similar to the work Inflection AI has done with its Pi personalized AI assistant. Microsoft hired a bunch of folks from Inflection AI earlier this year, including Google DeepMind cofounder Mustafa Suleyman, who is now CEO of Microsoft AI. This is Suleyman’s first big change to Copilot since taking over the consumer side of the AI assistant.

“At Microsoft AI, we are creating an AI companion for everyone,” says Suleyman in an open letter today. “I truly believe we can create a calmer, more helpful and supportive era of technology, quite unlike anything we’ve seen before.”

The redesigned Copilot experience on the web.
Image: Microsoft

Copilot now looks unlike anything I’ve seen from Microsoft before, with an interface that is a big departure from what exists right now. It’s a lot warmer, with a personalized Copilot Discover page that’s more useful and inviting than a text entry prompt for a chatbot. Microsoft is customizing this entire Copilot homepage based on your conversation history, and over time, it will include useful searches, tips, and relevant information.

Microsoft split off its consumer version of Copilot to Suleyman’s team earlier this year, and it’s clearly allowed the company to experiment more with personality and customization. “What we’ve learned from the Pi team and the [Inflection AI] folks that came over is that they’ve always had an attention to detail on the needs of customers,” says Yusuf Mehdi, executive vice president and consumer chief marketing officer at Microsoft, in an interview with The Verge. “The way they listen and what they’ve learned from these long conversations in that research has certainly influenced what we’ve done here.”

The new Copilot experience on mobile.
Image: Microsoft

Beyond the look and feel of this new Copilot, Microsoft is also ramping up its work on its vision of an AI companion for everyone by adding voice capabilities that are very similar to what OpenAI has introduced in ChatGPT. You can now chat with the AI assistant, ask it questions, and interrupt it like you would during a conversation with a friend or colleague. Copilot now has four voice options to pick from, and you’re encouraged to pick one when you first use this updated Copilot experience.

“We’re making a huge bet on voice,” says Mehdi. “When you use it with the way we’ve designed it, you really start to let yourself go and have conversations. Then you see the glimmers of where we’re going to go long term, with vision where the AI can actually help you and see what you see if you want it to.”

Copilot Vision is Microsoft’s second big bet with this redesign, allowing the AI assistant to see what you see on a webpage you’re viewing. You can ask it questions about the text, images, and content you’re viewing, and combined with the new Copilot Voice features, it will respond in a natural way. You could use this feature while you’re shopping on the web to find product recommendations, allowing Copilot to help you find different options.

Copilot Vision sessions are opt-in and ephemeral, and Microsoft says none of the content Copilot Vision engages with is stored or used for training. This new experience won’t work on all websites yet because Microsoft has put restrictions on the types of websites Copilot Vision works with. “We’re starting with a limited list of popular websites to help ensure it’s a safe experience for everyone,” says the Copilot team. During preview, Copilot Vision won’t work on paywalled and sensitive content, either.

Despite the disclaimers, Microsoft clearly has a long-term vision for these new voice and vision features in Copilot. One demo shows Copilot Vision being used to look at photos of old handwritten recipes, helping to explain what the food is and offering tips on how long it takes to make the recipe. Microsoft demonstrated a similar assistive experience for Xbox games earlier this year, showing how Copilot could help you navigate through Minecraft.

This next phase of Copilot also includes Copilot Daily, an audio summary of news and weather that Copilot reads out as if it were a CNN anchor. It’s designed as a short clip you can listen to in the mornings, and it only uses content from news and weather providers that have authorized Copilot to use their content. Microsoft is working with Reuters, Axel Springer, Hearst, and the Financial Times initially, with plans to add more sources over time.

Copilot can also handle more complex questions thanks to OpenAI’s latest models. Think Deeper is a new feature in Copilot that lets the assistant take more time to respond, allowing it to supply step-by-step answers to complex questions. It’s designed to work best when you’re trying to compare two options side by side, such as “Should I move to New York or San Francisco?”

Think Deeper is still early in development, and Microsoft is placing it into Copilot Labs, a new way to test out experimental features that the company is still developing. Copilot Vision will also be part of the Labs feature initially, and participants will be able to provide feedback on the experiences. Microsoft is clearly treading carefully with Copilot Vision after the backlash around its initial Recall security and privacy issues. Microsoft revealed last week that Recall has been overhauled with improved security and privacy options, and you’ll even be able to fully uninstall the feature or not turn it on in the first place.

This updated Copilot experience will be available today in the mobile iOS and Android apps, on the web at copilot.microsoft.com, and through the Copilot Windows app. Copilot Voice will be initially available in English in Australia, Canada, New Zealand, the UK, and the US, before expanding to more regions and languages in the future. Copilot Daily is limited to the US and the UK before it expands elsewhere, and Copilot Vision will be limited to a number of Copilot Pro subscribers in the US.

If, like me, you’re wondering where Copilot heads next, Microsoft’s new AI CEO has some grand ideas. “Over time it’ll adapt to your mannerisms and develop capabilities built around your preferences and needs. We are not creating a static tool so much as establishing a dynamic, emergent, and evolving interaction,” says Suleyman. “It’ll accompany you to that doctor’s appointment, taking notes and following up at the right time. It’ll share the load of planning and preparing for your child’s birthday party. And it’ll be there at the end of the day to help you think through a tricky life decision.”
