Sunday, December 22, 2024

ChatGPT gets screen sharing and real-time video analysis, rivaling Gemini 2



OpenAI has added long-awaited video and screen-sharing capabilities to its advanced voice mode, letting users interact with the chatbot across multiple modalities.

Both capabilities are now available on the iOS and Android mobile apps for ChatGPT Team, Plus and Pro users, and will roll out to ChatGPT Enterprise and Edu subscribers in January. However, users in the EU, Switzerland, Iceland, Norway and Liechtenstein won’t have access to the new features for now.

OpenAI first teased the feature in May, when the company unveiled GPT-4o and discussed ChatGPT learning to “watch” a game and explain what’s happening. Advanced voice mode was rolled out to users in September.


Users can start a video session via a new button on the advanced voice mode screen.

OpenAI’s video mode feels like a FaceTime call, because ChatGPT responds in real time to what users show it in the video. It can see what is around the user, identify objects and even remember people who introduce themselves. In an OpenAI demo during the company’s “12 Days of Shipmas” event, ChatGPT used the video feature to help brew coffee: it identified the coffee-making equipment, said when to put in a filter and critiqued the finished brew.

It is also very similar to Google’s recently announced Project Astra, in which users can open a video chat and Gemini 2.0 will answer questions about what it sees, such as identifying a sculpture on a London street. In many ways, these features are more advanced versions of what AI devices like the Humane AI Pin and the Rabbit r1 were marketed to do: have an AI voice assistant answer questions about what it sees in a video.

Sharing a screen 

The new screen-sharing feature brings ChatGPT out of its own app and lets it see whatever else is on the phone’s screen.

To share a screen, users tap a three-dot menu and navigate out of the ChatGPT app. They can then open other apps on their phones and ask ChatGPT questions about what is on screen. In the demo, OpenAI researchers started a screen share, then opened the messages app and asked ChatGPT for help responding to a photo sent via text message.

As with video, though, the screen-sharing feature in advanced voice mode bears similarities to recently released features from Microsoft and Google.

Last week, Microsoft released a preview version of Copilot Vision, which lets Pro subscribers open a Copilot chat while browsing a webpage. Copilot Vision can look at photos on a store’s website or even help play the map-guessing game GeoGuessr. Google’s Project Astra can read browser content in much the same way.

Both Google and OpenAI brought screen-sharing AI chat to phones first, targeting consumers who use ChatGPT or Gemini on the go. But these kinds of features could also signal a way for enterprises to collaborate more closely with AI agents, since an agent can see exactly what a person is looking at on screen. They may be a precursor to computer-use models, like Anthropic’s Computer Use, where the AI model not only looks at a screen but actively opens tabs and programs for the user.

Ho ho ho, ask Santa a question 

In a bid for levity, OpenAI also rolled out “Santa Mode” in advanced voice mode. The new preset voice sounds much like the jolly old man in a red suit.

Unlike the new video features, which are restricted to paid tiers, “Santa Mode” is available to anyone with access to advanced voice mode on the mobile apps, the web version of ChatGPT and the Windows and macOS apps until early January.

Chats with Santa, though, will not be saved in chat history and will not affect ChatGPT’s memory. 

Even OpenAI is feeling the Christmas spirit.
