Wednesday, November 20, 2024

ChatGPT’s Advanced Voice Mode could get vision capabilities soon

When OpenAI first introduced ChatGPT, the AI was already pretty impressive with its ability to hold conversations, answer questions, and even help with a variety of tasks. But OpenAI has never stopped innovating.

During the launch of GPT-4o in May 2024, the company promised vision capabilities. This is supposed to take ChatGPT beyond just text and voice, allowing it to understand and interact with the real world through live video.

After months of anticipation, it appears the feature is finally ready to roll out to more users.

The new “Live Camera” feature, which is expected to arrive soon in a beta version, will allow ChatGPT to “see” and engage with your surroundings in real time. Users will be able to activate it by tapping a camera icon within the app, prompting the AI to view and comment on whatever the device’s camera sees.

As announced during the GPT-4o rollout, the feature is built on Advanced Voice Mode, which lets ChatGPT have natural, flowing conversations. With vision capabilities added, ChatGPT can recognize objects and people, remember names, and even make associations between items in the environment.

For example, in a demo during the GPT-4o event, ChatGPT was shown identifying a dog, recalling its name, recognizing a ball, and understanding the game of fetch—all without the user needing to provide specific input beyond the initial setup.

Now, thanks to new code strings spotted by Android Authority in the latest beta update, it looks like OpenAI is preparing for a wider rollout. While the feature could be exciting for many, OpenAI has added a warning advising users not to rely on “Live Camera” for important decisions, such as navigation or anything that could affect health or safety.

While it’s still in the beta stage, we expect the “Live Camera” feature to soon be available to ChatGPT Plus subscribers and possibly other paid tiers. If that happens, it would give ChatGPT a step up over competitors like Google’s Gemini, whose closest offering is Google Lens, which doesn’t offer the live capabilities OpenAI is claiming for ChatGPT.

ChatGPT has about 10 million paying users, far fewer than Google’s 100 million Google One subscribers that have access to Gemini Advanced. With more additions like this, ChatGPT could give many users valid reasons to jump ship from whatever chatbot they currently use.

Louis Eriakha

