Monday, December 23, 2024

OopsGPT


Whenever AI companies present a vision for the role of artificial intelligence in the future of searching the internet, they tend to underscore the same points: instantaneous summaries of relevant information and ready-made lists tailored to a searcher’s needs. They tend not to point out that generative-AI models are prone to providing incorrect, and at times fully made-up, information, and yet it keeps happening. Early this afternoon, OpenAI, the maker of ChatGPT, announced a prototype AI tool that can search the web and answer questions, fittingly called SearchGPT. The launch is designed to hint at how AI will transform the ways in which people navigate the internet. Yet before users have had a chance to test the new program, it already appears error-prone.

In a prerecorded demonstration video accompanying the announcement, a mock user types “music festivals in boone north carolina in august” into the SearchGPT interface. The tool then pulls up a list of festivals that it states are taking place in Boone this August, the first being An Appalachian Summer Festival, which according to the tool is hosting a series of arts events from July 29 to August 16 of this year. Someone in Boone hoping to buy tickets to one of those concerts, however, would run into trouble. In fact, the festival started on June 29 and will have its final concert on July 27. Instead, July 29–August 16 are the dates for which the festival’s box office will be officially closed. (I confirmed these dates with the festival’s box office.)

Other results for the festival query that appear in the demo, a short video of about 30 seconds, seem to be correct. (The chatbot does list one festival that takes place in Asheville, which is a two-hour drive away from Boone.) Kayla Wood, a spokesperson for OpenAI, told me, “This is an initial prototype, and we’ll keep improving it.” SearchGPT is not yet publicly available, but as of today anybody can join a waitlist, from which thousands of initial test users will be approved to try the tool. OpenAI said in its announcement that search responses will include in-line citations and that users can open a sidebar to view links to external sources. The long-term goal is to incorporate search features into ChatGPT, the company’s flagship AI product.

On its own, the festival mix-up is minor. Sure, it’s embarrassing for a company that claims to be building superintelligence, but it would be innocuous if it were an anomaly in an otherwise proven product. AI-powered search, however, is anything but. The demo is reminiscent of any number of other AI self-owns that have happened in recent years. Within days of OpenAI’s launch of ChatGPT, which kicked off the generative-AI boom in November 2022, the chatbot spewed sexist and racist bile. In February of 2023, Google Bard, the search giant’s answer to ChatGPT, made an error in its debut that sent the company’s shares plummeting by as much as 9 percent that day. More than a year later, when Google rolled out AI-generated answers to the search bar, the model told people that eating rocks is healthy and that Barack Obama is Muslim.

Herein lies one of the biggest problems with tech companies’ prophecies about an AI-driven transformation: Chatbots are supposed to revolutionize first the internet and then the physical world. For now they can’t properly copy-paste from a music festival’s website.

Searching the internet should be one of the most obvious, and profound, uses of generative-AI models like ChatGPT. These programs are designed to synthesize large amounts of information into fluent text, meaning that in a search bar, they might be able to provide succinct answers to simple and complex queries alike. And chatbots do show glimmers of remarkable capabilities, at least theoretically. Search engines are one of the key ways people learn and answer questions in the internet age, and the ad revenue they bring is also lucrative. Accordingly, companies including Google, Microsoft, and Perplexity have all rushed to bring AI to search. This may be in part because AI companies don’t yet have a business model for the products they’re trying to build, and search is an easy target. OpenAI is, if anything, late to the game.

Despite the excitement around searchbots, seemingly every time a company tries to make an AI-based search engine, it stumbles. At their core, these language models work by predicting what word is most likely to follow in a sentence. They don’t really understand what they are writing the way you or I do: when August is on the calendar, where North Carolina is on a map. As a result, their predictions are frequently flawed, producing answers that contain “hallucinations,” meaning false information. This is not a wrinkle to iron out; it is woven into the fabric of how these prediction-based models function.

Meanwhile, these models raise a number of concerns about the very nature of the web and everyone who depends on it. One of the biggest fears comes from the websites and publishers that AI tools such as SearchGPT and Google AI Overviews are pulling from: If an AI model can read and summarize your website, people will have less incentive to visit the original source of information, lowering traffic and thus lowering revenue. OpenAI has partnered with several media publishers, including The Atlantic, and some in journalism have justified these deals by claiming that OpenAI will drive traffic to external sites, instead of taking it away. But so far, models from OpenAI and elsewhere have proved terrible at providing sources: They routinely pull up the wrong links, cite news aggregators over original reporting, and misattribute information. AI companies say the products will improve, but for now, all the public can do is trust them. (The editorial division of The Atlantic operates independently from the business division, which announced its corporate partnership with OpenAI in May. In its announcement of SearchGPT, OpenAI quotes The Atlantic’s CEO, Nick Thompson, speaking approvingly about OpenAI’s entry into search.)

This is really the core dynamic of the AI boom: A tech company releases a dazzling product, and the public finds errors. The company claims to incorporate that feedback into the next dazzling product, which upon its release a few months later exhibits similar flaws. The cycle repeats. At some point, awe will need to give way to evidence.
