Monday, December 23, 2024

Here’s how Apple’s AI model tries to keep your data private


At WWDC on Monday, Apple revealed Apple Intelligence, a suite of features bringing generative AI tools like rewriting an email draft, summarizing notifications, and creating custom emoji to the iPhone, iPad, and Mac. Apple spent a significant portion of its keynote explaining how useful the tools will be — and an almost equal portion of time assuring customers how private the new AI system keeps your data.

That privacy is possible thanks to a twofold approach to generative AI that Apple started to explain in its keynote and offered more detail on in papers and presentations afterward. They show that Apple Intelligence is built with an on-device philosophy that can do the common AI tasks users want fast, like transcribing calls and organizing their schedules. However, Apple Intelligence can also reach out to cloud servers for more complex AI requests that include sending personal context data — and making sure that both deliver good results while keeping your data private is where Apple focused its efforts.

The big news is that Apple is using its own homemade AI models for Apple Intelligence. Apple notes that it doesn’t train its models with private data or user interactions, which it presents as a point of difference from other companies. Apple instead uses both licensed materials and publicly available online data scraped by the company’s Applebot web crawler. Publishers must opt out if they don’t want their data ingested by Apple, which sounds similar to policies from Google and OpenAI. Apple also says it filters out Social Security and credit card numbers that are floating around online, and ignores “profanity and other low-quality content.”
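Apple hasn’t published the filter it uses, but the general idea of scrubbing identifiers from scraped text is easy to picture. The sketch below is an illustration only: the patterns, the `redact` function, and the placeholder labels are all assumptions, and a real pipeline would be far more thorough.

```python
import re

# Hypothetical patterns: US-style SSNs and 13-19 digit card-like numbers.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def passes_luhn(digits):
    """Return True if the digit string satisfies the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def redact(text):
    """Replace SSN-shaped and Luhn-valid card-shaped numbers with placeholders."""
    text = SSN_PATTERN.sub("[REDACTED-SSN]", text)

    def replace_card(match):
        digits = re.sub(r"\D", "", match.group())
        # Only redact digit runs that pass the checksum; leave the rest alone.
        return "[REDACTED-CARD]" if passes_luhn(digits) else match.group()

    return CARD_PATTERN.sub(replace_card, text)


sample = "SSN 123-45-6789, card 4111 1111 1111 1111, order 1234 5678 9012 3456"
cleaned = redact(sample)
```

The checksum step is there so that ordinary long numbers, like the order number in the sample, are left untouched.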

A big selling point for Apple Intelligence is its deep integration into Apple’s operating systems and apps, as well as how the company optimizes its models for power efficiency and size to fit on iPhones. Keeping AI requests local is key to quelling many privacy concerns, but the tradeoff is using smaller and less capable models on-device.

To make those local models useful, Apple employs fine-tuning, which trains models to make them better at specific tasks like proofreading or summarizing text. The skills are put into the form of “adapters,” which can be laid onto the foundation model and swapped out for the task at hand, similar to applying power-up attributes for your character in a roleplaying game. Similarly, Apple’s diffusion model for Image Playground and Genmoji also uses adapters to get different art styles like illustration or animation (which makes people and pets look like cheap Pixar characters).
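To make the adapter idea concrete, here is a toy sketch of one frozen layer with small task-specific add-ons that can be swapped in and out. The low-rank form (a narrow “down” matrix followed by an “up” matrix), the class name, and the tiny numbers are all assumptions for illustration; this isn’t Apple’s code.

```python
# Toy illustration only: sizes, names, and the low-rank form are assumptions.

def matvec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class AdaptedLayer:
    """A frozen base weight matrix plus swappable per-task adapters."""

    def __init__(self, base_weights):
        self.base_weights = base_weights  # shared by every task, never modified
        self.adapters = {}                # task name -> (down, up) matrices
        self.active = None                # name of the adapter in use, if any

    def register(self, task, down, up):
        self.adapters[task] = (down, up)

    def use(self, task):
        if task is not None and task not in self.adapters:
            raise KeyError(f"no adapter registered for {task!r}")
        self.active = task

    def forward(self, x):
        out = matvec(self.base_weights, x)
        if self.active is not None:
            down, up = self.adapters[self.active]
            # The adapter adds a small correction on top of the base output.
            delta = matvec(up, matvec(down, x))
            out = [o + d for o, d in zip(out, delta)]
        return out


layer = AdaptedLayer([[1.0, 0.0], [0.0, 1.0]])
layer.register("summarize", down=[[1.0, 1.0]], up=[[0.5], [0.0]])
layer.register("proofread", down=[[1.0, -1.0]], up=[[0.0], [0.25]])

x = [2.0, 4.0]
layer.use(None)
base_out = layer.forward(x)        # base model alone
layer.use("summarize")
summarize_out = layer.forward(x)   # base plus the summarization adapter
layer.use("proofread")
proofread_out = layer.forward(x)   # same base, different adapter
```

The point of the design is that the large base weights are stored once, while each task only costs a pair of small matrices.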

Apple says it has optimized its models to shorten the time between sending a prompt and delivering a response, and it uses techniques such as “speculative decoding,” “context pruning,” and “grouped-query attention” to take advantage of Apple Silicon’s Neural Engine. Chip makers have only recently started adding neural processing units (NPUs) to the die, which help relieve CPU and GPU bandwidth when processing machine learning and AI algorithms. It’s part of the reason that only Macs and iPads with M-series chips, along with the iPhone 15 Pro and Pro Max, support Apple Intelligence.
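Of those techniques, speculative decoding is the easiest to show in a few lines. In its general form, a small, fast “draft” model guesses several tokens ahead and the larger “target” model only checks them, so the big model runs fewer times. The sketch below uses the simple greedy variant with two stand-in functions in place of real models; the function names and toy “models” are assumptions, and nothing here reflects how Apple implements it.

```python
def plain_decode(target_next, prompt, max_new_tokens):
    """Baseline: call the large model once per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(target_next(tokens))
    return tokens


def speculative_decode(target_next, draft_next, prompt, max_new_tokens, lookahead=3):
    """Greedy speculative decoding; returns (tokens, verification rounds)."""
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    rounds = 0
    while len(tokens) < goal:
        # 1. The cheap draft model guesses a few tokens ahead.
        draft = []
        for _ in range(lookahead):
            draft.append(draft_next(tokens + draft))
        # 2. The target model checks the guesses. A real system scores all
        #    positions in one batched pass; here that is a loop of calls.
        rounds += 1
        accepted = []
        for guess in draft:
            expected = target_next(tokens + accepted)
            accepted.append(expected)
            if guess != expected:
                break  # first mismatch: keep the target's token and stop
        else:
            # Every guess matched, so the same pass yields one extra token.
            accepted.append(target_next(tokens + accepted))
        tokens.extend(accepted)
    return tokens[:goal], rounds


def target_next(tokens):
    """Stand-in for the large model: counts upward modulo 10."""
    return (tokens[-1] + 1) % 10


def draft_next(tokens):
    """Stand-in for the small model: same, but wrong whenever it sees a 4."""
    return 0 if tokens[-1] == 4 else (tokens[-1] + 1) % 10


baseline = plain_decode(target_next, [0], 8)
fast, rounds = speculative_decode(target_next, draft_next, [0], 8)
```

In this toy run the output matches the baseline exactly, but the target model is consulted in 3 rounds instead of 8 separate steps.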

The approach is similar to what we’re seeing in the Windows world: Intel launched its 14th-generation Meteor Lake architecture featuring a chip with an NPU, and Qualcomm’s new Snapdragon X chips built for Microsoft’s Copilot Plus PCs have them, too. As a result, many AI features on Windows are gated to new devices that can perform work locally on these chips.

According to Apple’s research, out of 750 tested responses for text summarization, Apple’s on-device AI (with the appropriate adapter) produced results that human evaluators found more appealing than those from Microsoft’s Phi-3-mini model. It sounds like a great achievement, but most chatbot services today use much larger models in the cloud to achieve better results, and that’s where Apple is trying to walk a careful line on privacy. For Apple to compete with larger models, it is concocting a seamless process that sends complex requests to cloud servers while also trying to prove to users that their data remains private.

If a user request needs a more capable AI model, Apple sends the request to its Private Cloud Compute (PCC) servers. PCC runs on its own OS based on “iOS foundations,” and it has its own machine learning stack that powers Apple Intelligence. According to Apple, PCC has its own secure boot and Secure Enclave to hold encryption keys that only work with the requesting device, and Trusted Execution Monitor makes sure only signed and verified code runs.

Apple says the user’s device creates an end-to-end encrypted connection to a PCC cluster before sending the request. Apple says it cannot access data in the PCC since it’s stripped of server management tools, so there’s no remote shell. Apple also doesn’t give the PCC any persistent storage, so requests and possible personal context data pulled from Apple Intelligence’s Semantic Index apparently get deleted on the cloud afterward.

Each build of PCC will have a virtual build that the public or researchers can inspect, and only signed builds that are logged as inspected will go into production.
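The rule described above boils down to a gate with two conditions: the build must be signed, and its fingerprint must appear in the log of inspected builds. Here is a minimal sketch of such a gate. The key, the function names, and the use of an HMAC as a stand-in for a real asymmetric signature are all assumptions; Apple’s actual mechanism isn’t detailed here.

```python
import hashlib
import hmac

# Hypothetical demo key. A real system would use asymmetric signatures,
# so that verifiers never hold the signing secret.
SIGNING_KEY = b"demo-key-not-a-real-secret"


def measure(image):
    """Fingerprint a build image with SHA-256."""
    return hashlib.sha256(image).hexdigest()


def sign(image):
    """Stand-in signature: an HMAC over the build image."""
    return hmac.new(SIGNING_KEY, image, hashlib.sha256).hexdigest()


def may_enter_production(image, signature, inspected_log):
    """Allow a build only if it is both correctly signed and publicly logged."""
    signed = hmac.compare_digest(sign(image), signature)
    logged = measure(image) in inspected_log
    return signed and logged


release = b"pcc-build-1"
tampered = b"pcc-build-1-with-backdoor"
inspected_log = {measure(release)}  # fingerprints of builds open to inspection

release_ok = may_enter_production(release, sign(release), inspected_log)
# Signed correctly, but never logged for inspection:
unlogged_ok = may_enter_production(tampered, sign(tampered), inspected_log)
# Logged build presented with the wrong signature:
unsigned_ok = may_enter_production(release, sign(tampered), inspected_log)
```

Only the first check passes; a build that skips either the signature or the public log is rejected.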

One of the big open questions is exactly what types of requests will go to the cloud. When processing a request, Apple Intelligence has a step called Orchestration, where it decides whether to proceed on-device or to use PCC. We don’t know what exactly constitutes a complex enough request to trigger a cloud process yet, and we probably won’t know until Apple Intelligence becomes available in the fall.

There’s one other way Apple is dealing with privacy concerns: making it someone else’s problem. Apple’s revamped Siri can send some queries to ChatGPT in the cloud, but only when you ask it some really tough questions, and only after you give permission. That process shifts the privacy question into the hands of OpenAI, which has its own policies, and the user, who has to agree to offload their query. In an interview with Marques Brownlee, Apple CEO Tim Cook said that ChatGPT would be called on for requests involving “world knowledge” that are “out of domain of personal context.”
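Put together, the routing described so far has three destinations: the on-device model, PCC, and ChatGPT with the user’s consent. The sketch below shows one way such a decision could be structured. It is purely hypothetical: Apple hasn’t said what triggers the cloud, so the `complexity` score, the `ON_DEVICE_LIMIT` threshold, and every name here are invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    ON_DEVICE = "on-device model"
    PRIVATE_CLOUD = "Private Cloud Compute"
    CHATGPT = "ChatGPT"
    NOT_SENT = "not sent"


@dataclass
class Request:
    prompt: str
    needs_world_knowledge: bool  # outside the user's personal context
    complexity: float            # invented 0..1 score, not an Apple metric


ON_DEVICE_LIMIT = 0.5  # invented threshold; the real criteria are unknown


def orchestrate(request, user_allows_chatgpt):
    """Pick a destination for a request under the assumptions above."""
    if request.needs_world_knowledge:
        # Third-party handoff happens only with explicit permission.
        return Route.CHATGPT if user_allows_chatgpt else Route.NOT_SENT
    if request.complexity <= ON_DEVICE_LIMIT:
        return Route.ON_DEVICE
    return Route.PRIVATE_CLOUD


simple = orchestrate(Request("Summarize this note", False, 0.2), False)
heavy = orchestrate(Request("Rewrite my 40-page report", False, 0.9), False)
asked = orchestrate(Request("Plan a five-course menu", True, 0.7), True)
refused = orchestrate(Request("Plan a five-course menu", True, 0.7), False)
```

What happens when a user declines the ChatGPT handoff is also an assumption here; the sketch simply marks the query as not sent.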

Apple’s local and cloud split approach for Apple Intelligence isn’t totally novel. Google has a Gemini Nano model that can work locally on Android devices alongside its Pro and Flash models that process on the cloud. Meanwhile, Microsoft Copilot Plus PCs can process AI requests locally while the company continues to lean on its deal with OpenAI and also build its own in-house MAI-1 model. None of Apple’s rivals, however, have so thoroughly emphasized their privacy commitments in comparison.

Of course, this all looks great in staged demos and edited papers. However, the real test will be later this year when we see Apple Intelligence in action. We’ll have to see if Apple can pull off hitting that balance of quality AI experiences and privacy — and continue to grow it in the coming years.
