Saturday, November 23, 2024

Apple claims its maxed M4 chips 4x faster than rival AI PCs

With the arrival of its M4 silicon on the Mac this week, Apple wants the world to know that the silicon powering AI PCs is no match for its chips.

In its launch announcement, Apple boasted its mid-range M4 Pro system-on-a-chip (SoC) – which can be had with up to 14 CPU cores (ten performance, four efficiency) and 20 GPU cores – was “far more powerful and capable than any AI PC chip,” with up to 2.1x the performance of Intel’s Core Ultra 7 258V and its 48 TOPS NPU.

Step up to the new M4 Max SoC – which can be had with up to 16 CPU cores (12 performance, 4 efficiency) and up to 40 GPU cores – and Apple claims a 4x performance advantage in AI workloads over the Intel model. That’s despite both the M4 Pro and Max being saddled with the same 38 TOPS NPU as the base M4 part we looked at in May.

What workload Apple is referring to isn’t exactly clear, but we suspect it is primarily focusing on bandwidth-constrained workloads – like large language models (LLMs) – where its chips’ higher memory bandwidth and larger capacity give them an edge.

And Apple has not compared the M4’s performance to other AI PC chips from AMD or Qualcomm.

As we’ve previously pointed out, the LLMs that power features like writing assistants, chatbots, and summarization features are more often memory bound than compute limited.

The M4 Pro supports up to 64GB of LPDDR memory capable of 273GB/sec of bandwidth, while the more powerful M4 Max boasts twice that at 128GB and 546GB/sec.

For comparison, the Intel Core Ultra part Apple used in its testing comes with 32GB of LPDDR5x memory but can only manage about 136GB/sec of bandwidth, which coincidentally works out to roughly a 2x advantage for M4 Pro.

Having said that, the base model M4 tops out at 32GB of capacity at 120GB/sec, and as we’ll discuss in a bit, it is much closer in price to the Intel system Apple is comparing against.

Because of this, Apple boasts developers can “easily interact with large language models that have nearly 200 billion parameters.”

Take that with several grains of salt.

For starters, running a model that large would require heavy quantization, down to four bits or less – something we’ve explored in more detail here. Second, the model will be slow, generating roughly 5.5 tokens a second in the best-case scenario.

That’s because at 4-bit quantization, a 200 billion parameter model will consume at minimum 100GB. That means you’ll really need an M4 Max that has been maxed out with 128GB of RAM.
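The sizing arithmetic can be sketched in a few lines of Python. This is a back-of-the-envelope helper of our own (the function name is illustrative), and it counts model weights only; the KV cache, activations, and the operating system all need memory too, so treat the result as a floor.

```python
def weights_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


# A 200 billion parameter model quantized to 4 bits per weight.
size_gb = weights_size_gb(200, 4)
print(f"Weights alone: {size_gb:.0f}GB")

# Maximum memory capacities quoted above for each chip.
for chip, capacity_gb in [("M4 Pro", 64), ("M4 Max", 128)]:
    verdict = "fits" if size_gb <= capacity_gb else "does not fit"
    print(f"{chip} ({capacity_gb}GB): {verdict}")
```

Swapping in 16 or 8 bits per weight shows why heavy quantization is unavoidable here: the same model would need roughly 400GB or 200GB respectively.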

In case you’re wondering, you can get a rough upper bound on the generation rate, in tokens per second, by dividing memory bandwidth by the size of the model in gigabytes at a given precision (multiply the parameter count in billions by 2 for FP16, 1 for FP8/INT8, or 0.5 for INT4). In this case, that’s 546GB/sec divided by 100GB, or about 5.46 tokens a second. However, it’s worth noting that real-world performance is usually much lower.
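Here is that rule of thumb as a minimal Python sketch. The function and table names are ours, and the result is a theoretical ceiling that assumes every weight is read from memory once per generated token; it says nothing about prompt processing or real-world overheads.

```python
# Bytes per parameter at each precision, as listed above.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "int8": 1.0, "int4": 0.5}


def peak_tokens_per_sec(bandwidth_gb_s: float, params_billion: float,
                        precision: str) -> float:
    """Upper bound on generation rate: memory bandwidth / model size."""
    model_size_gb = params_billion * BYTES_PER_PARAM[precision]
    return bandwidth_gb_s / model_size_gb


# A 200 billion parameter model at INT4 on a 546GB/sec M4 Max.
max_rate = peak_tokens_per_sec(546, 200, "int4")
print(f"M4 Max ceiling: {max_rate:.2f} tokens/sec")
```

Because the estimate scales linearly with bandwidth, halving it to the M4 Pro's 273GB/sec halves the ceiling, although a 100GB model would not fit in that chip's 64GB anyway.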

One way around this is to use a mixture-of-experts model like Mixtral 8x22B, which is composed of multiple experts, each roughly 22 billion parameters in size, for a total parameter count of 141 billion. But because only two of those experts are active for any given token, generation rates will be closer to that of a much smaller model, putting less pressure on the memory subsystem.
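To put rough numbers on that, the sketch below applies the same bandwidth-divided-by-size estimate to a mixture-of-experts model. The 39 billion active-parameter figure is an assumption taken from Mistral's published description of Mixtral 8x22B rather than from Apple's announcement, and the function name is ours. Note that all 141 billion parameters still have to fit in memory; only the per-token traffic shrinks.

```python
def peak_tokens_per_sec(bandwidth_gb_s: float, params_billion: float,
                        bytes_per_param: float) -> float:
    """Upper bound on generation rate: bandwidth / bytes read per token."""
    return bandwidth_gb_s / (params_billion * bytes_per_param)


TOTAL_PARAMS_B = 141   # total parameter count, all experts included
ACTIVE_PARAMS_B = 39   # assumption: parameters actually used per token
INT4 = 0.5             # bytes per parameter at 4-bit quantization
BANDWIDTH_GB_S = 546   # M4 Max memory bandwidth

footprint_gb = TOTAL_PARAMS_B * INT4
dense_rate = peak_tokens_per_sec(BANDWIDTH_GB_S, TOTAL_PARAMS_B, INT4)
moe_rate = peak_tokens_per_sec(BANDWIDTH_GB_S, ACTIVE_PARAMS_B, INT4)

print(f"Memory needed for weights: {footprint_gb:.1f}GB")
print(f"Ceiling if every weight were read: {dense_rate:.1f} tokens/sec")
print(f"Ceiling with only active experts: {moe_rate:.1f} tokens/sec")
```

As with any estimate of this kind, these are best-case ceilings; measured throughput will be lower.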

So despite Apple’s M4 silicon coming with a beefier 38 TOPS NPU this generation, a lot of the performance for LLM-based apps will actually be determined by how quickly information can move in and out of memory.

How useful that extra memory bandwidth will actually be for AI is hard to say. Right now, Apple Intelligence remains pretty limited on macOS, with most of the interesting integrations – like a tie-in with OpenAI’s GPT models – still more than a month away. However, if you just want to play with local LLMs, we’ve actually got a guide for that. And so long as you’ve got an M1 or better, you won’t even need a new Mac.

Generational improvements

If AI isn’t your thing, the M4 Pro and M4 Max offer a number of other significant improvements. The chips are built on TSMC’s second-generation 3nm process tech, and Apple claims their CPU cores boast the industry’s best single-threaded performance.

The fine print doesn’t offer much in terms of clarity, other than to say “testing was conducted by Apple in October 2024 using shipping competitive systems and select industry benchmarks.” That could mean anything.

One big change to the CPU is that Apple seems to have abandoned its funky M3 Pro configuration, which boasted up to 12 cores but in a symmetrical layout of six performance and six efficiency cores. The M4 Pro more closely resembles the M1 Pro and M2 Pro, with a bank of eight to ten performance cores and four efficiency cores.

In terms of the GPU, Apple notes the underlying architecture hasn’t changed from the M3 but the cores have been juiced – presumably by increasing clock speeds. Cupertino says raytracing performance is twice as fast as last gen.

Somewhat adjacent to graphics processing, Apple notes that the M4 Max includes an enhanced media engine with two video encode engines and two ProRes accelerators.

Finally, in terms of I/O, the Pro and Max chips boast support for Thunderbolt 5 with up to 120Gbit/sec per port for storage, peripherals, monitors, and other devices. The base M4, on the other hand, is still limited to Thunderbolt 4 speeds of up to 40Gbit/sec.

New MacBooks

Of course these chips only come in Apple’s latest Macs. Earlier this week, we took a look at Apple’s new iMacs, and in the days since, Cupertino has rolled out refreshed MacBook Pros and Mac minis as well.

We’ll start with the new MacBook Pro, as it’s arguably Apple’s most popular device. Just like the iMacs, most of the M4 MacBook Pro upgrades are under the hood – the systems feature the same basic chassis design that first appeared alongside the M1 Pro and M1 Max in late 2021.

Apple's new M4 MacBooks look pretty much identical to last year's. All the big changes are under the hood

The notebooks are available in both 14-inch and 16-inch sizes, but now come with the M3-generation’s Space Black paint job as standard across the entire lineup. Classic anodized aluminum is still an option but Space Gray is out.

In terms of ports, all M4 models now come equipped with three USB-C/Thunderbolt ports along with a full-sized HDMI port, an SDXC card reader, and of course the MagSafe charging adapter.

In terms of the display, the M4 MacBook Pros are much brighter for standard dynamic range content, at up to 1,000 nits versus 600 nits on the previous model. And, just like the iMac, if you’re willing to shell out an extra $150, you can get a nano-texture – aka matte – display. Apple also makes good use of the display’s notch to cram in a 12MP Center Stage camera.

Apple also appears to have finally listened to pro users, with all three models able to run multiple external monitors: 2x 6K displays at 60Hz in the case of the M4 and M4 Pro, while M4 Max can simultaneously support up to four external displays.

Battery life remains a major selling point for the systems, with Apple claiming 18 to 24 hours while playing back video, or more realistically 13 to 16 hours while doomscrolling Xitter.

What Apple’s notebooks aren’t is cheap. A base model 14-inch with an M4, 16GB of RAM, and 512GB of storage will set you back $1,599.

Meanwhile, a top spec 16-inch M4 Max with the 40-core GPU, nano-texture display, 128GB of RAM, and 8TB SSD will run you $7,349.

The MacBook Pros may not have changed much physically, but the Mac mini certainly has. After 14 years, Apple has finally given its minimalist model a facelift.

RIP USB-A

The new system measures just 5×5 inches – significantly smaller than its predecessor – and features a new thermal system that pulls in air from the bottom and closely resembles the scheme used on the Mac Studio. Oh, and on a somewhat awkward note, the power button has been relocated to the bottom of the machine. We guess Apple doesn’t expect folks to turn their machines off very often.

Like the M2 Mac mini, the refreshed Mac mini can be had with either base model M4 or the beefier M4 Pro. However, it’s not all good news.

In exchange for the smaller form factor, the port selection has been cut back considerably: RIP USB-A. On the front are a pair of 10Gbit/sec USB-C ports and a 3.5mm headphone jack. On the back are power, Ethernet (1Gbit/sec or 10Gbit/sec), HDMI, and a trio of Thunderbolt ports.

The base model M4 Mac mini with a ten-core CPU and GPU, 16GB of RAM, and 256GB of storage will set you back $599. A fully kitted out M4 Pro Mac mini with the 14-core CPU, 20-core GPU, 64GB of RAM, 8TB of storage, and 10Gbit/sec networking will run you $4,699.

M4 Ultra tomorrow?

While most of Apple’s M4 SoCs are accounted for, we’re still waiting on one: a refresh to the M2 Ultra. That part was essentially two M2 Maxes stitched together using advanced packaging, and remains the highest core count and the most GPU dense part in Apple’s lineup.

Based on the string of announcements over the past week, we may not have to wait long for an M4 Ultra and could get refreshed Mac Studios and Pros as early as tomorrow.

If the M4 Max is anything to go by, we can expect the chip to support up to 32 CPU cores, 80 GPU cores, 256GB of memory good for 1TB/sec of bandwidth and twin Neural Engines.

Unfortunately there’s no guarantee an M4 Ultra will actually materialize – or that if it does it’ll follow the same formula as past parts. Only time will tell. ®
