Meta today released Llama 3.1 405B, its largest and most capable large language model yet, which the social network claims can go toe-to-toe with OpenAI and Anthropic’s top models.
“Our experimental evaluation suggests that our flagship model is competitive with leading foundation models across a range of tasks, including GPT-4, GPT-4o, and Claude 3.5 Sonnet,” Meta boasted in an announcement, describing the neural network as the “world’s largest and most capable openly available foundation model.” As you’d expect for an LLM, Llama 3.1 405B generates prose, chat responses, and more from input prompts.
First teased alongside the launch of its smaller eight- and 70-billion-parameter siblings earlier this spring, Meta’s Llama 3.1 405B was trained on more than 15 trillion tokens — think of each of these as a fragment of a word, phrase, figure, or punctuation — using 16,000 Nvidia H100 GPUs.
In total, the Facebook giant says training the 405-billion-parameter model required the equivalent of 30.84 million GPU hours and produced the equivalent of 11,390 tons of CO2 emissions.
However, Meta insists this much computing power was necessary to train the latest Llama, its first model at this scale, in a reasonable amount of time. The Instagram titan also stuck with a standard decoder-only transformer architecture rather than a more complex mixture-of-experts design, a choice it says improved stability during training.
The result is a model that, at least according to Meta’s own numbers, pulls ahead of larger proprietary systems from OpenAI and Anthropic on a variety of benchmarks. OpenAI’s GPT-4, for reference, is reportedly on the order of 1.8 trillion parameters.
Meta claims its flagship Llama 3.1 405B model can go toe-to-toe with OpenAI’s GPT-4 and Anthropic’s Claude 3.5 Sonnet on a variety of AI benchmarks
Despite being smaller than some competing models, you’ll still need a rather beefy system to get Llama trotting along.
At 405 billion parameters, Meta’s model would require roughly 810GB of memory to run at the 16-bit precision at which it was trained. To put that in perspective, that’s more than a single Nvidia DGX H100 system (eight H100 accelerators in a box) can handle. Because of this, Meta has also released an 8-bit quantized version of the model, which cuts its memory footprint roughly in half.
It’s not clear whether this quantization step was implemented before or after training; we’ve asked Meta for clarification on this. In the meantime, you can find our hands-on guide for post-training quantization here.
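The 810GB figure is simple arithmetic: parameter count times bytes per parameter. A minimal sketch of that back-of-the-envelope math (our own helper, `weight_memory_gb`, not anything Meta ships) might look like this — note it covers weights only, ignoring the KV cache, activations, and framework overhead that inflate real-world requirements:

```python
# Back-of-the-envelope estimate of the memory needed just to hold an
# LLM's weights. Ignores KV cache, activations, and runtime overhead,
# which add to the total in practice.

def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate weight storage in (decimal) gigabytes."""
    bytes_total = params_billion * 1e9 * (bits_per_param / 8)
    return bytes_total / 1e9

# Llama 3.1 405B at its native 16-bit precision:
print(weight_memory_gb(405, 16))  # 810.0 GB
# The 8-bit quantized release roughly halves that:
print(weight_memory_gb(405, 8))   # 405.0 GB
```

Even the quantized 405GB still exceeds the 640GB of HBM spread across a DGX H100’s eight 80GB accelerators only at 16-bit; at 8-bit it squeaks under, which is presumably why Meta bothered.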
Llama 3 gets a point one update
In addition to the larger 405-billion-parameter model, Meta is also rolling out a slew of updates to the wider Llama 3 family.
With the 3.1 release, all three models, including the original 8B and 70B variants, have been upgraded with support for eight languages (English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai) and a substantially larger 128,000 token context window.
That’s up from 8,000 tokens for the original Llama 3 8B and 70B releases. You can think of an LLM’s context window a bit like its short-term memory. The bigger the context window, the more information the model can hold onto at any given moment when generating responses to input prompts.
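To get a feel for what those token counts mean in practice, here’s a rough sketch using the common rule of thumb of about four characters of English per token. The `CHARS_PER_TOKEN` constant is our assumption, not a measurement of Llama’s actual tokenizer, which varies with language and content:

```python
# Ballpark conversion from a context window's token count to the amount
# of English text it can hold, using the rough ~4 chars/token heuristic.
# Real tokenizers vary, so treat these as order-of-magnitude figures.

CHARS_PER_TOKEN = 4  # rule-of-thumb assumption, not a tokenizer measurement

def approx_chars(context_tokens: int) -> int:
    """Approximate characters of English text fitting in the window."""
    return context_tokens * CHARS_PER_TOKEN

print(approx_chars(8_000))    # ~32,000 characters: a handful of pages
print(approx_chars(128_000))  # ~512,000 characters: a short novel
```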
While 8,000 tokens may sound like a reasonable context window for something like a customer-service chatbot, for certain tasks, such as long-form summarization or coding assistance, a much larger context is definitely beneficial. This is why Google is so keen to highlight Gemini’s one-million-token context window.
And at least according to Meta, Llama 3.1’s larger context window has been achieved without compromising the quality of the models, which it claims have much stronger reasoning capabilities. Well, highly artificial reasoning; as always, there is no sentient intelligence here.
You can find more information about Meta’s third-generation Llama models, and the approach Meta took to training them, in our launch day coverage here.
Meta details the Llama Stack
Alongside the new and updated models, Meta also outlined its vision for where Llama will go next.
“Llama models were always intended to work as part of an overall system that can orchestrate several components, including calling external tools,” the social network giant wrote. “Our vision is to go beyond the foundation models to give developers access to a broader system that gives them the flexibility to design and create custom offerings that align with their vision.”
As part of this, Meta has released a reference system which includes sample apps and components such as the Llama Guard 3 safety model and Prompt Guard, its prompt-injection filter.
However, Meta admits that its vision is still developing, and the biz is seeking feedback from industry partners, startups, and community members to shape its AI direction. As part of this, Meta has opened a request for comment on its GitHub page for what it’s calling the Llama Stack.
Llama Stack will eventually comprise a series of standardized interfaces defining how toolchain components — for example, fine-tuning or synthetic data generation — and agentic applications should be built. Meta’s hope is that by crowdsourcing these efforts, such interfaces will become the industry standard.
Committing to open AI development
Meta’s position on developing AI models in the open hasn’t changed much. CEO Mark Zuckerberg emphasized the importance of open AI development in a letter published Tuesday that drew comparisons to the open source Linux kernel’s victory over proprietary Unix operating systems.
“Linux gained popularity – initially because it allowed developers to modify its code however they wanted and was more affordable, and over time because it became more advanced, more secure, and had a broader ecosystem supporting more capabilities than any closed Unix,” he wrote. “I believe that AI will develop in a similar way.”
In line with this, Meta is also modifying Llama’s license to allow developers to use the outputs of Llama models to improve other models. For example, if you want to use Llama 3.1 405B to generate a mountain of synthetic data to train a smaller non-Meta model, you can now do that.
It’s worth noting that Llama’s licensing has proven somewhat contentious in the past. If you can’t get behind Meta’s license, there are several MIT- and Apache 2.0-licensed models available from Microsoft, Mistral, and others.
In any case, the trio of Llama 3.1 models are available for download on both Hugging Face and Meta’s website, and if you’d like to try them — including 405B — at home, check out our local LLM guide here. ®