Friday, November 22, 2024

Nvidia Q1 Earnings Preview: Blackwell And The $200B Data Center

Nvidia’s management team will focus on the H200 in the upcoming earnings call, but make no mistake, we will end this year in full-on Blackwell territory. The new architecture is at the forefront of training and inference for trillion+ parameter models. More than five years ago, I called CUDA the moat for Nvidia’s AI data center story, yet should that moat be breached, the company’s rapid product road map is the first line of defense.

Nvidia is the world’s leading GPU design company, which bears repeating since so little emphasis on Wall Street is placed on what the designs intend to solve. For those paying close attention, there are clues that the company’s fast and furious data center growth will see a second wind with Blackwell.

Nvidia is Hitting Peak Growth: The Hopper Impact

Last quarter in fiscal Q4, Nvidia reported growth of 265%. Last quarter is likely to be peak growth for the company. We pointed this out three months ago when our analysis stated: “Even if we see a beat and raise, the slowing growth in the second half will be hard to overcome due to high comps. As mentioned in the introduction, Nvidia will begin to lap some stellar quarters come the October CY2024 quarter as the growth in October of CY2023 was 205.5% YoY.”

At time of writing, the revenue estimates for Nvidia point to growth of 242%. A beat/raise this quarter is not likely to flow through to a higher growth rate in H2 compared to what we saw in Q4 and what we will see in Q1. Therefore, even if Q1 inches slightly past fiscal Q4 tomorrow evening, we have hit peak growth.

Typically, a growth investor should be cautious when a company hits its peak growth rate after a drastic rise in the stock price. Here is a chart we published three months ago updated with current estimates:

Organic Growth

However, Nvidia’s margins and earnings expansion are creating an outlier of a stock. There are rumors Blackwell GPUs will be priced starting at $30,000 to $40,000 but will have more expensive memory components with HBM3e. As long as margins remain within range, this will not be consequential considering Nvidia is posting organic growth.

This is drastically different than a stock that relies on growth at any cost, which is where rapid growth is bought rather than earned. The quality of Nvidia’s growth is much better than what tech investors are used to, and this is predominantly why Nvidia stock is resilient (within reason; there will always be selloffs in the market). As supply/demand becomes more balanced, it will be Nvidia’s aggressive product road map, which in many cases has the company competing with its own products, that will keep pricing power stable, starting with Blackwell.

For example, there are recent reports that AWS is pausing orders on Hopper GPUs in anticipation of Blackwell GPUs. The market may interpret this as weakness, but this is actually a sign of immense strength. Nvidia needs to pass the baton from the H100s and H200s to the Blackwell architecture for the stock price to extend. We are less concerned with what happens in the immediate-term, and in fact, the I/O Fund has stated a few times that Nvidia is a buy on dips, implying the stock won’t go up forever. Instead, we are encouraged to see early signs of a careful transition to the next architecture to help inform our next buy.

Nvidia’s $150B to $200B Data Center: The Blackwell Effect

There is nothing quite like rapid earnings revisions intra-quarter to determine the quality of a position. For example, consider that Nvidia sold off directly after the November report, yet has gone up a rapid 91% since. The earnings revisions are why Nvidia is so strong intra-quarter:

  • This upcoming quarter is expected to report growth of 242%. Last August, the growth for the April quarter was expected to be 91.6%. Only three months ago, the estimates for the April quarter were for growth of 197.5%. Stated in terms of revenue, this quarter’s estimates have risen nearly 80%, from $13.8 billion in August to $24.5 billion.
  • Next quarter, the company is expected to report growth of 98.7%. This was expected to be growth of 44.6% last November. Stated in terms of revenue, next quarter’s revenue has gone up $7 billion from $19.5 billion in November to $26.7 billion in May. In the past three months alone, the estimates went up $4 billion.

Below, we discuss why margins, cash flow and strong earnings support our decision to buy on dips. However, equally as important, there is also a decent probability that FY2026 and FY2027 revenue estimates are too low. The most bullish analyst from KeyBanc is calling for a $200 billion data center segment by 2025. HSBC believes Nvidia’s FY26 revenue could be as high as $196 billion, which implies about a $192 billion data center segment. Loop Capital foresees a $150 billion data center segment as soon as this year, while Wells Fargo has estimates for a $150 billion data center segment by 2027. The exact timing from these analysts has a range, but the conclusion is very similar.

Let’s break down the weight of those comments with some back-of-the-napkin math, which shows that analysts are currently estimating about $122.4B in data center revenue for FY2026 (calendar year 2025). The more bullish analyst estimates of $200 billion in data center revenue are about 63% higher than this consensus figure.

  • Q1 FY25: $20.75B
  • Q2 FY25: $23B
  • Q3 FY25: $25.5B
  • Q4 FY25: $27.7B
  • Q1 FY26: $27.87B
  • Q2 FY26: $29.7B
  • Q3 FY26: $31.51B
  • Q4 FY26: $33.25B

These are the current estimates, yet if the analysts are correct, then the far right of the graph will end in $50B quarterly revenue. The difference between the current consensus and this much higher trajectory can be summarized in one word: Blackwell.
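As a rough check on the back-of-the-napkin math, the quarterly consensus figures listed above can be summed and compared with the $200 billion bull case. This is a minimal sketch: the variable names are illustrative, and the inputs are simply the rounded estimates quoted in this article.

```python
# Quarterly data center revenue estimates quoted above, in billions of USD.
fy25_estimates = [20.75, 23.0, 25.5, 27.7]
fy26_estimates = [27.87, 29.7, 31.51, 33.25]

# Most bullish data center estimate cited above, in billions of USD.
bull_case = 200.0

fy25_total = sum(fy25_estimates)
fy26_total = sum(fy26_estimates)

# Gap between consensus and the bull case, measured from each side.
upside_pct = (bull_case / fy26_total - 1) * 100
shortfall_pct = (1 - fy26_total / bull_case) * 100

# Average quarter needed for a $200B data center year.
bull_quarterly_avg = bull_case / 4

print(f"FY25 consensus: ${fy25_total:.2f}B")
print(f"FY26 consensus: ${fy26_total:.2f}B")
print(f"Bull case vs. FY26 consensus: +{upside_pct:.1f}%")
print(f"FY26 consensus vs. bull case: -{shortfall_pct:.1f}%")
print(f"Average quarter implied by the bull case: ${bull_quarterly_avg:.0f}B")
```

With these inputs, consensus sits a little under 40% below the bull case, which is the same as saying the bull case is a little over 60% above consensus. The $50B figure is the average quarter that a $200 billion year would require.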

There are additional data points in the supply chain and on the demand side that support Blackwell seeing an increase in orders over Hopper. For example, Taiwan Semi’s CoWoS capacity, which is essential for Blackwell’s architecture, is estimated to rise to 40,000 units/month by the end of 2024, which is more than a 150% YoY increase from ~15,000 units/month at the end of 2023. Applied Materials has boosted its forecast for HBM packaging revenue from a prior view for 4X growth to 6X growth this year. According to Wells Fargo, Taiwanese export data rose 360% year-over-year and 33% quarter-over-quarter, and is often correlated to Nvidia data center revenue.

Note: It’s important to remember this is not a call on what will happen tomorrow evening, as the revenue will be reported when it ships to the customer. However, it helps to consider there are directionally bullish data points should the market sell off following the report and provide us a lower entry.

Notably, the premier component for the H200 and Blackwell is HBM3e memory, which is currently supply constrained. Samsung and SK Hynix are both re-allocating ~20% of DRAM production capacity to HBM to meet high demand, while HBM4 roadmaps are being accelerated.

CEOs of major companies in AI acceleration are in agreement the total addressable market is much, much larger than today’s market size. Lisa Su of AMD has stated the AI chip market will reach $400B by 2027. Intel’s CEO has stated AI chips will become a $1T opportunity by 2030, which is almost twice the size of the entire chip industry in 2023.

Big Tech capex is supporting this growth. Our firm has been especially strong on correlating capex to AI investments for our paid research members, where we held a 1-hour webinar in April discussing our expectations that capex increases in support of AI stocks. We followed this up with free analysis in our newsletter that tracked a 35% YoY increase to $200 billion across Big Tech companies. A disproportionate amount of this will go to Nvidia.

We’re closely tracking Big Tech’s capex plans for 2024 and how this will flow downstream to AI hardware companies. The I/O Fund had a 45% allocation to AI going into 2023, one of the highest on record. Today, the AI allocation is higher with many lesser-known names. Learn more here.

China:

A curveball in the report could be higher than expected China revenue due to China-specific GPUs, such as the H20. Similar to Big Tech in the United States, China’s main players are stockpiling GPUs to secure their lead in AI.

Regarding China, last quarter, the following was stated: “Growth was strong across all regions except for China, where our Data Center revenue declined significantly following the U.S. government export control regulations imposed in October. Although we have not received licenses from the U.S. government to ship restricted products to China, we have started shipping alternatives that don’t require a license for the China market. China represented a mid-single-digit percentage of our Data Center revenue in Q4, and we expect it to stay in a similar range in the first quarter.”

Nvidia’s Blackwell will Answer to Hopper’s Excellence

The product road map is the single most important thing investors should be focused on. A good chunk of the AI accelerator story is understood at this point. What is not understood is how aggressive Nvidia is becoming by speeding up to a one-year release cycle for its next generation of GPUs instead of a two-year release cycle.

This means Nvidia is competing with itself by putting Blackwell dangerously close to Hopper’s product cycle. This move is bold, it’s daring, and it’s absolutely necessary.

Here is the very ambitious eight-month schedule Nvidia has set for itself:

  • The H200 with HBM3e is shipping now.
  • The B100 and GB200 are shipping in late 2024.
  • The B200 will be released in early 2025.

The Blackwell architecture remains on 4nm dies, similar to the Hopper architecture. What is different is that Blackwell has 2 reticle-sized GPU dies. Reticle size refers to the limit in the chip surface that can be exposed by a single mask. The limit is set by the lithography equipment. At one point it was expected Blackwell would be on 3nm dies, yet due to reasons unknown, Nvidia is moving forward with 4nm. Since Nvidia is not able to offer a more advanced process node, the company is instead doubling the silicon. The Blackwell architecture is rumored to be priced between $30,000 and $40,000, which is higher than the H100’s reported $25,000 cost. This is competitive considering B200 will offer nearly 30X better performance (benchmarks are provided by Nvidia).

B100 & B200

The B100 is a replacement chip, which means customers can remove the H100 and place the B100 in the same rack. The B100 is air-cooled and doubles NVLink speeds from the H100 and H200. The B100 will ship in Q3 and provide upgrades to memory, from 80GB in the H100 and 141GB in the H200 to 192GB in the B100.

The B200 GPU chipset due in Q1 of next year will deliver a 2.5X training improvement and 5X inference improvement over the H100. This is due to the B200 having 208 billion transistors compared to the H100’s 80 billion transistors.

The B200 will also have 20 petaflops of FP4 compared to the H100’s 4 petaflops of FP8, which reaches 32 petaflops of FP8 in the eight-GPU DGX H100 systems. The difference is that the smaller bit size allows for an economical way to achieve more speed when giving up a small amount of accuracy doesn’t make a critical difference. This also helps in the face of a slowing Moore’s Law. Following the release of the Hopper H100, Intel released Gaudi2 which supports FP8. About two years back, chip makers Graphcore, AMD and Qualcomm pushed for an industry standard for the FP8 floating point format. However, the recent B200 will have a second-generation transformer engine that supports 4-bit floating point (FP4) with the goal of doubling the performance and size of models the memory can support while maintaining accuracy.

Part of the secret sauce of the H100 is the transformer engine. The A100 lacked support for FP8 compute by default whereas the H100 leveraged a transformer engine to switch between FP8 and FP16, depending on the workload. The second-generation transformer engine in the Blackwell architecture will offer FP4. This is helpful because AI models are moving toward neural nets that lean on the lowest precision that still yields an accurate result. In this case, 4-bit units double the throughput of 8-bit units, compute faster and more efficiently, and require less memory and memory bandwidth.

The main feature of the Transformer Engine is the ability to choose what precision is needed for each layer in the neural network at each step, transitioning between 4 bits, 8 bits, 16 bits, or 32 bits. The H100 is able to do matrix math with two forms of 8-bit numbers, with either 5 bits or 4 bits as the exponent: E5M2 and E4M3. This is important because E5M2, with its wider dynamic range, may be favored for gradients in back propagation, while E4M3, with its extra bit of precision, may be favored for the forward pass and inferencing.
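To make the range versus precision trade-off between the two FP8 formats concrete, here is a small sketch that computes the largest finite value and the relative step size of each. It assumes the commonly published FP8 conventions (E5M2 reserves its top exponent code for infinity and NaN, as IEEE formats do, while E4M3 reserves only a single bit pattern for NaN); the function and its parameters are illustrative and not part of any Nvidia API.

```python
def fp8_summary(exp_bits: int, man_bits: int, ieee_style: bool) -> dict:
    """Largest finite value and relative step for a small float format.

    ieee_style=True  : the top exponent code is reserved for inf/NaN (assumed for E5M2).
    ieee_style=False : the top exponent code is usable and only the all-ones
                       mantissa at that exponent is NaN (assumed for E4M3).
    """
    bias = 2 ** (exp_bits - 1) - 1
    if ieee_style:
        max_exponent = (2 ** exp_bits - 2) - bias
        max_mantissa = 2 - 2 ** -man_bits
    else:
        max_exponent = (2 ** exp_bits - 1) - bias
        max_mantissa = 2 - 2 ** -(man_bits - 1)
    return {
        "max_finite": max_mantissa * 2 ** max_exponent,
        "relative_step": 2 ** -man_bits,  # spacing between neighbors near 1.0
    }

e4m3 = fp8_summary(exp_bits=4, man_bits=3, ieee_style=False)
e5m2 = fp8_summary(exp_bits=5, man_bits=2, ieee_style=True)

print("E4M3:", e4m3)
print("E5M2:", e5m2)
```

Under these assumptions, E5M2 reaches far larger magnitudes while E4M3 resolves values twice as finely, which is why a transformer engine would want to pick between them per layer rather than use one format everywhere.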

Building on the first-gen transformer engine, the B200’s second-gen transformer engine will support double the compute and model sizes with new 4-bit floating point AI inference capabilities.
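A back-of-the-envelope sketch shows why halving the bits per parameter roughly doubles the model size a given amount of memory can hold. It counts weights only (no activations, KV cache, or optimizer state), uses the 192GB of memory cited for the B100, and takes a 1.8 trillion parameter model purely as an illustrative size; the helper names are hypothetical.

```python
import math

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"FP32": 4.0, "FP16": 2.0, "FP8": 1.0, "FP4": 0.5}

def weight_memory_gb(num_params: float, precision: str) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

def min_gpus_for_weights(num_params: float, precision: str, hbm_gb: float) -> int:
    """Fewest GPUs whose combined memory can hold the weights."""
    return math.ceil(weight_memory_gb(num_params, precision) / hbm_gb)

NUM_PARAMS = 1.8e12   # illustrative model size
HBM_GB_PER_GPU = 192  # B100 memory cited above

for precision in BYTES_PER_PARAM:
    gb = weight_memory_gb(NUM_PARAMS, precision)
    gpus = min_gpus_for_weights(NUM_PARAMS, precision, HBM_GB_PER_GPU)
    print(f"{precision}: {gb:,.0f} GB of weights, at least {gpus} GPUs")
```

Real deployments need considerably more memory than this floor, but the ratio between precisions is the point: each step down from FP16 to FP8 to FP4 halves the weight footprint.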

GB200 NVL72 Systems:

According to the current product road map, the GB200 will be released before the B200 GPUs. The real fireworks will begin with the GB200 NVL36/NVL72 systems in late 2024 and then continue with the B200 GPUs in early 2025.

The GB200 Grace Blackwell chip connects two Blackwell Tensor core GPUs with the Nvidia Grace CPU. The GB200 NVL72 rack-scale exascale supercomputer connects 36 Grace CPUs with 72 Blackwell GPUs in a rack-scale design with liquid cooling. We’ve written in-depth about liquid cooling for our premium research members; learn more here.

According to HSBC, the average sales price of the NVL36 and NVL72 server racks will be $1.8 million and $3 million, respectively. Notably, it’s expected the GB200 systems will have strong margins due to using an in-house CPU.

Here are the stats provided from Nvidia on how it will compare:

  • 30X faster real-time trillion-parameter LLM inference
  • 4X LLM training
  • 25X energy efficiency
  • 18X data processing

Source: Nvidia, the GB200 System due to ship in Q4 this year

The GB200 will provide 4X faster training performance than the H100 HGX systems and will include a second-generation transformer engine with FP4/FP6 Tensor core. As stated above, the 4nm process integrates two GPU dies connected with 10 TB/s NVLink with 208 billion transistors.

NVLink Switch is a major component to the Blackwell upgrade. Fifth-generation NVLink enables multi-GPU communication at high speed, reaching 1.8 TB/s bidirectional throughput or 14X the bandwidth of PCIe for a single GPU.

For the NVL72 systems, NVLink Switch can reach 130 TB/second, which is “more than the aggregate bandwidth of the internet.” Therefore, it’s the compute and the communication capabilities of the upcoming GB200 release that are important to consider. The 72 GPUs in the NVL72 can be used as a single accelerator for 1.4 exaflops of AI compute power.
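The rack-level figures follow from the per-GPU figures quoted in this article. The sketch below multiplies them out, assuming each of the 72 GPUs delivers the 1.8 TB/s NVLink throughput and the 20 petaflops of FP4 cited for the B200; the variable names are illustrative.

```python
GPUS_PER_RACK = 72
NVLINK_TB_S_PER_GPU = 1.8   # fifth-generation NVLink, bidirectional
FP4_PFLOPS_PER_GPU = 20     # B200 figure cited earlier (assumed per GPU in the rack)
PCIE_MULTIPLE = 14          # "14X the bandwidth of PCIe"

# Aggregate NVLink bandwidth across the rack, in TB/s.
aggregate_nvlink_tb_s = GPUS_PER_RACK * NVLINK_TB_S_PER_GPU

# Aggregate FP4 compute across the rack, in exaflops (1 exaflop = 1,000 petaflops).
aggregate_fp4_exaflops = GPUS_PER_RACK * FP4_PFLOPS_PER_GPU / 1000

# PCIe bandwidth implied by the 14X comparison, in GB/s.
implied_pcie_gb_s = NVLINK_TB_S_PER_GPU * 1000 / PCIE_MULTIPLE

print(f"Aggregate NVLink bandwidth: {aggregate_nvlink_tb_s:.1f} TB/s")
print(f"Aggregate FP4 compute: {aggregate_fp4_exaflops:.2f} exaflops")
print(f"Implied PCIe bandwidth per GPU: {implied_pcie_gb_s:.0f} GB/s")
```

The products land at roughly 130 TB/s and 1.4 exaflops, matching the rounded figures above.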

Why GB200s and B200s will Drive more Demand:

To scale up a model, AI departments utilize a Mixture of Experts (MoE) approach. MoE distributes a computational load across “multiple experts” (or neural networks) and trains across thousands of GPUs using what is called model and pipeline parallelism. This enables more compute-efficient pretraining yet the parameters still need to be loaded in RAM, so the memory requirements remain high.
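The compute versus memory point can be illustrated with a toy router. The sketch below uses top-k gating, one common MoE formulation and not necessarily what any particular model uses; the experts are stand-in scalar functions and every name is hypothetical. Only k experts run for a given token, yet all of them must stay loaded.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_logits, k=2):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, gate_logits, k=2):
    """Weighted sum of the selected experts' outputs for one input."""
    return sum(w * experts[i](x) for i, w in route_top_k(gate_logits, k))

# Four toy "experts"; in a real model each would be a large neural network.
experts = [lambda x, scale=scale: scale * x for scale in (1.0, 2.0, 3.0, 4.0)]
gate_logits = [0.1, 2.0, -1.0, 1.5]  # illustrative router scores for one token

routes = route_top_k(gate_logits, k=2)
output = moe_forward(1.0, experts, gate_logits, k=2)
active_fraction = len(routes) / len(experts)

print("Selected experts and weights:", routes)
print("Output:", output)
print(f"Experts computed per token: {active_fraction:.0%}, experts held in memory: 100%")
```

Here half of the experts do the work for this token, which is where the compute savings come from, while the memory requirement is still set by the full expert count.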

For inference, GB200 will deliver “a 30X speedup” for 1 trillion+ parameter models by leveraging FP4 precision and fifth-generation NVLink. This is what the leap in real-time throughput for inference looks like for a 1.8 trillion parameter model:

Source: Nvidia Blog

Blackwell is for the trillion+ parameter era of generative AI. The architecture is designed to support the largest language models today and is future-proofed with the GB200 NVL72 rack-scale solution, which is an exascale computer that contains up to 5,000 NVLink cables that total 2 miles. You also have to consider that AMD came to market in its first release with nearly 2X the memory of the H100. Nvidia is remaining competitive with HBM3e and soon HBM4 to help models run in memory.

The GB200 also has a new decompression engine that allows GPUs to process and decompress compressed data sets to speed up database queries. Coupled with 8 TB/s of high memory bandwidth and high speed NVLink, the GB200 systems deliver up to 18X faster database queries. In addition to this, there is up to 13X faster physics-based simulations compared to CPUs and 22X faster simulations for computational fluid dynamics (CFD).

More on Memory:

High bandwidth memory (HBM) offers higher bandwidth, capacity, performance, and lower power by vertically stacking up to twelve DRAM memory chips to shorten how far data has to travel, while also allowing for smaller form factors. Stacked memory chips are connected through something called “through silicon vias” or TSVs. HBM is increasingly being used to power machine learning, high performance data centers, and more recently, generative AI models.

CoWoS (chip-on-wafer-on-substrate) is an advanced packaging architecture, generally described as 2.5D, that places memory stacks and processor dies side by side on a silicon interposer. The architecture leverages through-silicon vias (TSVs) and micro-bumps for shorter interconnect length and reduced power consumption compared to 2D packaging.

The advanced CoWoS packaging that is needed to combine logic system-on-chip (SoC) with high bandwidth memory will take longer, and thus, it’s expected that Blackwell will be able to fully ship by Q4 this year or Q1 next year. How management guides for this will be up to them, but commentary should be fairly informative by the Q3 time frame.

GPUs will move from 8Hi configurations to 12Hi HBM3e configurations by 2025. These upgrades are needed to train and deploy large models with trillions of parameters in the near future. What Nvidia’s product road map intends to accomplish is a way forward for real-time inference that is computationally efficient, cost-effective and energy efficient.

My firm has covered HBM3e in the past when we stated in a premium research report six months ago:

“The recent surge in generative AI and AI GPUs, spurred by the success of OpenAI’s ChatGPT and development of hundreds of other large language models, are forecast to bring about a new DRAM market, underpinned by high-bandwidth memory (HBM) and DDR5.

[…] HBM3 and HBM3e are becoming the next battleground for memory chip manufacturers as well as AI chip design companies, especially Nvidia and AMD, who are pushing the boundaries with the amount of memory bandwidth in each GPU.

AMD’s competing GPUs, the MI300 series, substantially boosted memory and bandwidth relative to the H100, utilizing Samsung’s HBM3. The MI300A is shipping with 128GB HBM3 memory while the MI300x ships with 192GB memory and 5.2 TB/s of bandwidth – that’s 1.6x more bandwidth and 2.4x more HBM3 density than Nvidia’s H100.

Nvidia is rapidly moving forward with its GPU roadmap, as it aims to launch its next-gen H200 and B100 GPUs next year followed by the X100 GPU in 2025 – each GPU will accelerate AI inference times along an exponential curve, thus creating a need for more memory and more bandwidth.”

Nvidia’s Fiscal Q1 Report Card: What You Need to Know

Now that we’ve touched base on the importance of Blackwell, let’s get prepped for this evening. Here is what analysts are expecting:

Revenue:

  • For Q1, Nvidia is expected to report revenue of $24.6 billion, for growth of 242%. Management guided for revenue of $24 billion +/- 2%, for a growth rate of 233.7%, at the midpoint.
  • Next quarter, the company is expected to report revenue of $26.8 billion for growth of 98.7%.
  • On a fiscal year basis, the company is expected to report revenue of $113.2 billion for growth of 85.8%. These estimates have doubled since August.
  • The FY2026 growth rate of 26.1% for revenue of $142.8 billion, and then FY2027 growth rate of 17.7% for revenue of $168 billion, is where estimates are too low if there is a $200 billion data center segment in the medium-term.
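The growth rates above can be cross-checked from the figures themselves. This sketch backs out the year-ago revenue implied by management’s guide, then recomputes the other growth rates from the rounded dollar amounts; the helper names are illustrative, and small differences from the quoted rates come from rounding in the inputs.

```python
def growth_pct(current: float, prior: float) -> float:
    """Growth from prior to current, in percent."""
    return (current / prior - 1) * 100

def implied_prior(current: float, growth: float) -> float:
    """Prior-period value implied by a current value and its growth rate."""
    return current / (1 + growth / 100)

# Figures quoted above, in billions of USD.
guide_midpoint, guide_growth = 24.0, 233.7
consensus_q1 = 24.6
fy25, fy26, fy27 = 113.2, 142.8, 168.0

# Year-ago quarter implied by the guide, then consensus growth against it.
year_ago_q1 = implied_prior(guide_midpoint, guide_growth)
consensus_growth = growth_pct(consensus_q1, year_ago_q1)

# Guidance band of +/- 2% around the midpoint.
guide_low, guide_high = guide_midpoint * 0.98, guide_midpoint * 1.02

fy26_growth = growth_pct(fy26, fy25)
fy27_growth = growth_pct(fy27, fy26)

print(f"Implied year-ago Q1 revenue: ${year_ago_q1:.2f}B")
print(f"Consensus Q1 growth: {consensus_growth:.1f}%")
print(f"Guidance range: ${guide_low:.2f}B to ${guide_high:.2f}B")
print(f"FY2026 growth: {fy26_growth:.1f}%, FY2027 growth: {fy27_growth:.1f}%")
```

Note that the consensus figure sits just above the top of the guidance band, which is the beat the Street is already pricing in.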

EPS:

In Nvidia’s case, top line growth is flowing through to bottom line growth disproportionately.

  • For Q1, Nvidia is expected to report adjusted EPS of $5.58 for growth of 411.9%.
  • Next quarter, Nvidia is expected to report adjusted EPS of $6.00 for growth of 122.1%.
  • For FY2025, adjusted EPS is expected to be $25.4 for growth of 96%. FY2026 adjusted EPS is expected to be $32.2 for growth of 26.6%.

Margins:

As the story for Nvidia unfolds over the next few years, keep an eye on margins as software will begin to positively impact the company with higher margins. The company is expected to end the year with $2 billion in software revenue.

In the near-term, and especially for this earnings report, it’s likely that analysts ask about the costs associated with HBM3e as memory components are increasing in costs. TrendForce has reported that HBM3 prices have risen 5-fold since 2023. HBM3e prices will be even higher than HBM3. Analysts may also ask about the yield issues that major memory suppliers SK Hynix, Micron, and Samsung are reported to be facing, given the complexities in the manufacturing process for HBM3e and its longer production cycle. For our premium members, we’ve discussed what stocks will benefit from this leading trend in 2024.

  • Management guided for gross margin of 76.3% for gross profit of $18.3 billion. If reported in line, this will represent flat growth QoQ and 1170 bps expansion from 64.6% in the year ago quarter.
  • Management guide for adjusted gross margin is 77%. If reported, it will represent 30 bps QoQ expansion and 1020 bps expansion YoY.
  • Operating margin was guided to be 61.7% for operating profit of $14.8 billion. If reported, this will be flat QoQ yet up a whopping 32-points from 29.76%. This is the most rapid operating margin expansion that I have personally witnessed. It is rare, even with a hyper growth company to report a 32-point expansion on this line item.
  • Adjusted operating margin of 66.6% will be flat QoQ and up from 42.4% in the year ago quarter.
  • Net margin guide is 52.1%. If reported, it will be down 3.5 points sequentially, yet up a remarkable 23.7 points on a YoY basis.
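For readers who want to reproduce the margin math, the sketch below converts the guided margins into dollar profit at the $24 billion revenue midpoint and expresses the year-over-year changes in basis points (one basis point is one hundredth of a percentage point). The function names are illustrative.

```python
REVENUE_GUIDE_B = 24.0  # revenue guide midpoint, in billions of USD

def profit_from_margin(revenue_b: float, margin_pct: float) -> float:
    """Dollar profit (in billions) implied by a margin percentage."""
    return revenue_b * margin_pct / 100

def change_in_bps(current_pct: float, prior_pct: float) -> int:
    """Margin change in basis points, rounded to the nearest whole bp."""
    return round((current_pct - prior_pct) * 100)

gross_profit = profit_from_margin(REVENUE_GUIDE_B, 76.3)
operating_profit = profit_from_margin(REVENUE_GUIDE_B, 61.7)

gross_margin_yoy_bps = change_in_bps(76.3, 64.6)
operating_margin_yoy_bps = change_in_bps(61.7, 29.76)

print(f"Gross profit: ${gross_profit:.1f}B")
print(f"Operating profit: ${operating_profit:.1f}B")
print(f"Gross margin YoY: {gross_margin_yoy_bps} bps")
print(f"Operating margin YoY: {operating_margin_yoy_bps} bps")
```

Keeping changes in points or basis points, rather than percent, avoids confusing a margin that moved by 32 points with one that grew 32%.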

Cash and Debt:

Last quarter, Nvidia reported operating cash flow of $11.5 billion for a margin of 52%. The free cash flow of $11.2 billion represents a margin of 50.7%. The fiscal year free cash flow of $26.9 billion was more than 7 times higher than the fiscal year 2023 free cash flow of $3.75 billion.

Key Segments:

The data center segment reported revenue of $18.4 billion for growth of 409% YoY and was up 29% QoQ. Nvidia’s tough comps kick in with the Q2 July quarter when the company reported DC revenue of $10.3 billion for growth of 171%, and thus the guide is key. Management will not guide to DC specifically but it’ll be easy enough for analysts to read between the lines that any beat/raise on Q2 is likely coming from the DC segment.

The CFO mentioned in the earnings call that 40% of the revenue came from inference in the past year. “Fourth quarter data center growth was driven by both training and inference of generative AI and large language models across a broad set of industries, use cases and regions. The versatility and leading performance of our data center platform enables a high return on investment for many use cases, including AI training and inference, data processing and a broad range of CUDA accelerated workloads. We estimate in the past year approximately 40% of data center revenue was for AI inference.”

Gaming revenue of $2.8 billion was up 56% YoY and was flat QoQ. Nvidia has fared better than gaming peers due to the timing of the RTX 4000 Series, which I covered in a previous editorial: “Nvidia Stock: Evidence Gaming has Bottomed and Why It’s Important.” With that said, management guided for a seasonal decline in gaming.

  • Professional Visualization reported revenue of $463 million for growth of 105% YoY and 11% QoQ.
  • Automotive reported revenue of $281 million, down 4% YoY but up 8% QoQ.
  • OEM & Other reported revenue of $90 million, up 7% YoY and 23% QoQ.

Conclusion:

As stated on Making Money with Charles Payne today, the upcoming earnings report is only one piece of the story, whereas the ultimate fireworks will be when the Blackwell architecture begins to ship in Q3-Q4. The product road map is communicating that AI accelerators are secular, not cyclical.

We will see peak growth this quarter – even if we get that beat that Nvidia is becoming known for, H2 will certainly see a slowdown. This is normally a great jumping off point for investors but those who stick with Nvidia will be rewarded for a few reasons:

  • This is an organic growth company, which is very rare in tech where most growth is bought. That means Nvidia is likely to remain strong on margins and EPS, even in the face of slowing revenue growth.
  • The supply chain is providing hints that analyst estimates for the data center are too low – there could be up to 65% upside on those estimates in the next 6-7 quarters.
  • The reason I side with KeyBanc, Loop and others in thinking the estimates are too low – and this last point is critical – is because Nvidia is speeding up its product road map and introducing the Blackwell architecture to address the trillion+ parameter models that Big Tech will compete to create and train.

Nvidia has sold off 10% or greater about 9 times since the 2022 low. We see any dips as buying opportunities as we brace for Blackwell toward the end of this year.

At the I/O Fund, we had five positions with returns over 100%, and seven positions beat the Nasdaq in 2023. This contributed to cumulative returns of 131% since May of 2020. For more in-depth research from Beth, including 15-page+ deep dives on the stock positions that the I/O Fund owns, take advantage of our biggest sale of the year in honor of our four-year anniversary and subscribe here.
