March 4, 2023

Teaching ChatGPT is Expensive but profitable for Nvidia – Part 1

“This is the iPhone moment of artificial intelligence,” said Jensen Huang, the charismatic Nvidia CEO, to a rapt audience at the Haas School of Business, Berkeley, on 1 Feb 2023. The analyst community was counting down to 22 Feb, when Nvidia would release its Q4 and FY23 results.

The stage was set on 30 November 2022, when ChatGPT was thrown open to the public. In just two months, it had 100 million users. For comparison, mighty Google took 1 year and 2 months to reach the same milestone.

Seeing the wild success of ChatGPT, Microsoft boldly launched ‘the new Bing’ on 7 Feb 2023. It would incorporate into its search function Large Language Models (LLMs) from OpenAI said to be even more capable than ChatGPT.

A panicked Google launched its own conversational AI called Bard the very next day on 8 Feb. An unfortunate factual error in the very first demo led to a $100 billion loss of market capitalization in a single day.

On 11 Feb, Satya Nadella threw down the gauntlet to Google in an interview with The Verge:

“At the end of the day, they’re the 800-pound gorilla in this. That is what they are. And I hope that, with our innovation, they will definitely want to come out and show that they can dance. And I want people to know that we made them dance, and I think that’ll be a great day.”

It was around this time that the wider public discovered that Nvidia provided the ‘pickaxes and shovels’ for the rapidly unfolding AI gold rush. Jensen’s ‘iPhone moment’ comment was breathlessly reported, and the financial world waited eagerly for Nvidia’s FY23 results on 22 Feb.

Anti-climactic Results and Euphoric Share Price

When the Q4 and FY23 results were finally announced, they were disappointing. The ‘Data Center’ segment’s quarterly revenue grew by 11% YOY, a fair result in the current economic environment, but it was still down 6% from the previous quarter. From FY22 to FY23, Data Center grew at an impressive 41%, despite the momentum slowing in Q4.

The ‘Data Center’ segment contains most of the AI revenues. If Nvidia were the direct beneficiary of the AI boom, it should have shown up in these numbers.

For context, the ‘Data Center’ business had grown from a mere $0.75 billion to $10.6 billion in just six years! That was a CAGR of 56%, until the headwinds of FY23 ground the momentum down to 11% YOY.
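For anyone who wants to verify that growth rate, here is a quick back-of-the-envelope check in Python (the revenue figures are the approximate ones quoted above):

```python
# Sanity check of the Data Center segment's growth rate.
start_revenue = 0.75   # $ billion, roughly six years earlier
end_revenue = 10.6     # $ billion, FY23
years = 6

# Compound annual growth rate: (end / start) ** (1 / years) - 1
cagr = (end_revenue / start_revenue) ** (1 / years) - 1
print(f"CAGR over {years} years: {cagr:.1%}")   # ~55.5%, i.e. roughly 56%
```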

The share price was having none of this skepticism: it shot up 14% that very day. The stock had touched a bottom of around $112 in mid-October and had since more than doubled.

AI Needs GPUs – It’s that Simple

An AI model’s life cycle has three stages:

  1. Creating the AI model
  2. Training the AI model
  3. Inference (deployment)

Deep Learning models go a step further: they are improved continually through iterative retraining.

All three stages require enormous amounts of computational power, which can currently be met only by solutions that are mainly GPU-powered (with CPUs and DPUs acting in concert).
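To make the three stages concrete, here is a toy PyTorch sketch (my own illustration, not anything from Nvidia or OpenAI; the model, data and sizes are invented). Real LLMs run the same create-train-infer loop, just across thousands of GPUs instead of one:

```python
import torch
import torch.nn as nn

# Stage 1: create the model (a tiny stand-in for a real LLM).
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

# The heavy lifting happens on a GPU if one is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Stage 2: train the model (random tensors stand in for a real dataset).
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for step in range(100):
    x = torch.randn(64, 128, device=device)
    y = torch.randint(0, 10, (64,), device=device)
    loss = loss_fn(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Stage 3: inference (deployment): no gradients, just forward passes.
model.eval()
with torch.no_grad():
    prediction = model(torch.randn(1, 128, device=device)).argmax(dim=-1)
    print(prediction.item())
```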

Stable Diffusion Conjures up Images

Before ChatGPT started blowing our minds, Stable Diffusion was capturing the imagination of artists and graphic designers worldwide as it conjured up images from a few text prompts.

Teaching Stable Diffusion to draw is expensive

Stability AI, the company behind Stable Diffusion, runs around 4,000 Nvidia A100 GPUs in AWS. Each A100 costs approximately $10,000, bringing the hardware cost to roughly $40 million. This is a rough estimate, as GPU prices change over time, especially as faster chips reach the market.

What is clear is that AWS would have bought roughly $30-40 million worth of A100 chips from Nvidia. Stability AI itself would have paid AWS around $600,000 to rent them for ‘150k hours’ of GPU time, according to an interesting tweet exchange with CEO Emad Mostaque.

We actually used 256 A100s for this per the model card, 150k hours in total so at market price $600k

— Emad (@EMostaque) August 28, 2022

On 17 Nov, it was reported that Stability AI had raised $101 million. CEO Emad Mostaque mentioned that the funding would be deployed for more ‘supercomputing power’, among other uses, but did not specify how much would go to GPUs.

According to AIM, ‘Stable Diffusion took around 200,000 GPU hours to train on the NVIDIA A100 GPU.’ If we assume each GPU runs 24 hours a day and training was completed in a month, around 277 GPUs would be needed. This estimate is not far from the 256 A100s Emad mentioned in his tweet.
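Putting this section’s figures together, here is a rough back-of-the-envelope sketch (these are the estimates quoted above, not Stability AI’s actual invoices):

```python
# Back-of-the-envelope estimates for Stable Diffusion's training run,
# using the rough figures quoted in this section.

gpu_hours = 200_000          # AIM's estimate of A100 GPU-hours to train
hours_per_month = 24 * 30    # one GPU running around the clock for a month
gpus_needed = gpu_hours / hours_per_month
print(f"GPUs needed to finish in a month: ~{gpus_needed:.0f}")  # ~277-278, close to Emad's 256

# Emad Mostaque's tweet: 150k GPU-hours at a market price of ~$600k.
implied_rate = 600_000 / 150_000
print(f"Implied rental rate: ~${implied_rate:.2f} per A100-hour")  # ~$4

# Nvidia's side of the trade: the hardware behind Stability AI's AWS cluster.
a100_price = 10_000          # approximate price per A100, $
cluster_size = 4_000         # A100s Stability AI reportedly runs in AWS
print(f"Hardware value of the cluster: ~${cluster_size * a100_price / 1e6:.0f} million")  # ~$40M
```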

The ‘$600k’, or 150k-200k GPU-hours, was just the cost of training the Stable Diffusion model. It doesn’t factor in the effort that went into creating the model, or the cost of running it every day.

ChatGPT’s Hunger for Compute Power

If you thought the GPU requirements for training Stable Diffusion were intensive (roughly 150K-200K A100 GPU-hours), ChatGPT’s needs were at the next level.

To train ChatGPT, 10,000 A100 GPUs were used, as reported in Fierce Electronics. The big guns were needed for the following reason:

“For context, the largest version of GPT-3 consisting of 175 billion parameters took 3640 pf-days to train. This means that the GPUs needed to conduct a petaflop of operations per day for almost 10 years!”

AIM
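The ‘pf-days’ unit in that quote is easier to grasp as a total operation count. Assuming it means petaflop/s-days, as in the original GPT-3 paper (one petaflop per second sustained for a full day), the conversion is straightforward:

```python
# Convert GPT-3's reported training compute into total floating-point operations.
# Assumes "pf-days" means petaflop/s-days: 1e15 FLOP/s sustained for 86,400 seconds.

pf_days = 3640
flops_per_pf_day = 1e15 * 86_400               # ~8.64e19 FLOPs per petaflop/s-day
total_flops = pf_days * flops_per_pf_day
print(f"Total training compute: ~{total_flops:.2e} FLOPs")   # roughly 3.1e23 FLOPs

# Equivalently, a single machine sustaining 1 petaflop/s would need 3,640 days,
# i.e. almost ten years, to do this much work.
print(f"Years at a sustained 1 petaflop/s: ~{pf_days / 365:.1f}")  # ~10.0
```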

ChatGPT is hosted on a Microsoft Azure supercomputing cluster, so its training cost is the ‘rent’ OpenAI pays Microsoft to use the GPUs, priced at around $1.50 per GPU per hour.

The total cost of training such a model is widely reported to be $5 million over 34 days!

With each A100 GPU priced at approximately $10,000, Nvidia would have made approximately $100 million from its sale to Microsoft.
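A quick sanity check on both sides of that trade, using only the figures quoted in this section (none of these numbers have been officially disclosed):

```python
# Two sides of the ChatGPT training bill, using the figures quoted above.

# OpenAI's side: rented GPU time on Azure.
rate_per_gpu_hour = 1.5              # $ per A100-hour on Azure, as quoted
reported_training_cost = 5_000_000   # widely reported training cost, $
implied_gpu_hours = reported_training_cost / rate_per_gpu_hour
print(f"Implied GPU-hours: ~{implied_gpu_hours / 1e6:.1f} million")   # ~3.3M

# Nvidia's side: hardware sold to Microsoft for the cluster.
a100_price = 10_000                  # approximate price per A100, $
cluster_size = 10_000                # A100s reportedly used to train ChatGPT
print(f"Hardware value: ~${cluster_size * a100_price / 1e6:.0f} million")  # ~$100M
```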

Conclusion

Training AI models is GPU-intensive, and this is what the market has recently discovered. We can estimate that over the past year, Nvidia sold roughly $40 million and $100 million worth of A100 chips to AWS and Azure respectively. These GPU-based cloud clusters are what trained Stable Diffusion and ChatGPT.

Does Nvidia have anything left in the tank after these massive sales? Or will the boom-and-bust cycle that is all too familiar in gaming GPUs repeat itself in AI?

For starters, companies are looking to improve their AI models. Newer versions of Stable Diffusion and ChatGPT are on their way. While the current models were trained on the A100, the ‘workhorse of AI’, Nvidia never sits still. It has unleashed the H100, which is 4-9x faster than the A100.

Satya Nadella has fired the first shot in what is turning into an arms race for AI dominance. And for now, Nvidia’s A100 and H100 are the only game in town, dominating the AI training and inference GPU market with 80-95% of sales. AMD is a distant second, and no one else is on the horizon.

APPENDIX

Interesting resources:

  • The Social Media Platforms That Hit 100 Million Users Fastest
  • Stability AI, the startup behind Stable Diffusion, raises $101M
