When it comes to training large language models, the conventional wisdom is that the bottleneck is a lack of GPUs, with various AI companies competing fiercely over Nvidia's dominant chips.
But everyone's favorite billionaire and technology prophet thinks there's another problem: we may not have enough power. Musk says Grok 3, the next generation of AI models from xAI, the company he founded, will require about 100,000 Nvidia H100 GPUs to train.
Admittedly, getting hold of 100,000 H100s is not easy. Or cheap. But here's the problem: each H100 consumes 700 W at peak, so 100,000 of them add up to 70 megawatts. OK, you probably won't have all 100,000 running at 100% load at the same time. But there's more to an AI setup than just the GPUs; all kinds of supporting hardware and infrastructure are involved, too.
So 100,000 H100s, all-in, would likely draw well over 100 megawatts, or about the same as a small city. For another data point, in 2022 there were around 500 megawatts' worth of data centers across the whole of Paris.
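For a rough sense of the arithmetic, here's a minimal back-of-envelope sketch in Python. The 700 W per-GPU figure is the H100 peak spec cited above; the utilization and overhead factors are illustrative assumptions, not numbers anyone has reported.

```python
# Back-of-envelope estimate of cluster power draw.
# GPU_PEAK_WATTS is Nvidia's H100 peak spec (cited above);
# UTILIZATION and OVERHEAD_FACTOR are illustrative assumptions.

GPU_COUNT = 100_000
GPU_PEAK_WATTS = 700     # H100 peak power draw, in watts
UTILIZATION = 0.8        # assumed average load (not all GPUs at 100%)
OVERHEAD_FACTOR = 1.5    # assumed multiplier for CPUs, networking, cooling, etc.

gpu_watts = GPU_COUNT * GPU_PEAK_WATTS * UTILIZATION
total_watts = gpu_watts * OVERHEAD_FACTOR

print(f"GPUs alone:    {gpu_watts / 1e6:.0f} MW")    # ~56 MW at 80% load
print(f"With overhead: {total_watts / 1e6:.0f} MW")  # ~84 MW

# At full peak load with the same overhead assumption:
# 100,000 * 700 W * 1.5 = 105 MW, comfortably past the
# ~100 MW, small-city figure mentioned above.
```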
Yeah, 100 megawatts for just one LLM is a bit of a problem. Musk made the comments in an X Spaces interview with Nicolai Tangen, CEO of Norway's sovereign wealth fund (via Reuters). He emphasized that while GPU availability has been and will continue to be a major constraint on AI model development, access to sufficient power will become an increasingly limiting factor.
Oh, and Musk also predicted that AGI (artificial general intelligence) will surpass human intelligence within two years. “If you define AGI as being smarter than the smartest human, I think it will probably be within the next year, two years,” Musk said.
However, back in 2017 he also predicted that self-driving cars reliable enough to “sleep inside” were two years away. We're still waiting on that one. And on March 19, 2020, he predicted that the United States would have “nearly zero new cases” of the coronavirus by the end of April. Oops!
In any case, Musk's somewhat spotty record of techno-prophecy isn't exactly news. But he probably has a pretty solid idea of how many GPUs are needed to train a next-generation LLM. So a city-scale electricity budget for a single model is likely to become a reality, which is a bit concerning.
Additionally, xAI's current model, Grok 2, apparently required only 20,000 H100s. That means a 5x jump in GPUs from one AI model generation to the next, and whether you measure it in GPU count or power consumption, that kind of scaling doesn't look very sustainable.
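Just to illustrate where that 5x trend points if it were to hold, here's a purely hypothetical extrapolation in Python. The per-generation factor comes from a single data point (Grok 2's reported 20,000 GPUs to Grok 3's 100,000), and nobody at xAI has claimed it will continue.

```python
# Hypothetical extrapolation of the reported 20,000 -> 100,000 H100 jump.
# The 5x-per-generation factor is a single data point projected forward
# for illustration only; it is not a claim about xAI's actual plans.

GEN_FACTOR = 100_000 / 20_000   # 5x, Grok 2 -> Grok 3
WATTS_PER_GPU = 700             # H100 peak spec

gpus = 100_000
for gen in range(3, 6):         # Grok 3 through a hypothetical Grok 5
    mw = gpus * WATTS_PER_GPU / 1e6
    print(f"Grok {gen}: {gpus:>9,} GPUs, ~{mw:,.0f} MW peak (GPUs alone)")
    gpus = int(gpus * GEN_FACTOR)

# Grok 3:   100,000 GPUs, ~70 MW
# Grok 4:   500,000 GPUs, ~350 MW
# Grok 5: 2,500,000 GPUs, ~1,750 MW -- which is roughly why the
# scaling looks hard to sustain.
```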