On Tuesday, Intel announced a new AI accelerator chip called Gaudi 3 at its Vision 2024 event in Phoenix. The company is positioning Gaudi 3 as an alternative to Nvidia's H100 because of its strong performance while running large language models (such as those powering ChatGPT). Nvidia's H100 is a popular data center GPU that has been in short supply, though that situation may be easing a bit.
Compared to Nvidia's H100 chip, Intel predicts training times on Gaudi 3 will be 50 percent faster for both OpenAI's GPT-3 175B LLM and the 7 billion parameter version of Meta's Llama 2. On the inference side (running a trained model and getting output), Intel says its new chip will outpace the H100 on Llama 2 and Falcon 180B, both relatively popular open-weight models, claiming roughly 50 percent faster inference on average.
Intel is targeting the H100 due to its high market share, but it is not the most powerful AI accelerator Nvidia has in the pipeline. Nvidia has since announced the H200 and the Blackwell B200, both of which surpass the H100 on paper, but neither chip has shipped yet (the H200 is expected in the second quarter of 2024, which is roughly now).
Meanwhile, the aforementioned H100 supply issues are causing major headaches for technology companies and AI researchers who have to fight for access to chips that can train AI models. This has led several technology companies, including Microsoft, Meta, and OpenAI (reportedly), to explore their own AI accelerator chip designs, although that custom silicon is typically manufactured by Intel or TSMC. Google has its own line of tensor processing units (TPUs) that it has used internally since 2015.
Given these issues, if Intel can price Gaudi 3 attractively (Intel hasn't quoted a price, while the H100 reportedly sells for around $30,000-$40,000) and maintain adequate production, Gaudi 3 could be an appealing alternative to the H100. AMD also makes competitive AI chips, such as the AMD Instinct MI300 series, which sell for around $10,000 to $15,000.
Gaudi 3 performance
Intel says the new chip builds on the architecture of the previous-generation Gaudi 2 and consists of two identical silicon dies connected by a high-bandwidth link. Each die contains 48 megabytes of central cache memory, surrounded by four matrix multiplication engines and 32 programmable tensor processor cores, for a total of 64 cores across the chip.
The semiconductor manufacturing giant claims Gaudi 3 delivers twice the AI compute performance of Gaudi 2 when using 8-bit floating-point (FP8) arithmetic, which has become important for training transformer models. Intel also says the chip is four times faster at computations using the BFloat16 number format. Gaudi 3 additionally features 128 GB of the cheaper HBM2e memory (which may contribute to its price competitiveness) and 3.7 TB/s of memory bandwidth.
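The BFloat16 format mentioned above trades precision for throughput: it keeps float32's 8-bit exponent (so the dynamic range is unchanged) but truncates the mantissa from 23 bits to 7. Because of that layout, a bfloat16 value can be modeled by simply keeping the top 16 bits of a float32. A minimal Python sketch of the format (an illustration only, not Intel or Nvidia code):

```python
import struct

def to_bfloat16_bits(x: float) -> int:
    """Round-to-truncation: keep the top 16 bits of the float32 encoding."""
    f32_bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return f32_bits >> 16

def from_bfloat16_bits(bits: int) -> float:
    """Re-expand the 16 retained bits into a float32 value (low bits zeroed)."""
    return struct.unpack(">f", struct.pack(">I", bits << 16))[0]

# Pi survives with only ~3 significant decimal digits in bfloat16.
value = 3.14159265
bf16 = from_bfloat16_bits(to_bfloat16_bits(value))
print(bf16)  # -> 3.140625
```

The coarser mantissa is why these low-precision formats pair naturally with the large matrix engines in accelerators like Gaudi 3: each multiply-accumulate moves fewer bits, so more of them fit per cycle and per byte of memory bandwidth.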
Data centers are notoriously power-hungry, so Intel emphasizes Gaudi 3's power efficiency, claiming 40 percent better inference power efficiency than Nvidia's H100 across Llama 2 7B and 70B and Falcon 180B models. Eitan Medina, chief operating officer of Intel's Habana Labs, attributes this advantage to Gaudi's large matrix math engine, which he claims requires significantly less memory bandwidth than other architectures.
Gaudi vs. Blackwell
Last month, we covered the splashy launch of Nvidia's Blackwell architecture, including the B200 GPU, which Nvidia claims will be the world's most powerful AI chip. So it seems natural to compare what we know about Nvidia's top-performing AI chips with the best AI chips Intel can currently produce.
First, according to IEEE Spectrum, Gaudi 3 is manufactured using TSMC's N5 process technology, closing the gap between Intel and Nvidia in terms of semiconductor manufacturing technology. The next Nvidia Blackwell chips will reportedly use a custom N4P process and offer slight performance and efficiency gains over N5.
Gaudi 3's use of HBM2e memory (as mentioned above) is noteworthy compared to the more expensive HBM3 or HBM3e used in competing chips, striking a balance between performance and cost. This choice seems to underscore Intel's strategy of competing on price as well as performance.
As for raw performance comparisons between Gaudi 3 and B200, we won't know until the chips are released and benchmarked by third parties.
As competition heats up to feed the tech industry's appetite for AI compute, IEEE Spectrum notes that attention now turns to Intel's next-generation Gaudi chip, codenamed Falcon Shores. It also remains to be seen whether Intel will continue to rely on TSMC's process technology or instead leverage its own foundry business and upcoming nanosheet transistor technology to gain a competitive edge in the AI accelerator market.