Written by Max A. Charney
SAN FRANCISCO (Reuters) – Artificial intelligence benchmarking group MLCommons on Wednesday released a series of new tests and results assessing how quickly top-of-the-line hardware can run AI applications and respond to users.
Two new benchmarks added by MLCommons measure the speed at which AI chips and systems can generate responses from powerful, data-rich AI models. The results give a rough idea of how quickly AI applications such as ChatGPT can respond to user queries.
One of the new benchmarks measures the speed of question-and-answer scenarios for large language models. It is based on Llama 2, which contains 70 billion parameters and was developed by Meta Platforms.
MLCommons also added a second text-to-image generator, based on Stability AI's Stable Diffusion XL model, to its MLPerf benchmark tool suite.
Servers built by Alphabet's Google, Supermicro, and Nvidia itself, powered by Nvidia's H100 chips, easily won both new benchmarks with out-of-the-box performance. Several server builders submitted designs based on Nvidia's less powerful L40S chip.
Server builder Krai submitted a design for the image generation benchmark using Qualcomm AI chips, which consume significantly less power than Nvidia's cutting-edge processors.
Intel also submitted a design based on its Gaudi2 accelerator chips. The company described the results as "solid."
Raw performance is not the only metric that matters when deploying AI applications. Advanced AI chips consume enormous amounts of energy, so one of the biggest challenges for AI companies is deploying chips that deliver strong performance while using as little energy as possible.
MLCommons has a separate benchmark category for measuring power consumption.
(Reporting by Max A. Charney in San Francisco; Editing by Jamie Freed)