Whether you see AI as an incredible tool with huge benefits or a societal scourge that only benefits big tech, powerful new chips are making it possible to train AI faster than ever before. Cerebras Systems has announced the world's fastest AI chip, the Wafer Scale Engine 3 (WSE-3). It powers the Cerebras CS-3 AI supercomputer, which delivers a peak performance of 125 petaFLOPS. And it's scalable to an insane degree.
Before an AI system can produce a ton of cutely creepy little videos of cats waking up their owners, it needs to be trained on a frankly staggering amount of data, consuming the energy of more than 100 households in the process. But new chips, and the computers built around them, can help speed up that process and make it more efficient.
Each WSE-3 chip, about the size of a pizza box, is packed with an incredible 4 trillion transistors. It offers twice the performance of the company's previous model (itself a previous world record holder) at the same cost and power consumption. Bundled into a CS-3 system, a single unit about the size of a small refrigerator can reportedly deliver the performance of a room full of servers.
According to Cerebras, the CS-3 packs 900,000 AI cores and 44 GB of on-chip SRAM, delivering peak AI performance of up to 125 petaFLOPS. In theory, that should be enough to place it among the top 10 supercomputers in the world, although of course it hasn't been run through any benchmarks yet, so we don't know how it will perform in practice.
External memory options for storing all that data include 1.5 TB, 12 TB, or a massive 1.2 petabytes (PB). The CS-3 can train AI models with up to 24 trillion parameters. For comparison, most current AI models have billions of parameters, and GPT-4 is estimated to have around 1.8 trillion. Cerebras says the CS-3 should be able to train a trillion-parameter model about as easily as current GPU-based computers train a billion-parameter model.
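As a rough sanity check on those figures (the numeric precisions below are my assumptions, not Cerebras specs), the raw weights of a 24-trillion-parameter model fit comfortably within the 1.2 PB memory option, leaving room for gradients, optimizer state, and other training overhead:

```python
# Back-of-the-envelope: storage for the raw weights of a
# 24-trillion-parameter model at common numeric precisions.
# Precisions are assumptions for illustration, not Cerebras figures.

PARAMS = 24e12  # 24 trillion parameters (figure from the article)

# Bytes per parameter for common formats (weights only; training also
# needs gradients and optimizer state on top of this).
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "int8": 1}

for fmt, nbytes in BYTES_PER_PARAM.items():
    petabytes = PARAMS * nbytes / 1e15
    print(f"{fmt}: {petabytes:.3f} PB")
```

Even at full fp32 precision the weights alone come to under 0.1 PB, so the 1.2 PB tier is sized for the full training workload, not just the model itself.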
Thanks to the WSE-3 chip's wafer-scale manufacturing process, the CS-3 is designed to scale, with up to 2,048 units able to be clustered into one almost incomprehensibly powerful supercomputer. Such a cluster would be capable of up to 256 exaFLOPS, while today's top supercomputers are still running at just over 1 exaFLOPS. With that kind of capability, the company claims it could train the Llama 70B model from scratch in just one day.
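The headline cluster number follows directly from the per-unit figure. A quick check of the arithmetic, using only the numbers given in the article:

```python
# Verify the claimed cluster performance: 2,048 CS-3 units at
# 125 petaFLOPS each, converted to exaFLOPS (1 exaFLOPS = 1,000 petaFLOPS).
PEAK_PER_CS3_PFLOPS = 125   # peak AI performance of one CS-3
MAX_CLUSTER_UNITS = 2048    # maximum cluster size per Cerebras

cluster_exaflops = PEAK_PER_CS3_PFLOPS * MAX_CLUSTER_UNITS / 1000
print(cluster_exaflops)  # 256.0, matching the claimed 256 exaFLOPS
```

Note this is peak theoretical throughput scaling linearly with unit count; real-world training jobs rarely achieve perfect scaling across thousands of nodes.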
It feels like AI models are already advancing at an alarming rate, and this kind of technology only adds to the firehose. No matter what job you do, AI systems will be incorporated into your work and hobbies faster than ever before.
Source: Cerebras [1],[2]