The recent GTC conference centered around Nvidia's growing influence in enterprise infrastructure. GTC is the industry's largest AI-focused event, bringing together nearly the entire AI ecosystem.
While applications and underlying models can provide enterprise value and drive investment, the specialized infrastructure required to support AI is what makes modern AI practical. Nvidia is at the center of it all, enabling both cloud and on-premises solution providers.
Nvidia is a platform company
The big news from Nvidia is the launch of the next-generation Blackwell accelerator, which brings a new level of capability to AI training and high-performance inference for generative AI. Nvidia’s new BH200…
While customers will likely have access to raw GPUs, Nvidia will package the accelerators as system-level, turnkey offerings optimized for enterprise AI. This starts with the Nvidia GB200 NVL72, an advanced rack-scale AI supercomputer designed for large-scale AI and HPC challenges.
Powered by the Grace Blackwell Superchip, it integrates high-performance Nvidia GPUs and CPUs with a 900 GB/s NVLink-C2C interface for seamless data access. This architecture delivers 80 petaflops of AI performance, 1.7 TB of fast memory, and support for up to 72 GPUs.
Nvidia has further scaled up with the introduction of the DGX SuperPOD with the DGX GB200 system. This SuperPOD is scalable to tens of thousands of GPUs and leverages the Nvidia GB200 Grace Blackwell superchip to tackle trillion-parameter models.
This next-generation system is designed for constant uptime with full-stack resiliency. It features an efficient water-cooled design for maximum performance and integrates Nvidia AI Enterprise and Base Command software to streamline AI development and deployment while maximizing developer productivity and system reliability.
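To put the trillion-parameter ambition in perspective, a rough back-of-envelope calculation helps. The sketch below uses illustrative assumptions about numeric precision (they are not figures from Nvidia's announcements) to show why both the rack-scale memory pool and the ability to scale beyond a single rack matter:

```python
# Rough, illustrative sizing only: the model size and precisions below are
# assumptions for the arithmetic, not figures from Nvidia's announcements.

params = 1e12          # a "trillion-parameter" model
bytes_fp16 = 2         # bytes per parameter at FP16/BF16
bytes_fp8 = 1          # bytes per parameter at FP8

weights_fp16_tb = params * bytes_fp16 / 1e12   # ~2.0 TB of weights
weights_fp8_tb = params * bytes_fp8 / 1e12     # ~1.0 TB of weights

print(f"FP16 weights alone: {weights_fp16_tb:.1f} TB")
print(f"FP8 weights alone:  {weights_fp8_tb:.1f} TB")

# Against the 1.7 TB of fast memory quoted above for a single GB200 NVL72
# rack, FP8 weights fit inside one NVLink domain, while FP16 weights (before
# any KV cache or activations) already spill past it -- hence the SuperPOD.
```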
AI continues to be cloud-first
Nvidia is focused on moving beyond selling raw GPUs and bringing system-level solutions to market. This has recently caused tension with cloud service providers, who prefer to build their own solutions, but that tension seems to be fading.
Nvidia and Amazon's AWS, the last CSP to announce support for the current-generation DGX Cloud, made a joint presentation announcing strategic initiatives that go beyond DGX support, including the joint development of a new AI supercomputer as part of the revamped Project Ceiba.
Oracle Cloud, one of Nvidia's first DGX partners, also announced extensive support for the GPU giant's new systems. Going one step further, Oracle is offering Nvidia's BlueField-3 DPU as part of its networking stack, giving customers a powerful new option for offloading data center tasks from the CPU.
Microsoft Azure announced support for Nvidia's new Grace Blackwell GB200 and advanced Nvidia Quantum-X800 InfiniBand networking. Similarly, Google Cloud supports Nvidia's GB200 NVL72 system, which combines 72 Blackwell GPUs and 36 Grace CPUs interconnected with fifth-generation NVLink.
OEMs are ready for AI
Contrary to popular belief, AI doesn't just live in the cloud. Dell Technologies, HPE, Supermicro, and Lenovo all have strong AI-related businesses. Dell and HPE each reported a healthy AI-related server backlog of approximately $2 billion in their latest financial results.
Nvidia backed the on-premises story with a joint announcement with Dell that the two companies would collaborate on a new AI Factory initiative. Dell's AI Factory combines Dell's robust portfolio of compute, storage, networking, and workstations. This integration includes the Nvidia AI Enterprise software suite and the Nvidia Spectrum-X networking fabric, ensuring a seamless and robust AI infrastructure.
Dell also announced updates to its PowerEdge server lineup to support Nvidia's next-generation accelerators, including the introduction of powerful new liquid-cooled, eight-GPU servers.
Lenovo announced new ThinkEdge servers designed for AI. The new liquid-cooled, eight-GPU ThinkSystem SR780a V3 server boasts efficient power usage, while the Lenovo ThinkSystem SR680a V3 is an air-cooled server that pairs Intel processors with a range of Nvidia GPUs for AI acceleration. Finally, the Lenovo PG8A0N supports the new Nvidia GB200 Grace Blackwell superchip in a 1U node with open-loop liquid cooling for the accelerators.
Hewlett Packard Enterprise did not introduce new servers, but announced new features for its targeted generative AI solution. HPE and Nvidia are collaborating on new HPE Machine Learning Inference software, enabling enterprises to quickly and securely deploy ML models at scale. The latest product integrates with Nvidia NIM and uses pre-built containers to deliver Nvidia-optimized foundation models.
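As a rough illustration of what those pre-built containers mean in practice, the sketch below shows how an application might call a locally deployed, NIM-style inference microservice over an OpenAI-compatible HTTP API. The endpoint URL, port, and model name are placeholder assumptions, not documented values from HPE's or Nvidia's products:

```python
# Minimal sketch of querying a locally deployed, NIM-style inference container.
# The endpoint, port, and model id below are placeholder assumptions.
import requests

NIM_ENDPOINT = "http://localhost:8000/v1/chat/completions"  # hypothetical local deployment

payload = {
    "model": "example/foundation-model",  # whatever model the container serves
    "messages": [
        {"role": "user", "content": "Summarize our Q3 infrastructure spend in three bullets."}
    ],
    "max_tokens": 200,
}

response = requests.post(NIM_ENDPOINT, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```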
Storage adapts to AI
Storage for AI training is fundamentally different from traditional enterprise storage, placing new demands on throughput, latency, and scalability. Traditional storage architectures can accommodate moderate AI infrastructure, but large training clusters may require highly scalable parallel file systems. Both approaches were on full display at GTC.
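A quick back-of-envelope example shows why large clusters push toward parallel file systems. Every constant below (model size, optimizer state, tolerable checkpoint pause) is an illustrative assumption, but the shape of the result is the point:

```python
# Illustrative arithmetic only; every constant here is an assumption chosen to
# show the scale of the problem, not a measured or vendor-supplied figure.

params = 1e12                     # trillion-parameter model
bytes_per_param = 2 + 4 + 8       # FP16 weights + FP32 master copy + Adam moments (assumed)
checkpoint_bytes = params * bytes_per_param      # ~14 TB per checkpoint

target_seconds = 60               # assume the cluster may stall for at most one minute
required_gbps = checkpoint_bytes / target_seconds / 1e9

print(f"Checkpoint size: {checkpoint_bytes / 1e12:.0f} TB")
print(f"Sustained write throughput needed: {required_gbps:,.0f} GB/s")

# Hundreds of GB/s of sustained writes (plus the matching read rate at restart)
# is beyond most traditional NAS designs, which is why large training clusters
# lean on highly scalable parallel file systems.
```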
Weka and VAST Data are locked in a fierce battle to provide data infrastructure to AI service providers, and their rivalry was hard to miss at GTC. Weka announced a new system whose software is certified for Nvidia DGX SuperPOD, while VAST Data showcased its recently released BlueField-3-based solution for delivering scalable storage to large AI clusters.
Hammerspace was also in the news, with Meta recently announcing that it uses Hammerspace technology in its 48K-GPU cluster.
On-premises deployments still require a traditional approach to storage. Pure Storage announced new offerings supporting AI workloads, including a RAG pipeline, an Nvidia OVX server storage reference architecture, new vertical-specific RAG models developed with Nvidia, and an expanded set of ISV partners including Run.AI and Weights & Biases.
Similarly, NetApp announced new RAG-focused services based on Nvidia NeMo Retriever microservices technology.
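For readers less familiar with the pattern these vendors keep referencing, retrieval-augmented generation (RAG) retrieves relevant enterprise documents and injects them into the model's prompt. The generic sketch below illustrates the flow; it is not Pure Storage's or NetApp's implementation, and the toy lexical-overlap scoring stands in for the embedding-based retrieval (for example, via NeMo Retriever microservices) a production pipeline would use:

```python
# Generic RAG retrieval sketch: score documents against the query, pick the
# best match, and prepend it to the prompt sent to the LLM. The scoring here
# is a toy lexical overlap standing in for a real embedding-based retriever.

def overlap_score(doc: str, query: str) -> int:
    """Count shared words between the document and the query."""
    return len(set(doc.lower().split()) & set(query.lower().split()))

documents = [
    "The GB200 NVL72 is a rack-scale system combining Grace CPUs and Blackwell GPUs.",
    "Parallel file systems provide high-throughput storage for large training clusters.",
    "Retrieval-augmented generation injects retrieved enterprise documents into the prompt.",
]

query = "How does retrieval-augmented generation use enterprise documents?"
best_doc = max(documents, key=lambda d: overlap_score(d, query))

# A production pipeline would now send this augmented prompt to an inference
# endpoint (such as a NIM container); here we just print it.
prompt = f"Context: {best_doc}\n\nQuestion: {query}\nAnswer:"
print(prompt)
```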
Analyst's view
There's still a lot to say about GTC, including liquid cooling solutions, infrastructure for inference, pushing AI to the edge, and even a clear trend toward AI for cybersecurity. However, all of these are built on the infrastructure that Nvidia provides in the cloud and through its OEM partners.
AI remains at the center of the technology world, and its impact continues to grow. Cloud providers are introducing increasingly rich solution stacks, while on-premises adoption is also increasing. Inference is becoming ever more important, driving the need for AI infrastructure both on-premises and at the edge.
The impact of AI is far-reaching, but the required infrastructure is increasingly defined by a single company. Nvidia continues to take a platform-centric approach, going beyond GPUs to provide integrated system-level AI solutions. Beyond the new Blackwell accelerator, the Nvidia GB200 NVL72 system and corresponding SuperPOD solution demonstrate this focus.
Nvidia is leading the AI market with a strategy that is both precise and visionary. The company doesn't just sell chips; it is building an ecosystem that will help propel companies into the age of AI.
Disclosure: Steve McDowell is an industry analyst, and NAND Research is an industry analyst firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. Mr. McDowell holds no equity positions in any company mentioned in this article.