Nvidia CEO Jensen Huang announces the next step for AI with the GH200 chip.
Nvidia unveiled its new Grace Hopper (GH200) chip during Siggraph 2023 on Tuesday, a chip designed specifically to power artificial intelligence (AI) models. The move comes in response to mounting competition in the AI hardware arena from rivals such as AMD, Google, and Amazon.
At present, Nvidia stands as the dominant force in the AI chip market, capturing more than 80% of the market share by some estimates. The company’s expertise lies in graphics processing units (GPUs), which have emerged as the preferred choice for running large AI models that drive generative AI software like Google’s Bard and OpenAI’s ChatGPT. However, Nvidia’s GPU supply is running thin due to the intense demand from tech giants, cloud providers, and startups striving to develop their own AI models.
Nvidia’s freshly unveiled GH200 is engineered with a focus on inference due to its expanded memory capacity, which accommodates more substantial AI models on a single system. Nvidia’s VP, Ian Buck, conveyed during a discussion with analysts and reporters on Tuesday that the H100 boasts 80GB of memory, in contrast to the GH200’s 141GB. Furthermore, Nvidia has introduced a system that merges two GH200 chips into one computer, catering to even larger models.
The newly introduced chip inherits the same GPU found in Nvidia’s top-tier AI chip, the H100. What sets the GH200 apart is its coupling of this GPU with an impressive 141 gigabytes of state-of-the-art memory, alongside a potent 72-core ARM central processor.
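To give a feel for why memory capacity matters for inference, here is a rough back-of-the-envelope sketch in Python. It estimates only the size of a model's weights and compares that with the memory figures quoted in this article. The parameter counts, the bytes-per-parameter table, and the function names are illustrative assumptions, not Nvidia figures, and a real deployment also needs room for the KV cache, activations, and runtime overhead.

```python
# Weights-only estimate: ignores KV cache, activations, and runtime overhead.
# Bytes per parameter for common numeric formats (illustrative assumption).
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

# Memory capacities in GB as quoted in this article.
CAPACITY_GB = {"H100": 80, "GH200": 141, "MI300X": 192}


def weights_gb(params_billions: float, dtype: str = "fp16") -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * BYTES_PER_PARAM[dtype] / 1e9


def fits(params_billions: float, chip: str, dtype: str = "fp16") -> bool:
    """True if the weights alone fit in the chip's quoted memory."""
    return weights_gb(params_billions, dtype) <= CAPACITY_GB[chip]


if __name__ == "__main__":
    # Hypothetical model sizes, chosen only for illustration.
    for size in (13, 70):
        for chip in CAPACITY_GB:
            verdict = "fits" if fits(size, chip) else "does not fit"
            print(f"{size}B fp16 weights ({weights_gb(size):.0f} GB) {verdict} on {chip}")
```

Under these assumptions, a hypothetical 70-billion-parameter model stored in 16-bit precision needs about 140 GB for its weights alone, which exceeds the H100's 80GB but sits just under the GH200's 141GB. In practice the margin would be too thin once working memory is counted, which is where lower-precision formats or multi-chip systems come in.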
During a conference talk on Tuesday, Nvidia CEO Jensen Huang stated, “We’re amplifying the capabilities of this processor.” He went on to emphasize, “This processor has been crafted for scaling up the operations of global data centers.”
Nvidia plans to make the GH200 chip available through its distributors in the second quarter of the upcoming year, with sampling anticipated to commence by the year’s end. The pricing details, however, have not been disclosed by Nvidia representatives.
During his presentation on the main stage at Siggraph, Huang called the GH200 chip the “iPhone moment of AI” and explained why it matters. “This is the reason why the world’s data centers are rapidly transitioning to accelerated computing,” he told the audience. “The more you buy, the more you save.”
Typically, working with AI models involves two main phases: training and inference. In the training phase, models are trained using vast datasets, a process that can stretch over months and sometimes requires thousands of GPUs, such as Nvidia’s H100 and A100 chips.
Following training, the model is employed in software to make predictions or generate content, a process known as inference. Like training, inference is computationally intensive and demands substantial processing power each time the software runs, such as when it generates text or images. Unlike training, which is needed only when the model requires updates, inference is an ongoing process.
According to Huang, “You can take virtually any large language model and integrate it into this chip, and it will perform inference at an exceptional pace. The cost of inference for extensive language models will witness a significant reduction.”
Huang also announced four new Omniverse Cloud APIs, built by Nvidia, that help developers implement and deploy OpenUSD pipelines and applications more seamlessly:
- ChatUSD — Assisting developers and artists working with OpenUSD data and scenes, ChatUSD is a large language model (LLM) agent for generating Python-USD code scripts from text and answering USD knowledge questions.
- RunUSD — a cloud API that translates OpenUSD files into fully path-traced rendered images by checking compatibility of the uploaded files against versions of OpenUSD releases, and generating renders with Omniverse Cloud.
- DeepSearch — an LLM agent enabling fast semantic search through massive databases of untagged assets.
- USD-GDN Publisher — a one-click service that enables enterprises and software makers to publish high-fidelity, OpenUSD-based experiences to the Omniverse Cloud Graphics Delivery Network (GDN) from an Omniverse-based application such as USD Composer, as well as stream in real time to web browsers and mobile devices.
The announcement comes as AMD, Nvidia’s primary GPU competitor, introduces its own AI-oriented chip, the MI300X, which supports 192GB of memory and is being marketed for AI inference. Meanwhile, companies like Google and Amazon are developing their own custom AI chips tailored for inference tasks.