AWS unveils core-packed Graviton4 and beefier Trainium accelerators for AI
Also hedging its bets with a healthy dose of Nvidia chips
Re:Invent On Tuesday, Amazon Web Services unveiled its next-gen Graviton4 CPUs and Trainium2 AI accelerators at its Re:Invent shindig, which it claims will deliver a healthy boost in performance and efficiency for machine learning.
Amazon showed off its latest custom-built Arm-compatible processor, unsurprisingly dubbed Graviton4. Since launching its first-gen Graviton CPUs in 2018, Amazon has seen healthy demand for its homemade processor family. To date, the cloud giant claims it's deployed more than two million Graviton chips, which are used by 50,000-plus customers across 150 instance types.
"Graviton4 marks the fourth-generation we've delivered in just five years and is the most powerful and energy efficient chip we have ever built for a broad range of workloads," said David Brown, VP of compute and networking at AWS.
The fourth generation of the design has a claimed 30 percent boost in compute performance, 50 percent higher core density, and 75 percent more memory bandwidth, compared to Graviton3. We'll note the latter was kind of a given, considering the higher core count and maturity of DDR5.
The chip sports up to 96 Arm-designed Neoverse V2 cores – each core has 2MB of L2 cache – and will be supported by 12 channels of DDR5 5600MT/s memory. Graviton4 also gains support for encrypted traffic for all of its physical hardware interfaces. Check out our pals at The Next Platform for further commentary and analysis regarding this processor.
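The memory bandwidth claim is easy enough to sanity check with napkin math. The Python sketch below assumes Graviton3's memory subsystem is eight channels of DDR5-4800 and that each DDR5 channel moves 64 bits (8 bytes) of data per transfer, ignoring ECC. It multiplies channels by transfer rate by bytes per transfer, and also compares 96 cores against Graviton3's 64. These are theoretical peaks, not measured throughput.

```python
# Peak theoretical DRAM bandwidth = channels x transfer rate x bytes per transfer.
# Assumption: a 64-bit (8-byte) data path per DDR5 channel, ECC bits ignored.
BYTES_PER_TRANSFER = 8


def peak_bandwidth_gbs(channels: int, mts: int) -> float:
    """Return peak bandwidth in GB/s for `channels` channels running at `mts` MT/s."""
    return channels * mts * 1e6 * BYTES_PER_TRANSFER / 1e9


graviton3 = peak_bandwidth_gbs(channels=8, mts=4800)   # assumed Graviton3 config
graviton4 = peak_bandwidth_gbs(channels=12, mts=5600)  # per AWS

print(f"Graviton3: {graviton3:.1f} GB/s")
print(f"Graviton4: {graviton4:.1f} GB/s")
print(f"Bandwidth uplift: {graviton4 / graviton3 - 1:.0%}")
print(f"Core density uplift: {96 / 64 - 1:.0%}")
```

On those assumptions, the results line up neatly with AWS's 75 percent bandwidth and 50 percent core density claims.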
Also at Re:Invent
Amazon, as usual, has announced a truckload of stuff for its annual cloud conference, held this year in Las Vegas. We covered its CodeWhisperer updates here; the launch of its WorkSpaces Thin Client here; SDKs for Rust and Kotlin right here; and its latest direction with AppFabric here along with a summary of other news.
In the meantime, here's some other bits and pieces you might want to know about:
- Amazon is teasing a Q chat-bot that lets you describe the cloud architecture you want, and it'll offer some AWS solutions to you, as well as generate content and other stuff. We have more on that here.
- It's also teasing safety guardrails for its Bedrock service that provides access to various AI models.
- Speaking of things in preview, there's also the Amazon Aurora Limitless Database that apparently offers "automated horizontal scaling to process millions of write transactions per second and manage petabytes of data in a single Aurora database."
- And, also in preview, Amazon Redshift ML can now ingest and output data in SUPER format and work with LLMs. Redshift also now has a bunch of zero-ETL integrations with AWS Databases.
- And Redshift has gained support for multidimensional data layout sort keys to boost database performance.
- Plus Amazon ElastiCache Serverless is now available, we're told.
You can find Amazon's roundup of its announcements here, and a big ol' list here.
To start, Graviton4 will be available in Amazon's memory-optimized R8g instances, which are tailored toward workloads like high-performance databases, in-memory caches, and big data analytics. These instances will support larger configurations with up to 3x more vCPUs and 3x more memory compared to the older R7g instances, which topped out at 64 vCPUs and 512GB RAM.
Since Amazon only provides the max number of vCPUs for a given instance, it's hard to tell how they're actually getting to 192 vCPUs and 1.5TB of memory for these instances. We do know that Graviton3 supported a novel three-socket configuration backed by a single Nitro DPU.
If AWS is using a similar topology with Graviton4, that'd suggest these instances are using a trio of 64-core chips. However, it's also possible AWS is using a dual 96-core configuration. These instances are available in preview starting today, but it'll be a few months longer before they're generally available.
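To see why both topologies fit, here's a quick Python enumeration of socket and core-count combinations that land on 192 vCPUs. It assumes one vCPU per physical core (Graviton doesn't use simultaneous multithreading), identical chips in every socket, memory split evenly across sockets, and no more than three sockets per instance, mirroring Graviton3's design. The candidate_topologies helper is purely illustrative.

```python
# Hypothetical helper for illustration: which (sockets, cores per chip) pairs reach
# a target vCPU count? Assumes 1 vCPU == 1 physical core (Graviton has no SMT) and
# that every socket in an instance carries an identical chip.


def candidate_topologies(target_vcpus: int, max_cores: int, max_sockets: int = 3):
    """Return (sockets, cores_per_chip) pairs whose product equals target_vcpus."""
    options = []
    for sockets in range(1, max_sockets + 1):
        if target_vcpus % sockets == 0:
            cores = target_vcpus // sockets
            if cores <= max_cores:
                options.append((sockets, cores))
    return options


# R8g ceiling per AWS: 3x the 64 vCPUs and 512GB of the largest R7g instance.
target = 3 * 64          # 192 vCPUs
memory_gb = 3 * 512      # 1,536GB, i.e. roughly 1.5TB

for sockets, cores in candidate_topologies(target, max_cores=96):
    # Assumes memory is split evenly across sockets.
    print(f"{sockets} sockets x {cores} cores, {memory_gb // sockets}GB per socket")
```

Only the two-socket and three-socket layouts survive the 96-core ceiling, which is why the vCPU count alone can't settle the question.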
Trainium2 arrives with a thirst for LLMs
Alongside Graviton4, AWS also refreshed its Trainium AI accelerators. The e-commerce giant introduced its first training chip in 2020, alongside a partnership with Intel to deploy its Habana Gaudi accelerators.
With the debut of Trainium2, Amazon's focus is clearly on large language models (LLMs) and foundation models for generative AI applications like chatbots and content generation. While details on the accelerator are still thin, Trainium2 is said to deliver 4x faster training performance than its predecessor while boasting 3x the memory capacity and 2x better efficiency.
Given that first-gen Trainium carried 32GB of high-bandwidth memory, this tells us that Trainium2 will offer 96GB of HBM. Calculating training performance, however, is a tad trickier, as accelerator performance, memory and interconnect bandwidth, floating point precision, and the size of the dataset all factor into this metric.
These chips will be made available in bundles of 16 as part of Amazon's EC2 Trn2 instances. However, for larger workloads, this can be scaled out to up to 100,000 accelerators connected using Amazon's EC2 UltraClusters interconnect for a peak performance of 65 exaFLOPS.
That works out to 650 teraFLOPS for each accelerator, but at what precision these performance claims are being made isn't clear. If we had to guess, it's FP16, since first-gen Trainium was good for about 190 teraFLOPS. Factor in more, and likely faster, high-bandwidth memory and that should get us pretty close to that 4x speedup.
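The per-accelerator figure, and how it stacks up against first-gen Trainium, boils down to two divisions, sketched in Python below. Both inputs are the headline numbers quoted above, and since AWS hasn't said what precision either is measured at, the ratio only covers raw compute, not the memory improvements that might make up the rest of the claimed 4x.

```python
# Napkin math only: the precision behind these figures hasn't been confirmed.
CLUSTER_PEAK_FLOPS = 65e18    # 65 exaFLOPS claimed for a full UltraCluster
ACCELERATORS = 100_000        # maximum Trainium2 count per cluster
TRAINIUM1_FLOPS = 190e12      # approximate first-gen Trainium figure

per_chip = CLUSTER_PEAK_FLOPS / ACCELERATORS
ratio = per_chip / TRAINIUM1_FLOPS

print(f"Per Trainium2: {per_chip / 1e12:.0f} teraFLOPS")
print(f"Raw compute vs Trainium1: {ratio:.2f}x")
```

That comes out at roughly 3.4x on compute alone, leaving the memory upgrade to plausibly close the gap to 4x.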
We've asked AWS for clarification on these performance metrics. Whatever the case, Amazon claims at this scale customers could train a 300-billion parameter LLM in weeks versus months.
Hedging its bets
Just like the first generation of Trainium chips, AWS isn’t putting all of its eggs in one basket.
The cloud provider announced an expanded relationship with Nvidia to deploy its Grace-Hopper Superchips in clusters of up to 32. The configuration is the first of a new offering the chipmaker is calling GH200-NVL32.
AWS also plans to offer new instances based on Nvidia’s latest generation of silicon, including the GH200, H200, L40S, and L4 accelerators, and will work to bring Nvidia’s DGX Cloud platform to AWS.
AWS and Nvidia have also revealed they’re working on an AI supercomputer using 16,384 GH200s that will be capable of delivering 65 exaFLOPS of FP8 performance for AI workloads. The system, dubbed Project Ceiba, is unique in that it’ll use Amazon’s Elastic Fabric Adapter interconnect rather than relying on InfiniBand like we’ve seen with large-scale deployments in Microsoft Azure.
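Running the same division on Project Ceiba's headline number shows how differently these two 65-exaFLOPS claims are built. The sketch below uses the FP8 figure and GPU count above and reuses the 650-teraFLOPS Trainium2 estimate; because Trainium2's precision hasn't been confirmed, the resulting ratio is not a like-for-like comparison.

```python
# Per-GPU share of Project Ceiba's headline FP8 figure.
CEIBA_PEAK_FP8_FLOPS = 65e18   # 65 exaFLOPS of FP8, per AWS and Nvidia
GH200_COUNT = 16_384
TRAINIUM2_EST_FLOPS = 650e12   # per-chip estimate above; precision unconfirmed

per_gpu = CEIBA_PEAK_FP8_FLOPS / GH200_COUNT
print(f"Per GH200: {per_gpu / 1e12:,.0f} teraFLOPS FP8")

# Not like-for-like: Trainium2's figure may not be FP8.
print(f"Ratio to Trainium2 estimate: {per_gpu / TRAINIUM2_EST_FLOPS:.1f}x")
```

Ceiba reaches the same headline figure with roughly a sixth as many chips, though until AWS clarifies Trainium2's precision, that gap shouldn't be read as a straight performance comparison.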
And, of course, all of these will be supported by Nvidia's AI software suite. ®
More from Nvidia
The GPU giant also today announced a microservice for adding LLM chatbots to business applications, allowing people to query and access corporate data, generate summaries and other content, and perform tasks, all through natural-language conversations.
And Nvidia talked up BioNeMo, now available via AWS, as a generative AI system for performing drug discovery.