Italy's climate supercomputer, Cassandra, to combine HPC with AI

CPU-heavy big iron boasts Intel's HBM-packed Xeons and a tiny complement of Nvidia H100s

Boffins in Italy are about to get their hands on a supercomputer that will more than double the resources available to study the effects of climate change.

Dubbed Cassandra, the big iron is based on Lenovo's liquid-cooled Neptune system architecture and is slated for deployment at the Euro-Mediterranean Center on Climate Change (CMCC) in Lecce, Italy.

Founded in 2005 as a joint effort by Italy's ministries of environment, finance, and agriculture and forestry, the CMCC is tasked with developing forecast models of the Earth and its oceans, predicting the course of climate change, and establishing policies to mitigate and adapt to the effects of a warming planet.

This work is currently being conducted on CMCC's Juno system, which was deployed in 2022 and is based on Intel's third-gen Xeon Scalable platform backed by a complement of 20 Nvidia A100 GPUs. Despite spanning 170 nodes, the existing system isn't all that powerful – managing just 1.13 petaFLOPS of peak double-precision performance.

With 180 nodes, CMCC's Cassandra is by no means a small system – at least not physically. However, the bulk of those nodes are CPU-powered, so it's nowhere near as compute-dense as you might expect from a GPU-accelerated cluster. The system's 20,160 Xeon Max cores and 26TB of HBM2e memory are expected to produce 1.2 petaFLOPS of FP64 performance when it comes online later this year.
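
As a back-of-the-envelope check, those figures are consistent with dual 56-core Xeon Max chips per node running AVX-512 code at a sustained clock of around 1.9GHz – our assumption, not a published spec:

# Rough peak-FP64 estimate for Cassandra's CPU partition (our assumptions,
# not CMCC's published figures).
nodes, sockets, cores = 180, 2, 56
flops_per_cycle = 2 * 8 * 2   # 2 AVX-512 FMA units x 8 FP64 lanes x 2 ops
clock_ghz = 1.9               # assumed sustained AVX-512 all-core clock
total_cores = nodes * sockets * cores   # 20,160 cores
peak_pflops = total_cores * flops_per_cycle * clock_ghz / 1e6
print(f"{total_cores} cores -> ~{peak_pflops:.2f} petaFLOPS FP64")  # ~1.23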

By way of comparison, the Nvidia DGX SuperPOD that MITRE announced it was deploying this week contains just 32 servers but has an estimated peak performance of 17 petaFLOPS – showing just how much of the heavy lifting GPUs are responsible for in modern systems.

It's not always about FLOPS

However, as researchers at Los Alamos National Lab have previously pointed out, different workloads have different bottlenecks – more peak FLOPS doesn't necessarily translate into higher performance.

"Given the hoopla in the press around the 'fastest supercomputer in the world,' one might think we should buy computers with the most FLOPS," Gary Grider, who heads up LANL's HPC division, said of the Crossroads system, which was installed late last summer.

While Crossroads is saddled with a far darker purpose, it is a relevant comparison in this case. As far as we can tell, it is based on the same Xeon Max foundation as Cassandra.

Launched in early 2023, the Xeon Max is packed with up to 64GB of HBM, capable of feeding the chip's 56 cores with 1TB/sec of memory bandwidth – that's a lot for a CPU. Other chips, like Fujitsu's Arm-based A64FX, have previously featured on-package HBM, but Xeon Max marked the first time we'd seen it on an x86 part.

In many HPC and AI workloads, memory bandwidth remains a major performance bottleneck, and HBM offers a substantial advantage over traditional DRAM modules – certainly for nuclear weapons sims. As our sibling site The Next Platform has discussed in the past, it also works for weather forecasting.
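
To see why, consider a crude roofline-style estimate for a memory-bound stencil kernel – a sketch of our own making, with assumed figures of roughly 300GB/sec for eight channels of DDR5 versus 1TB/sec for HBM2e, rather than anything CMCC has published:

# Crude roofline sketch: a 7-point FP64 stencil does ~8 FLOPs while
# touching ~8 doubles (64 bytes), so it is bandwidth-bound long before
# it hits the FLOP ceiling. Peak and bandwidth figures are assumptions.
peak_flops = 3.2e12        # rough FP64 peak of one 56-core Xeon Max socket
flops_per_byte = 8 / 64    # ~0.125 FLOP/byte for the stencil
for label, bw in [("DDR5 (8 channels)", 300e9), ("HBM2e", 1000e9)]:
    attainable = min(peak_flops, flops_per_byte * bw)
    print(f"{label}: ~{attainable / 1e9:.0f} GFLOPS attainable")
# HBM's ~3x bandwidth edge translates almost directly into ~3x throughput.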

While the bulk of the Cassandra system will be CPU-based and will likely target well-established HPC workloads, that's not to say GPUs won't play a role.

CMCC plans to add a pair of GPU nodes, totaling 16 H100 accelerators, to Cassandra to handle AI workloads. While two nodes might not sound like much, they will add roughly another petaFLOP of FP64 performance to the system – and, for AI workloads, as much as 64 petaFLOPS at 8-bit precision.
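
Those figures track with Nvidia's datasheet numbers for the SXM version of the H100 – 67 teraFLOPS of FP64 via the Tensor Cores and 3,958 teraFLOPS of FP8, the latter assuming structured sparsity:

# Sanity-checking CMCC's GPU partition against Nvidia's published
# H100 SXM peaks (the FP8 figure includes 2:4 structured sparsity).
gpus = 16
fp64_pf = gpus * 67 / 1000     # ~1.1 petaFLOPS FP64 Tensor Core
fp8_pf = gpus * 3958 / 1000    # ~63 petaFLOPS FP8 with sparsity
print(f"FP64: ~{fp64_pf:.1f} PF, FP8: ~{fp8_pf:.0f} PF")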

The climate lab aims to run a variety of AI-based climate change simulations on this GPU partition.

Maybe we don't need all that precision after all

Meteorological and climate modeling have traditionally been considered HPC workloads, where double-precision (FP64) calculations are the gold standard.

However, over the past few years, work at various research institutions and companies has shown that many of these calculations can be done at single (32-bit) or even half (16-bit) precision at far higher performance without compromising accuracy – particularly when it comes to weather and climate.

For instance, the European Centre for Medium-Range Weather Forecasts (ECMWF) has already demonstrated the benefits of 32-bit calculations in weather and climate modeling, while researchers at the University of Bristol managed to eke out a 3.6x speed boost by dropping down to lower precision.
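
The intuition is easy to demonstrate. The toy NumPy diffusion kernel below – a minimal sketch of our own, not ECMWF's or Bristol's actual code – moves half as many bytes per step at FP32 as at FP64, which on bandwidth-bound hardware translates fairly directly into speed:

import time
import numpy as np

# Toy 2D heat-diffusion stencil, timed at double and single precision.
def diffuse(u, steps=200, alpha=0.1):
    for _ in range(steps):
        u[1:-1, 1:-1] += alpha * (u[2:, 1:-1] + u[:-2, 1:-1] +
                                  u[1:-1, 2:] + u[1:-1, :-2] -
                                  4 * u[1:-1, 1:-1])
    return u

for dtype in (np.float64, np.float32):
    field = np.random.default_rng(0).random((2048, 2048)).astype(dtype)
    start = time.perf_counter()
    diffuse(field)
    print(f"{np.dtype(dtype).name}: {time.perf_counter() - start:.2f}s")

The hard part, of course, is verifying that forecast quality survives the lost bits – which is precisely what the ECMWF and Bristol studies set out to show.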

At the more extreme end of the spectrum, we have Nvidia's Earth-2 climate model. Announced at GTC this spring, the massive digital twin uses AI to accelerate high-resolution simulations of the climate down to two kilometers of resolution.

Earth-2 employs a diffusion model called CorrDiff, which Nvidia claims is capable of generating images at 12.5x higher resolution and 1,000x faster than current numerical models. Because of this, Taiwan is already eyeing the platform to improve its tsunami forecasting. 

It's not hard to imagine CMCC using its GPU partition to augment or accelerate traditional HPC simulations using AI in a similar fashion. ®
