
Intel drops the deets on UK's Dawn AI supercomputer

Phase one packs 512 Xeons, 1,024 Ponte Vecchio GPUs. Phase two: 10x that

SC23 As the SC23 conference in Denver, USA, kicks off in earnest, Intel is spilling the tea on the two-phase Dawn supercomputer it's building for the UK with Dell and the University of Cambridge.

The chipmaker touted the system earlier this month during the UK's AI Summit, claiming it will be "the UK's fastest AI supercomputer."

Emphasis on AI, we think, because at 19 petaFLOPS of benchmarked FP64 performance, Dawn in its first phase only just about matches today's fastest publicly known UK supercomputer, Scotland's Archer2, which is currently ranked 39th in the Top500. Archer2 tops out at 26 petaFLOPS as a theoretical maximum, or 20 petaFLOPS in benchmarks.

So, Dawn right now isn't the fastest in Britain at FP64 in a practical sense. If you lower its precision to something like FP16 for AI work, then yes, its performance will in theory be higher, and it might therefore be the fastest AI machine in the nation (assuming Archer2 couldn't pull off the same feat if its operators so desired). See below for more on that.

And it's still not clear if Intel thinks the first or second phase of Dawn will be the "fastest" in the UK at AI. The second phase is set to be ten times as fast as the first part of Dawn. The first phase is supposed to be going online soon if not already; the second phase is due next year.

In a press briefing ahead of SC23, Intel execs said at least the first phase system would feature 512 4th-gen Xeon Scalable processors and 1,024 Datacenter GPU Max accelerators spread across 256 liquid-cooled Dell PowerEdge XE9640 systems.

Each node is equipped with 1TB of DDR5 memory and 512GB of high-bandwidth memory. We've also learned each node will utilize four of Nvidia's InfiniBand HDR200 interconnects.
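For those who like to check the sums, here is a rough sketch of what those totals imply per node. The counts come from the figures above; the even split of HBM across a node's GPUs is our assumption, not something Intel or Dell has confirmed.

```python
# Phase-one figures quoted in the article.
XEONS = 512
GPUS = 1024
NODES = 256
DDR5_PER_NODE_TB = 1
HBM_PER_NODE_GB = 512

cpus_per_node = XEONS // NODES
gpus_per_node = GPUS // NODES

# Assumption: all of a node's HBM sits on its GPUs and is split evenly.
hbm_per_gpu_gb = HBM_PER_NODE_GB // gpus_per_node

# System-wide memory totals for phase one.
total_ddr5_tb = NODES * DDR5_PER_NODE_TB
total_hbm_tb = NODES * HBM_PER_NODE_GB / 1024

print(f"{cpus_per_node} Xeons and {gpus_per_node} GPUs per node")
print(f"{hbm_per_gpu_gb}GB of HBM per GPU (assumed even split)")
print(f"{total_ddr5_tb}TB DDR5 and {total_hbm_tb:.0f}TB HBM across the system")
```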

While neither Intel nor Dell has shared the details of what the second phase of the project will look like, it's supposed to, as we said, boost the system's capacity tenfold.

As it stands, the first phase of the system is rated for a peak output of 53 petaFLOPS of double precision performance. However, in its first Linpack run, Dawn managed less than half that. At 19 petaFLOPS of real-world FP64 performance, the system comes in at 41st place in the global Top500.

Intel's peak performance claims would seem to indicate the chipmaker has managed to work out the kinks in its Ponte Vecchio GPUs, which on paper are good for about 52 teraFLOPS at FP64.

As our sibling publication The Next Platform pointed out earlier, the Ponte Vecchio parts delivered to Argonne National Lab for integration into the US-based Aurora system were only capable of delivering 31.5 teraFLOPS of FP64 performance, about 61 percent of what the datasheet claims.
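A quick back-of-the-envelope check shows how the per-GPU figures roll up to the system level. This is a sketch only: it counts GPU FLOPS and ignores whatever the Xeons contribute, and the function name is ours.

```python
# Figures quoted in the article.
GPUS = 1024
DATASHEET_TFLOPS = 52.0   # Intel's on-paper FP64 figure per GPU
ARGONNE_TFLOPS = 31.5     # per-GPU FP64 reported for the Argonne parts
LINPACK_PFLOPS = 19.0     # Dawn's first Linpack result


def gpu_peak_pflops(gpus, tflops_per_gpu):
    """Aggregate GPU-only peak in petaFLOPS (1 petaFLOPS = 1,000 teraFLOPS)."""
    return gpus * tflops_per_gpu / 1000


datasheet_peak = gpu_peak_pflops(GPUS, DATASHEET_TFLOPS)
argonne_peak = gpu_peak_pflops(GPUS, ARGONNE_TFLOPS)

# Share of the datasheet figure the Argonne parts delivered.
per_gpu_ratio = ARGONNE_TFLOPS / DATASHEET_TFLOPS

# Linpack result as a fraction of each hypothetical peak.
eff_vs_datasheet = LINPACK_PFLOPS / datasheet_peak
eff_vs_argonne = LINPACK_PFLOPS / argonne_peak

print(f"Peak at datasheet rate: {datasheet_peak:.1f} petaFLOPS")
print(f"Peak at Argonne rate:   {argonne_peak:.1f} petaFLOPS")
print(f"Argonne vs datasheet:   {per_gpu_ratio:.0%}")
print(f"Linpack vs datasheet peak: {eff_vs_datasheet:.0%}")
print(f"Linpack vs Argonne-rate peak: {eff_vs_argonne:.0%}")
```

The datasheet rate reproduces Intel's 53 petaFLOPS peak claim, which is why the claim suggests the GPUs in Dawn are running at or near their rated speed.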

We've asked Intel for clarification on the GPU Max 1550's performance; we'll let you know if we hear anything back.

Taking Intel's peak figure at face value, if and when Dawn's second phase is complete, its peak theoretical performance should be closer to 532 petaFLOPS at FP64. That would be a massive step up from the UK's Archer2.

If Intel and Dell can improve the efficiency of the system, Dawn's second phase should rank among the top 10 fastest supercomputers officially recorded, with performance within spitting distance of the Fugaku system, which is rated for 537 petaFLOPS of peak FP64 performance.

With that said, actual performance in the Linpack bench usually comes in a fair bit lower. While Fugaku is rated for 537 petaFLOPS of peak performance, in the real world it's closer to 442 petaFLOPS.
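To put numbers on that gap, the sketch below works out the Linpack efficiency of Fugaku and of Dawn's first phase, then applies each ratio to the 532 petaFLOPS phase-two peak. Both phase-two results are purely illustrative scenarios of ours, not predictions from Intel or Dell.

```python
# Figures quoted in the article, all in petaFLOPS at FP64.
FUGAKU_PEAK = 537.0
FUGAKU_LINPACK = 442.0
DAWN_P1_PEAK = 53.0
DAWN_P1_LINPACK = 19.0
DAWN_P2_PEAK = 532.0  # estimated phase-two peak

# Linpack efficiency: benchmarked result divided by theoretical peak.
fugaku_eff = FUGAKU_LINPACK / FUGAKU_PEAK
dawn_p1_eff = DAWN_P1_LINPACK / DAWN_P1_PEAK

# Hypothetical scenarios: phase two at today's Dawn efficiency,
# and phase two if it somehow matched Fugaku's efficiency.
p2_at_p1_eff = DAWN_P2_PEAK * dawn_p1_eff
p2_at_fugaku_eff = DAWN_P2_PEAK * fugaku_eff

print(f"Fugaku efficiency: {fugaku_eff:.0%}")
print(f"Dawn phase-one efficiency: {dawn_p1_eff:.0%}")
print(f"Phase two at phase-one efficiency: {p2_at_p1_eff:.0f} petaFLOPS")
print(f"Phase two at Fugaku efficiency: {p2_at_fugaku_eff:.0f} petaFLOPS")
```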

Further analysis

Intel's claim is that Dawn is the UK's "fastest AI supercomputer," and here's where things get a little interesting. Those GPU Max 1550s are good for 832 teraFLOPS of Brain Float 16 (BF16) math, according to Intel's datasheet. Across the 1,024 GPUs in the first phase, that puts Dawn's AI performance at 852 petaFLOPS, unless the claim is based on the chip's integer performance, in which case we're looking at 1.7 exaOPS of INT8. Fully built, the system will land somewhere between 8.5 exaFLOPS of BF16 and 17 exaOPS of INT8.
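Here is roughly how those AI numbers fall out. The BF16 rate and GPU count are from the figures above; the per-GPU INT8 rate is our assumption that integer throughput is double the BF16 rate, which is what the 1.7 exaOPS total implies.

```python
# Figures quoted in the article.
GPUS_PHASE_ONE = 1024
BF16_TFLOPS_PER_GPU = 832
PHASE_TWO_SCALE = 10  # phase two is billed as ten times phase one

# Assumption: INT8 throughput is twice the BF16 rate per GPU.
INT8_TOPS_PER_GPU = 2 * BF16_TFLOPS_PER_GPU

# 1 petaFLOPS = 1,000 teraFLOPS; 1 exaFLOPS = 1,000,000 teraFLOPS.
bf16_p1_pflops = GPUS_PHASE_ONE * BF16_TFLOPS_PER_GPU / 1_000
int8_p1_exaops = GPUS_PHASE_ONE * INT8_TOPS_PER_GPU / 1_000_000

bf16_p2_exaflops = bf16_p1_pflops * PHASE_TWO_SCALE / 1_000
int8_p2_exaops = int8_p1_exaops * PHASE_TWO_SCALE

print(f"Phase one: {bf16_p1_pflops:.0f} petaFLOPS BF16, "
      f"{int8_p1_exaops:.1f} exaOPS INT8")
print(f"Phase two: {bf16_p2_exaflops:.1f} exaFLOPS BF16, "
      f"{int8_p2_exaops:.0f} exaOPS INT8")
```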

Nvidia has made similar claims about the AI performance of the Isambard-AI supercomputer being deployed in collaboration with the University of Bristol, which will comprise 5,448 Nvidia GH200 Grace-Hopper Superchips. Those parts support nearly 4 petaFLOPS of sparse FP8 performance, putting its peak AI performance at about 21 exaFLOPS.
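The same arithmetic applies to Isambard-AI. As a sketch, the code below treats "nearly 4 petaFLOPS" as an upper bound of 4, then works backwards from the 21 exaFLOPS total to see what per-chip rate that implies. Bear in mind these are sparse FP8 figures, so they are not directly comparable with Dawn's BF16 numbers.

```python
# Figures quoted in the article.
SUPERCHIPS = 5448
FP8_PFLOPS_UPPER = 4.0       # "nearly 4 petaFLOPS" of sparse FP8 per chip
QUOTED_TOTAL_EXAFLOPS = 21.0

# Upper bound on the aggregate if every chip hit a full 4 petaFLOPS.
upper_total_exaflops = SUPERCHIPS * FP8_PFLOPS_UPPER / 1000

# Per-chip rate implied by the quoted 21 exaFLOPS total.
implied_pflops_per_chip = QUOTED_TOTAL_EXAFLOPS * 1000 / SUPERCHIPS

print(f"Upper bound: {upper_total_exaflops:.1f} exaFLOPS")
print(f"Implied per-chip rate: {implied_pflops_per_chip:.2f} petaFLOPS")
```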

Compare pure BF16 performance and the completed Dawn system should come out ahead. But if your workload can leverage FP8, then the Isambard-AI machine is the one to beat.

Of course, all of these estimates assume that Dawn will eventually swell to 10,000-plus GPUs and that Intel is actually getting 52 teraFLOPS of FP64 performance from the accelerators. ®
