Intel Xe Architecture for HPC and Exascale Unveiled: Ponte Vecchio & oneAPI


    As tipped by VideoCardz, Intel today gave its first briefing on the upcoming Xe graphics architecture. Most of it focused on the data center and HPC space, but Intel made a point of stressing that Xe will cater to gamers and enthusiasts as well. From the look of things, however, GPGPU will be the primary focus. Broadly speaking, Intel's Xe lineup will include three distinct microarchitectures: Xe LP, Xe HP, and Xe HPC.

    LP will be for integrated graphics processors like the present UHD iGPUs, HP will be for gamers and enthusiasts, and HPC will cater to the data center and exascale spaces. The low-power Xe chips will have a TDP in the sub-50W range, while the high-performance parts will draw somewhere between 75W and 300W. Naturally, the HPC chips will have even higher power requirements.

    HPC, Exascale, Xe Memory Fabric, Rambo Cache and 40x More FP64 Performance

    As mentioned above, the HPC space was the main focus of today's press conference. According to Raja Koduri, Intel's Xe GPUs will offer as much as 40x better double-precision (FP64) compute performance. This incredible feat will be achieved using three core technologies:

    Scalability: Intel plans to build systems with multiple GPUs working in tandem, resulting in more than a thousand compute units. This will deliver never-before-seen levels of FP64 compute performance, which is crucial in HPC and data center workloads.
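    As a rough illustration of why piling on compute units matters, peak FP64 throughput scales linearly with the number of units, each unit's per-clock FP64 rate, and the clock speed. The symbols below are generic (N_CU compute units, L_FP64 FP64 lanes per unit, f_clk clock), a fused multiply-add is counted as two FLOPs, and the numbers in the worked line are hypothetical values chosen only to show the arithmetic, not Ponte Vecchio specifications:

```latex
P_{\mathrm{FP64}} = N_{\mathrm{CU}} \times L_{\mathrm{FP64}} \times 2 \times f_{\mathrm{clk}}

% Hypothetical example: 1024 CUs, 16 FP64 FMA lanes per CU, 1.5 GHz
P_{\mathrm{FP64}} = 1024 \times 16 \times 2 \times 1.5\times10^{9}\ \mathrm{s^{-1}}
  \approx 4.9\times10^{13}\ \mathrm{FLOP/s} \approx 49\ \mathrm{TFLOPS}
```

    This is a theoretical peak; sustained performance depends on how well work and data can be spread across all those units.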

    Xe Memory Fabric (XEMF): Intel is taking a page from AMD's rulebook here, connecting these CUs (or possibly even whole GPUs) with a new scalable memory fabric dubbed XEMF (not to be confused with AMD's Infinity Fabric). I'm not sure whether it will connect the CUs to each other, the memory to the CUs, or both. CXL will also be used in these machines to keep the CPUs and GPUs coherent.

    At the heart of Xe architecture, we have a new fabric called XEMF. It is the heart of the performance of these machines. We called it the Rambo Cache. It is a unified cache that is accessible to CPU and GPU memory.

    Raja Koduri

    Rambo Cache: To make up for the latency penalty introduced by the Xe Memory Fabric, these GPUs will also include a large unified cache known as the Rambo Cache. This cache will be built using Intel's Foveros 3D packaging technology.
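    The standard average-memory-access-time relation shows why a big cache can offset a slower fabric. Here T_hit is the cache hit time, m is the miss rate, and T_mem is the cost of going out over the fabric to memory on a miss; the figures in the worked line are hypothetical and are not Intel numbers:

```latex
T_{\mathrm{avg}} = T_{\mathrm{hit}} + m \cdot T_{\mathrm{mem}}

% Hypothetical example: T_hit = 20 ns, m = 0.1, T_mem = 200 ns
T_{\mathrm{avg}} = 20\ \mathrm{ns} + 0.1 \times 200\ \mathrm{ns} = 40\ \mathrm{ns}
  \quad (\text{vs. } 200\ \mathrm{ns}\ \text{if every access crossed the fabric})
```

    A larger cache generally lowers m for a given workload, which is presumably why Intel is making the Rambo Cache so large.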

    There's also mention of HBM, which will be paired with these GPUs for maximum bandwidth. While Foveros connects the Rambo Cache, EMIB will be used to connect the HBM to the GPUs.

    These Ponte Vecchio GPUs will be paired with Sapphire Rapids CPUs (the GPUs built on Intel's 7nm node, the CPUs on 10nm), and the resulting supercomputer is called Aurora. It will bring all of Intel's technologies, from Optane to Foveros to Xe, together with interconnects such as CXL and EMIB in one machine. Each node will consist of two Sapphire Rapids CPUs and six Ponte Vecchio GPUs running in sync, connected by various fabrics and interconnects: XEMF (aided by the Rambo Cache and Foveros) links the GPUs and CPUs, while EMIB connects the HBM to the GPUs.

    The Aurora supercomputer will land in 2021 along with oneAPI and support for both SIMT (GPU) and SIMD (CPU) vector widths, making it one of the most versatile GPGPU machines to date. Unlike NVIDIA and AMD, Intel is diving straight into multi-chip module (MCM) designs with its GPUs. This means that the multi-GPU driver spotted in the Linux kernel isn't the same as SLI or CrossFire. How exactly these GPUs will work, and how they will affect the consumer graphics market, remains to be seen.
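    To illustrate the difference between the two execution models, here is a toy Python sketch. It is not oneAPI or DPC++ code, and the function names and the 8-element vector width are hypothetical. In the SIMD style, the programmer thinks in fixed-width vectors that one instruction processes in lockstep; in the SIMT style, the kernel is written for a single thread identified by an ID, and the hardware runs many such threads in groups. Both compute the same SAXPY result, a*x + y:

```python
"""Toy contrast between SIMD-style and SIMT-style execution of one kernel.

Purely illustrative: real oneAPI/DPC++ code looks nothing like this, and the
names here (saxpy_simd, saxpy_simt, VECTOR_WIDTH) are hypothetical.
"""

from typing import List

VECTOR_WIDTH = 8  # hypothetical SIMD register width, in elements


def saxpy_simd(a: float, x: List[float], y: List[float]) -> List[float]:
    """SIMD view: conceptually, one instruction updates a whole chunk of lanes."""
    out = list(y)
    for base in range(0, len(x), VECTOR_WIDTH):
        # On real hardware every lane in this chunk would run in lockstep;
        # here we simulate the lanes sequentially. The final chunk may be
        # partial (a real CPU would mask off the unused lanes).
        for i in range(base, min(base + VECTOR_WIDTH, len(x))):
            out[i] = a * x[i] + y[i]
    return out


def saxpy_simt(a: float, x: List[float], y: List[float]) -> List[float]:
    """SIMT view: the kernel is written for a single thread with an ID."""
    out = list(y)

    def kernel(thread_id: int) -> None:
        # Each logical thread handles exactly one element.
        out[thread_id] = a * x[thread_id] + y[thread_id]

    # A GPU would launch these threads in hardware groups; here we simply
    # run them one after another.
    for tid in range(len(x)):
        kernel(tid)
    return out
```

    With the same inputs, `saxpy_simd(2.0, x, y) == saxpy_simt(2.0, x, y)` is `True`; only the way the work is expressed differs, which is the gap a single programming model like oneAPI aims to bridge.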
