EoCoE webinar: “The A64FX processor: Understanding streaming kernels and sparse matrix-vector multiplication”

This webinar is aimed at anyone interested in current processor architectures. It covers streaming kernels and sparse matrix-vector multiplication on the A64FX processor.

Attendance is free for everyone. Please register using the link given below.

Title: “The A64FX processor: Understanding streaming kernels and sparse matrix-vector multiplication”

Date: 2020-11-18, 10:00 a.m.


Speakers:

– Christie L. Alappat, Erlangen Regional Computing Center (RRZE), PhD student in the group of Prof. G. Wellein

– Dr. Georg Hager, Erlangen Regional Computing Center (RRZE), Senior researcher in the HPC division at RRZE

Registration URL: https://attendee.gotowebinar.com/register/3926945771611115789


Abstract:

The A64FX CPU powers the current #1 supercomputer on the Top500 list. Although it is a traditional cache-based multicore processor, its peak performance and memory bandwidth rival those of accelerator devices. Generating efficient code for such a new architecture requires a good understanding of its performance features. Based on these features, the Erlangen Regional Computing Center (RRZE) team will show how they construct the Execution-Cache-Memory (ECM) performance model for the A64FX processor in the FX700 supercomputer and validate it using streaming loops. They will describe how the machine model points to peculiarities in the microarchitecture that should be kept in mind when optimizing applications. Applying the ECM model to sparse matrix-vector multiplication (SpMV), they will explain why the CRS matrix storage format is inappropriate and how the SELL-C-sigma format can achieve bandwidth saturation for SpMV. In this context, they will also look into some code optimization strategies that are relevant for A64FX and compare its SpMV performance with that of AMD Rome, Intel Cascade Lake, and NVIDIA V100.
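
For readers unfamiliar with the two storage formats named above, the following is a minimal pure-Python sketch of SpMV in CRS and in SELL-C-sigma. It is an illustration only, not the speakers' implementation: all function and variable names (`spmv_crs`, `crs_to_sell`, `spmv_sell`, `chunk_ptr`, and so on) are hypothetical, and the code only shows the data layout, not performance. In SELL-C-sigma, rows are sorted by length within windows of sigma rows, grouped into chunks of C rows, padded with zeros to the longest row in each chunk, and stored column by column within the chunk so that C rows can be processed in lockstep.

```python
def spmv_crs(row_ptr, col_idx, val, x):
    """y = A*x with A in CRS: one short inner loop per row."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += val[k] * x[col_idx[k]]
        y[i] = s
    return y


def crs_to_sell(row_ptr, col_idx, val, C, sigma):
    """Convert CRS to SELL-C-sigma (illustrative layout, hypothetical names)."""
    n = len(row_ptr) - 1
    lengths = [row_ptr[i + 1] - row_ptr[i] for i in range(n)]
    # Sort rows by descending length inside each window of sigma rows.
    perm = []
    for start in range(0, n, sigma):
        window = list(range(start, min(start + sigma, n)))
        window.sort(key=lambda r: -lengths[r])
        perm.extend(window)
    chunk_ptr, chunk_len, sell_col, sell_val = [0], [], [], []
    for start in range(0, n, C):
        rows = perm[start:start + C]
        width = max(lengths[r] for r in rows)
        chunk_len.append(width)
        # Column-major inside the chunk: entry j of all C rows is contiguous.
        for j in range(width):
            for lane in range(C):
                if lane < len(rows) and j < lengths[rows[lane]]:
                    k = row_ptr[rows[lane]] + j
                    sell_col.append(col_idx[k])
                    sell_val.append(val[k])
                else:
                    sell_col.append(0)    # padding: any valid column index
                    sell_val.append(0.0)  # padding value contributes nothing
        chunk_ptr.append(len(sell_val))
    return perm, chunk_ptr, chunk_len, sell_col, sell_val


def spmv_sell(perm, chunk_ptr, chunk_len, sell_col, sell_val, x, C):
    """y = A*x with A in SELL-C-sigma: C rows advance together per chunk."""
    n = len(perm)
    y = [0.0] * n
    for c, width in enumerate(chunk_len):
        acc = [0.0] * C
        base = chunk_ptr[c]
        for j in range(width):
            for lane in range(C):  # this loop maps onto SIMD lanes
                k = base + j * C + lane
                acc[lane] += sell_val[k] * x[sell_col[k]]
        for lane in range(C):
            r = c * C + lane
            if r < n:
                y[perm[r]] = acc[lane]  # undo the row permutation
    return y
```

In this sketch the innermost loop of `spmv_sell` always runs over exactly C entries that are contiguous in memory, whereas the inner loop of `spmv_crs` has a length that varies from row to row. The choice of C and sigma is left to the caller here; suitable values for a given machine are part of what the webinar discusses.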