Dense Linear Algebra Algorithms on AMD GPUs

I’m excited to have recently published my first journal paper with ICL! You can check it out on HGPU. TL;DR: we ported a math library to AMD GPUs and achieved much better performance than AMD’s existing vendor libraries.

I wrote this paper with Ahmad Abdelfattah, Stan Tomov, and Jack Dongarra at the Innovative Computing Laboratory while I was a research assistant there. To port the library from the existing CUDA platform to the newer HIP/ROCm platform, we used automatic translation tools to convert the source code, then made some manual changes on top of that.
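To give a flavor of what automatic translation does, here is a toy Python sketch. It is not the actual translation tool and covers only a handful of names: the mapping table and the `translate` function are hypothetical, although the HIP identifiers on the right-hand side are real HIP runtime names. Much of the port is this kind of one-to-one renaming, because the HIP runtime API closely mirrors the CUDA runtime API. Anything the table does not know about is left untouched, which is where manual changes come in.

```python
import re

# Hypothetical, heavily simplified mapping. Real translation tools cover
# thousands of CUDA identifiers, headers, and library calls.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaFree": "hipFree",
}

# \b word boundaries ensure only whole identifiers are replaced, so
# "cudaMemcpy" never matches inside a longer name like "cudaMemcpyAsync".
_PATTERN = re.compile(r"\b(" + "|".join(re.escape(k) for k in CUDA_TO_HIP) + r")\b")


def translate(source: str) -> str:
    """Replace each known CUDA identifier with its HIP counterpart."""
    return _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(1)], source)


cuda_src = """#include <cuda_runtime.h>
double *d_A;
cudaMalloc(&d_A, n * sizeof(double));
cudaMemcpy(d_A, h_A, n * sizeof(double), cudaMemcpyHostToDevice);
cudaDeviceSynchronize();
cudaFree(d_A);
"""

print(translate(cuda_src))
print(translate("cudaMemcpyAsync"))  # prints cudaMemcpyAsync: not in the table, left for a manual fix
```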

Along the way, we also tuned the algorithms specifically for the new AMD hardware. The AMD platform differs from NVIDIA’s in several ways, including warp size (64-thread wavefronts on AMD’s GCN/CDNA GPUs versus 32-thread warps on NVIDIA), memory hierarchy, and compiler optimizations. Tuning our kernels for these differences improved performance and brought it much closer to the peak of these GPUs.
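Warp size is the easiest of these differences to illustrate. Code written for NVIDIA GPUs often hard-codes 32 lanes, for example in a shuffle-based reduction, while AMD’s GCN/CDNA GPUs run 64-lane wavefronts. The Python sketch below simulates such a reduction on the host; `shfl_down` and `warp_reduce_sum` are hypothetical helpers that mimic warp-shuffle semantics, not code from the paper or the library.

```python
def shfl_down(vals, offset):
    """Mimic a warp shuffle-down: lane i reads lane i + offset,
    or keeps its own value if that lane is outside the warp."""
    n = len(vals)
    return [vals[i + offset] if i + offset < n else vals[i] for i in range(n)]


def warp_reduce_sum(vals, assumed_width=None):
    """Tree reduction across one warp/wavefront; lane 0 ends with the result.

    assumed_width models code that hard-codes a warp size instead of
    using the real one. Returns (lane 0 value, number of shuffle steps).
    """
    width = assumed_width if assumed_width is not None else len(vals)
    assert width & (width - 1) == 0, "warp size must be a power of two"
    vals = list(vals)
    steps = 0
    offset = width // 2
    while offset > 0:
        shifted = shfl_down(vals, offset)
        vals = [a + b for a, b in zip(vals, shifted)]
        offset //= 2
        steps += 1
    return vals[0], steps


ones = [1] * 64
print(warp_reduce_sum(ones[:32]))               # (32, 5): 32-lane warp
print(warp_reduce_sum(ones))                    # (64, 6): 64-lane wavefront
print(warp_reduce_sum(ones, assumed_width=32))  # (32, 5): lanes 32-63 silently dropped
```

Taking the width from the hardware (HIP exposes it as `warpSize` in device code) rather than assuming 32 keeps the reduction correct on both vendors. It also means one extra shuffle step per reduction on AMD, which is the kind of detail that matters when tuning for peak performance.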

My paper was added to the OS hackathon resources page.