2.1. Matrix multiplication
This tutorial shows how to implement a matrix multiplication kernel (C = A × B) in tilus. We start with a naive kernel and add one optimization per version until the kernel reaches performance comparable to cuBLAS on modern GPUs.
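For reference, C = A × B for an M×K matrix A and a K×N matrix B is defined element-wise as C[i][j] = Σ_p A[i][p] · B[p][j]. The sketch below is plain Python, not tilus code. It spells out that definition as a triple loop, which is a convenient ground truth to compare kernel outputs against. The helper name `matmul_reference` is hypothetical. Because the two outer loops over `i` and `j` are independent of each other, every output element can be computed in parallel, and that independence is what a GPU kernel exploits.

```python
def matmul_reference(a, b):
    """Hypothetical reference helper: compute C = A x B for row-major lists of lists."""
    m, k = len(a), len(a[0])
    k_b, n = len(b), len(b[0])
    if k != k_b:
        raise ValueError(f"inner dimensions differ: {k} vs {k_b}")
    c = [[0.0] * n for _ in range(m)]
    # The (i, j) iterations are independent: each writes a distinct C[i][j].
    for i in range(m):
        for j in range(n):
            acc = 0.0
            # Reduction over the shared dimension K.
            for p in range(k):
                acc += a[i][p] * b[p][j]
            c[i][j] = acc
    return c


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
print(matmul_reference(a, b))  # [[19.0, 22.0], [43.0, 50.0]]
```

The loop costs O(M·N·K) multiply-adds. The optimized kernel versions change how this work is scheduled and how data moves through memory, not the result they compute.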