Optimizing Block-Sparse Matrix Multiplications on CUDA with TVM
2020
Online
Electronic resource
We implemented and optimized matrix multiplications between dense and block-sparse matrices on CUDA. We leveraged TVM, a deep learning compiler, to explore the schedule space of the operation and generate efficient CUDA code. With TVM's automatic parameter tuning, our implementation based on cross-thread reduction achieved performance competitive with or better than that of other state-of-the-art frameworks.
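To make the operation concrete, here is a minimal plain-Python reference for multiplying a dense matrix by a block-sparse one. It is only a sketch under stated assumptions: the record does not say which storage format was used, so the block-sparse row (BSR) layout and the names `bsr_matmul`, `data`, `indices`, and `indptr` are illustrative choices. It does not reproduce the TVM schedule, the parameter tuning, or the cross-thread reduction mentioned above.

```python
def bsr_matmul(x, data, indices, indptr, block, n_cols):
    """Compute y = x @ W for a dense x and a block-sparse W.

    Assumed (hypothetical) BSR layout for W:
      data[p]    -- the p-th stored block, a block x block list of rows
      indices[p] -- block-column index of the p-th stored block
      indptr[i]  -- start of block-row i in data/indices;
                    indptr[i + 1] is its end
    """
    m = len(x)
    y = [[0.0] * n_cols for _ in range(m)]
    for bi in range(len(indptr) - 1):            # block-row of W
        for p in range(indptr[bi], indptr[bi + 1]):
            bj = indices[p]                      # block-column of W
            blk = data[p]
            for r in range(m):                   # each row of x
                for i in range(block):
                    xv = x[r][bi * block + i]
                    for j in range(block):
                        # Only stored (nonzero) blocks contribute.
                        y[r][bj * block + j] += xv * blk[i][j]
    return y


# A 4 x 4 matrix W with 2 x 2 blocks; only the two diagonal blocks are stored.
data = [[[1.0, 2.0], [3.0, 4.0]],
        [[5.0, 6.0], [7.0, 8.0]]]
indices = [0, 1]
indptr = [0, 1, 2]

x = [[1.0, 1.0, 1.0, 1.0],
     [1.0, 0.0, 0.0, 2.0]]
y = bsr_matmul(x, data, indices, indptr, block=2, n_cols=4)
```

The work saved comes from skipping blocks that are not stored: the loops touch only the entries in `data`, so cost scales with the number of nonzero blocks rather than with the full size of W.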
Title: | Optimizing Block-Sparse Matrix Multiplications on CUDA with TVM |
---|---|
Published: | 2020 |
Media type: | Electronic resource |