Demystifying GPU UVM Cost with Deep Runtime and Workload Analysis
In: 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2021-05-01
With GPUs becoming ubiquitous in HPC systems, NVIDIA’s Unified Virtual Memory (UVM) is being adopted to simplify porting complex codes to GPU platforms: it allows demand paging between host and device memory without explicit programmer specification. Much like its storage-based counterparts, UVM adds a great deal of usability at the cost of performance, owing to its abstraction and fault-handling mechanisms. This overhead prevents HPC systems from being used efficiently and effectively and decreases the overall value of GPU-based systems.

To mitigate the cost of page fault stall time, NVIDIA has introduced a prefetching mechanism into its UVM system. The prefetcher predicts which data will be needed based on prior page fault history, aiming to migrate it before the corresponding faults occur. Such a prefetcher must be carefully designed and efficient, because it operates under the constraints of a real-time system. Additionally, the workload is complex due to the parallel nature of GPU faults, as well as page fault serialization and fault source erasure within the driver. The current mechanism uses a density-prefetching algorithm to offset the side effects of receiving page faults in parallel. While this prefetching can be very effective, it also has a negative impact on performance under GPU oversubscription.

In this paper, we provide a deep analysis of the overhead caused by UVM and of its primary sources. Additionally, we analyze the impact of NVIDIA’s prefetching and of oversubscription in practice on different workloads, and correlate the observed performance with the driver implementation and prefetching mechanism. We provide design insights and improvement suggestions for hardware and middleware that would open new avenues for performance gains.
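
As a rough illustration of the density idea mentioned in the abstract, the following Python sketch models a prefetch decision over fixed-size regions of pages. It is a toy model and not NVIDIA's driver code: the function name `pages_to_prefetch`, the region size of 32 pages, and the 50% threshold are all assumptions chosen for illustration.

```python
from typing import Iterable, Set

# Both constants are hypothetical values picked for this sketch.
PAGES_PER_REGION = 32     # number of pages grouped into one region
DENSITY_THRESHOLD = 0.5   # prefetch once more than half a region is touched


def pages_to_prefetch(faulted: Iterable[int], resident: Set[int]) -> Set[int]:
    """Return extra pages to migrate alongside a batch of faulted pages.

    A region qualifies when the fraction of its pages that are already
    resident or faulted in this batch exceeds DENSITY_THRESHOLD; the
    remaining untouched pages of that region are then prefetched.
    """
    faulted_set = set(faulted)
    touched = resident | faulted_set
    extra: Set[int] = set()
    # Only regions that received at least one fault in this batch are examined.
    for region in {page // PAGES_PER_REGION for page in faulted_set}:
        start = region * PAGES_PER_REGION
        region_pages = set(range(start, start + PAGES_PER_REGION))
        density = len(touched & region_pages) / PAGES_PER_REGION
        if density > DENSITY_THRESHOLD:
            extra |= region_pages - touched
    return extra
```

In this sketch the decision depends only on which pages have been touched, not on the order in which faults arrive, which is one plausible reason a density criterion suits faults that are raised in parallel and then serialized. It also hints at the oversubscription cost: every prefetched page occupies device memory whether or not the kernel ever uses it.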
Title: | Demystifying GPU UVM Cost with Deep Runtime and Workload Analysis |
---|---|
Author / Contributor: | Allen, Tyler ; Ge, Rong |
Link: | |
Published in: | 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2021-05-01 |
Publication: | IEEE, 2021 |
Media type: | unknown |
DOI: | 10.1109/ipdps49936.2021.00023 |
Keywords: | |
Other: | |