Demystifying GPU UVM Cost with Deep Runtime and Workload Analysis

Allen, Tyler; Ge, Rong

In: 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2021-05-01

With GPUs becoming ubiquitous in HPC systems, NVIDIA’s Unified Virtual Memory (UVM) is being adopted to simplify porting complex codes to GPU platforms. It allows demand paging between host and device memory without the programmer specifying data movement. Much like its storage-based counterparts, UVM adds a great deal of usability at a cost in performance, owing to its abstraction and fault-handling mechanisms. This overhead prevents HPC systems from being used efficiently and effectively, and it decreases the overall value of GPU-based systems.

To mitigate page fault stall time, NVIDIA has added a prefetching mechanism to its UVM system. The prefetcher infers which data will be needed based on prior page fault history, aiming to satisfy faults before they occur. Such a prefetcher must be cleverly designed and efficient, because it operates under real-time constraints to provide effective service. Additionally, the workload is complex because of the parallel nature of GPU faults, as well as page fault serialization and fault source erasure within the driver. The current mechanism uses a density-prefetching algorithm to offset the side effects of receiving page faults in parallel. While this prefetching can be very effective, it also degrades performance when GPU memory is oversubscribed.

In this paper, we provide a deep analysis of the overhead caused by UVM and its primary sources. We also analyze the impact of NVIDIA’s prefetching and of oversubscription in practice on different workloads, and correlate the observed performance with the driver implementation and the prefetching mechanism. Finally, we offer design insights and improvement suggestions for hardware and middleware that would open new avenues for performance gains.
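To make the idea of density prefetching more concrete, here is a minimal Python sketch of a simplified model. It is not NVIDIA's driver code. The 512-page block, the binary tree of aligned regions, the 51% threshold, and the names `preprocess_batch` and `density_prefetch` are all illustrative assumptions. A batch of parallel faults is first deduplicated and sorted with the faulting source discarded, as a stand-in for serialization and fault source erasure. Each faulting page then climbs aligned power-of-two regions and migrates the largest region whose share of resident-or-faulting pages exceeds the threshold.

```python
# Simplified, hypothetical model of UVM fault-batch handling and density
# prefetching. Block size, threshold, and all names are assumptions for
# illustration; this is not NVIDIA driver code.

PAGES_PER_BLOCK = 512  # assumed: a 2 MB block of 4 KB pages (power of two)
THRESHOLD = 0.51       # assumed density threshold, as a fraction of a region


def preprocess_batch(raw_faults):
    """Collapse (source_id, page) fault entries into sorted unique pages.

    Models serialization and fault source erasure: duplicate faults on the
    same page become one entry and the faulting source is discarded.
    """
    return sorted({page for _source, page in raw_faults})


def density_prefetch(resident, faulted, pages=PAGES_PER_BLOCK, threshold=THRESHOLD):
    """Return the pages to migrate: the faulted pages plus prefetched ones.

    For each faulting page, climb aligned power-of-two regions (a binary
    tree over the block) and keep the largest region whose share of
    resident-or-faulting pages exceeds `threshold`.
    """
    resident = set(resident)
    faulted = set(faulted)
    touched = resident | faulted      # pages that count toward density
    to_migrate = faulted - resident   # faulted pages are always migrated
    for page in sorted(faulted):
        region = (page, page + 1)     # leaf: the faulting page alone
        size = 1
        while size < pages:           # move up one tree level per iteration
            size *= 2
            start = (page // size) * size
            dense = sum(p in touched for p in range(start, start + size))
            if dense / size > threshold:
                region = (start, start + size)  # largest dense region so far
        to_migrate |= set(range(*region)) - resident
    return to_migrate


if __name__ == "__main__":
    # Two hypothetical sources fault on page 3 and one on page 4.
    batch = preprocess_batch([(0, 3), (1, 4), (2, 3)])
    print(batch)  # [3, 4]
    print(sorted(density_prefetch(resident={0, 1, 2}, faulted=batch, pages=8)))
    # [3, 4, 5, 6, 7]
```

In this toy run with an 8-page block, pages 0 to 2 are resident and pages 3 and 4 fault, so the whole block is 5/8 dense and pages 5, 6, and 7 are prefetched along with the faulting pages. Under oversubscription, every prefetched page in this model also needs free device memory, which hints at how aggressive prefetching could force additional evictions.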

Title:	Demystifying GPU UVM Cost with Deep Runtime and Workload Analysis
Author(s):	Allen, Tyler; Ge, Rong
Link:	View record in OpenAIRE (full text) https://doi.org/10.1109/ipdps49936.2021.00023
Source:	2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2021-05-01
Publisher:	IEEE, 2021
Media type:	unknown
DOI:	10.1109/ipdps49936.2021.00023
Keywords:	Page fault; Computer science; Embedded system; Middleware; Demand paging; Serialization; Virtual memory; Overhead (computing); Programmer; Porting
Other:	Indexed in: OpenAIRE; Rights: CLOSED
