Skip Buffer Splitting

Title:
Skip Buffer Splitting
Publication: 2023
Media type: Patent
Other:
  • Indexed in: USPTO Patent Applications
  • Languages: English
  • Document Number: 20230385043
  • Publication Date: November 30, 2023
  • Appl. No: 17/944872
  • Application Filed: September 14, 2022
  • Assignees: SambaNova Systems, Inc. (Palo Alto, CA, US)
  • Claim: 1. A computer-implemented method to transform a high-level program into configuration data for a coarse-grained reconfigurable (CGR) processor with an array of CGR units, comprising: transforming at least a part of the high-level program into a dataflow graph that includes multiple interdependent asynchronously performing meta-pipelines, wherein at least one of the meta-pipelines includes a nested loop; in the dataflow graph, identifying a first buffer that stores data that is passed from a producer in a first meta-pipeline stage to a consumer in a second meta-pipeline stage, wherein the first buffer has a first depth and the first depth is more than two timesteps; determining hardware limitations associated with the array of CGR units, including one or more of a number of bytes in a memory unit, a maximum depth of a buffer, or a maximum fan-in of a buffer; determining a lowest cost implementation topology and stage depth, based on a size of the data, the first depth, the hardware limitations, and one or more of three topologies, the three topologies including a cascaded buffer topology, a hybrid buffer topology, and a striped buffer topology; assigning the first buffer to memory units and communication channels according to the lowest cost implementation topology and stage depth; generating configuration data for the assigned memory units and communication channels, wherein the configuration data, when loaded onto an instance of the array of CGR units, causes the array of CGR units to implement the dataflow graph; and storing the configuration data in a non-transitory computer-readable storage medium.
  • Claim: 2. The computer-implemented method of claim 1, wherein: the hybrid buffer topology includes multiple sections that include parallel memory units; and the data travels from memory units in one section to adjacent memory units in a next section without intervening reorder buffers.
  • Claim: 3. The computer-implemented method of claim 1, wherein: determining a lowest cost implementation topology and stage depth includes calculating a cost based on a number of memory units and based on a number of times the data is written into a memory unit while traveling through the first buffer.
  • Claim: 4. The computer-implemented method of claim 3, wherein the cost factor includes a weight for the number of memory units and/or a weight for the number of times the data is written into a memory unit while traveling through the first buffer.
  • Claim: 5. A non-transitory computer-readable storage medium storing computer program instructions to transform a high-level program into configuration data for a CGR processor with an array of CGR units, wherein the computer program instructions, when executed on a processor, implement a method comprising: transforming at least a part of the high-level program into a dataflow graph that includes multiple interdependent asynchronously performing meta-pipelines, wherein at least one of the meta-pipelines includes a nested loop; in the dataflow graph, identifying a first buffer that stores data that is passed from a producer in a first meta-pipeline stage to a consumer in a second meta-pipeline stage, wherein the first buffer has a first depth and the first depth is more than two timesteps; determining hardware limitations associated with the array of CGR units, including one or more of a number of bytes in a memory unit, a maximum depth of a buffer, or a maximum fan-in of a buffer; determining a lowest cost implementation topology and stage depth, based on a size of the data, the first depth, the hardware limitations, and one or more of three topologies, the three topologies including a cascaded buffer topology, a hybrid buffer topology, and a striped buffer topology; assigning the first buffer to memory units and communication channels according to the lowest cost implementation topology and stage depth; generating configuration data for the assigned memory units and communication channels, wherein the configuration data, when loaded onto an instance of the array of CGR units, causes the array of CGR units to implement the dataflow graph; and storing the configuration data in a non-transitory computer-readable storage medium.
  • Claim: 6. A system including one or more processors coupled to a memory, the memory loaded with computer program instructions to transform a high-level program into configuration data for a CGR processor with an array of CGR units, wherein the computer program instructions, when executed on the one or more processors, implement actions comprising: transforming at least a part of the high-level program into a dataflow graph that includes multiple interdependent asynchronously performing meta-pipelines, wherein at least one of the meta-pipelines includes a nested loop; in the dataflow graph, identifying a first buffer that stores data that is passed from a producer in a first meta-pipeline stage to a consumer in a second meta-pipeline stage, wherein the first buffer has a first depth and the first depth is more than two timesteps; determining hardware limitations associated with the array of CGR units, including one or more of a number of bytes in a memory unit, a maximum depth of a buffer, or a maximum fan-in of a buffer; determining a lowest cost implementation topology and stage depth, based on a size of the data, the first depth, the hardware limitations, and one or more of three topologies, the three topologies including a cascaded buffer topology, a hybrid buffer topology, and a striped buffer topology; assigning the first buffer to memory units and communication channels according to the lowest cost implementation topology and stage depth; generating configuration data for the assigned memory units and communication channels, wherein the configuration data, when loaded onto an instance of the array of CGR units, causes the array of CGR units to implement the dataflow graph; and storing the configuration data in a non-transitory computer-readable storage medium.
  • Current International Class: 06; 06
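Claims 1, 3, and 4 together describe a compiler pass that picks the cheapest buffer implementation among three topologies, using a cost that weights the number of memory units and the number of times data is rewritten in transit. The sketch below is an illustrative reading of that cost model, not the patent's implementation: the hardware-limit fields are taken from claim 1's examples, while the per-topology write counts (`cascaded` rewrites at every hop, `striped` writes once across parallel units, `hybrid` a small fixed number of hops) are assumptions made for the example.

```python
import math
from dataclasses import dataclass

@dataclass
class HardwareLimits:
    """Hardware limitations named in claim 1 (values are hypothetical)."""
    bytes_per_memory_unit: int
    max_buffer_depth: int
    max_fan_in: int

def estimate_cost(num_memory_units: int, num_writes: int,
                  w_units: float = 1.0, w_writes: float = 1.0) -> float:
    """Cost per claims 3-4: weighted memory-unit count plus weighted
    count of times the data is written while traversing the buffer."""
    return w_units * num_memory_units + w_writes * num_writes

def choose_topology(data_size_bytes: int, depth: int,
                    hw: HardwareLimits) -> str:
    """Return the lowest-cost topology among the three from claim 1.
    The write counts per topology are illustrative guesses."""
    units_needed = math.ceil(data_size_bytes * depth
                             / hw.bytes_per_memory_unit)
    candidates = {
        # cascaded chain: data is rewritten at every unit it passes through
        "cascaded": estimate_cost(units_needed, num_writes=units_needed),
        # striped: data written once, spread across parallel units
        "striped": estimate_cost(units_needed, num_writes=1),
        # hybrid: sections of parallel units with a few write hops between
        "hybrid": estimate_cost(units_needed, num_writes=2),
    }
    return min(candidates, key=candidates.get)
```

With equal weights this toy model always favors the striped topology; in practice the weights and the fan-in/depth limits (claim 1) would rule topologies in or out per buffer.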

