Skip Buffer Splitting

Title:
Skip Buffer Splitting
Publication: 2023
Media type: Patent
Other:
  • Indexed in: USPTO Patent Applications
  • Languages: English
  • Document Number: 20230385043
  • Publication Date: November 30, 2023
  • Appl. No: 17/944872
  • Application Filed: September 14, 2022
  • Assignees: SambaNova Systems, Inc. (Palo Alto, CA, US)
  • Claim: 1. A computer-implemented method to transform a high-level program into configuration data for a coarse-grained reconfigurable (CGR) processor with an array of CGR units, comprising: transforming at least a part of the high-level program into a dataflow graph that includes multiple interdependent asynchronously performing meta-pipelines, wherein at least one of the meta-pipelines includes a nested loop; in the dataflow graph, identifying a first buffer that stores data that is passed from a producer in a first meta-pipeline stage to a consumer in a second meta-pipeline stage, wherein the first buffer has a first depth and the first depth is more than two timesteps; determining hardware limitations associated with the array of CGR units, including one or more of a number of bytes in a memory unit, a maximum depth of a buffer, or a maximum fan-in of a buffer; determining a lowest cost implementation topology and stage depth, based on a size of the data, the first depth, the hardware limitations, and one or more of three topologies, the three topologies including a cascaded buffer topology, a hybrid buffer topology, and a striped buffer topology; assigning the first buffer to memory units and communication channels according to the lowest cost implementation topology and stage depth; generating configuration data for the assigned memory units and communication channels, wherein the configuration data, when loaded onto an instance of the array of CGR units, causes the array of CGR units to implement the dataflow graph; and storing the configuration data in a non-transitory computer-readable storage medium.
  • Claim: 2. The computer-implemented method of claim 1, wherein: the hybrid buffer topology includes multiple sections that include parallel memory units; and the data travels from memory units in one section to adjacent memory units in a next section without intervening reorder buffers.
  • Claim: 3. The computer-implemented method of claim 1, wherein: determining a lowest cost implementation topology and stage depth includes calculating a cost based on a number of memory units and based on a number of times the data is written into a memory unit while traveling through the first buffer.
  • Claim: 4. The computer-implemented method of claim 3, wherein the cost factor includes a weight for the number of memory units and/or a weight for the number of times the data is written into a memory unit while traveling through the first buffer.
  • Claim: 5. A non-transitory computer-readable storage medium storing computer program instructions to transform a high-level program into configuration data for a CGR processor with an array of CGR units, wherein the computer program instructions, when executed on a processor, implement a method comprising: transforming at least a part of the high-level program into a dataflow graph that includes multiple interdependent asynchronously performing meta-pipelines, wherein at least one of the meta-pipelines includes a nested loop; in the dataflow graph, identifying a first buffer that stores data that is passed from a producer in a first meta-pipeline stage to a consumer in a second meta-pipeline stage, wherein the first buffer has a first depth and the first depth is more than two timesteps; determining hardware limitations associated with the array of CGR units, including one or more of a number of bytes in a memory unit, a maximum depth of a buffer, or a maximum fan-in of a buffer; determining a lowest cost implementation topology and stage depth, based on a size of the data, the first depth, the hardware limitations, and one or more of three topologies, the three topologies including a cascaded buffer topology, a hybrid buffer topology, and a striped buffer topology; assigning the first buffer to memory units and communication channels according to the lowest cost implementation topology and stage depth; generating configuration data for the assigned memory units and communication channels, wherein the configuration data, when loaded onto an instance of the array of CGR units, causes the array of CGR units to implement the dataflow graph; and storing the configuration data in a non-transitory computer-readable storage medium.
  • Claim: 6. A system including one or more processors coupled to a memory, the memory loaded with computer program instructions to transform a high-level program into configuration data for a CGR processor with an array of CGR units, wherein the computer program instructions, when executed on the one or more processors, implement actions comprising: transforming at least a part of the high-level program into a dataflow graph that includes multiple interdependent asynchronously performing meta-pipelines, wherein at least one of the meta-pipelines includes a nested loop; in the dataflow graph, identifying a first buffer that stores data that is passed from a producer in a first meta-pipeline stage to a consumer in a second meta-pipeline stage, wherein the first buffer has a first depth and the first depth is more than two timesteps; determining hardware limitations associated with the array of CGR units, including one or more of a number of bytes in a memory unit, a maximum depth of a buffer, or a maximum fan-in of a buffer; determining a lowest cost implementation topology and stage depth, based on a size of the data, the first depth, the hardware limitations, and one or more of three topologies, the three topologies including a cascaded buffer topology, a hybrid buffer topology, and a striped buffer topology; assigning the first buffer to memory units and communication channels according to the lowest cost implementation topology and stage depth; generating configuration data for the assigned memory units and communication channels, wherein the configuration data, when loaded onto an instance of the array of CGR units, causes the array of CGR units to implement the dataflow graph; and storing the configuration data in a non-transitory computer-readable storage medium.
  • Current International Class: 06; 06
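Claims 1, 3, and 4 together describe a compiler pass that picks the cheapest buffer implementation among three topologies, using a cost that weights the number of memory units and the number of times data is rewritten in transit. The sketch below is an illustrative reading of that cost model, not the patent's implementation: the hardware-limit fields are taken from claim 1's examples, while the per-topology write counts (`cascaded` rewrites at every hop, `striped` writes once across parallel units, `hybrid` a small fixed number of hops) are assumptions made for the example.

```python
import math
from dataclasses import dataclass

@dataclass
class HardwareLimits:
    """Hardware limitations named in claim 1 (values are hypothetical)."""
    bytes_per_memory_unit: int
    max_buffer_depth: int
    max_fan_in: int

def estimate_cost(num_memory_units: int, num_writes: int,
                  w_units: float = 1.0, w_writes: float = 1.0) -> float:
    """Cost per claims 3-4: weighted memory-unit count plus weighted
    count of times the data is written while traversing the buffer."""
    return w_units * num_memory_units + w_writes * num_writes

def choose_topology(data_size_bytes: int, depth: int,
                    hw: HardwareLimits) -> str:
    """Return the lowest-cost topology among the three from claim 1.
    The write counts per topology are illustrative guesses."""
    units_needed = math.ceil(data_size_bytes * depth
                             / hw.bytes_per_memory_unit)
    candidates = {
        # cascaded chain: data is rewritten at every unit it passes through
        "cascaded": estimate_cost(units_needed, num_writes=units_needed),
        # striped: data written once, spread across parallel units
        "striped": estimate_cost(units_needed, num_writes=1),
        # hybrid: sections of parallel units with a few write hops between
        "hybrid": estimate_cost(units_needed, num_writes=2),
    }
    return min(candidates, key=candidates.get)
```

With equal weights this toy model always favors the striped topology; in practice the weights and the fan-in/depth limits (claim 1) would rule topologies in or out per buffer.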

