Single result - DigiBib

A system and corresponding method perform large memory transaction (LMT) stores. The system comprises a processor associated with a data-processing width and a processor accelerator. The processor accelerator performs an LMT store of a data set to a coprocessor in response to an instruction from the processor targeting the coprocessor. The data set corresponds to the instruction. The LMT store includes storing data from the data set, atomically, to the coprocessor based on an LMT line (LMTLINE). The LMTLINE is wider than the data-processing width. The processor accelerator sends, to the processor, a response to the instruction. The response is based on completion of the LMT store of the data set in its entirety. The processor accelerator enables the processor to perform useful work in parallel with the LMT store, thereby improving processing performance of the processor.
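The abstract describes a handoff: the processor issues one instruction, the accelerator moves the data set to the coprocessor one LMTLINE at a time, and the processor receives a single response once the whole data set has been stored. The Python sketch below is a software analogy under stated assumptions, not Marvell's hardware design. The 8-byte data-processing width, the 128-byte LMTLINE, and the classes `Coprocessor` and `ProcessorAccelerator` are hypothetical; a lock stands in for per-line atomicity, and a single-worker thread pool stands in for the accelerator running in parallel with the processor.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

DATA_WIDTH = 8      # hypothetical processor data-processing width, in bytes
LMTLINE_SIZE = 128  # hypothetical LMTLINE size in bytes
assert LMTLINE_SIZE > DATA_WIDTH  # the LMTLINE is wider than the processor width


class Coprocessor:
    """Toy coprocessor: each accepted LMTLINE is appended as one indivisible unit."""

    def __init__(self):
        self._lock = threading.Lock()
        self.lines = []

    def atomic_store(self, line: bytes) -> None:
        assert len(line) <= LMTLINE_SIZE
        with self._lock:  # the whole line becomes visible at once
            self.lines.append(line)


class ProcessorAccelerator:
    """Toy accelerator that performs LMT stores on behalf of the processor."""

    def __init__(self, coproc: Coprocessor):
        self.coproc = coproc
        self._pool = ThreadPoolExecutor(max_workers=1)

    def _lmt_store(self, data_set: bytes) -> dict:
        # Store the data set LMTLINE by LMTLINE, each line atomically.
        n_lines = 0
        for off in range(0, len(data_set), LMTLINE_SIZE):
            self.coproc.atomic_store(data_set[off:off + LMTLINE_SIZE])
            n_lines += 1
        # One response, produced only after the entire data set is stored.
        return {"status": "done", "lines": n_lines}

    def issue(self, data_set: bytes):
        """Called by the processor; returns at once with a future response."""
        return self._pool.submit(self._lmt_store, data_set)

    def close(self) -> None:
        self._pool.shutdown(wait=True)


coproc = Coprocessor()
accel = ProcessorAccelerator(coproc)
response = accel.issue(bytes(300))  # 300 bytes -> 3 LMTLINEs (128 + 128 + 44)
other_work = sum(range(1000))       # the processor keeps working meanwhile
print(response.result(), other_work)  # {'status': 'done', 'lines': 3} 499500
accel.close()
```

Returning a future from `issue` is what lets the processor continue with other work; calling `result()` is the single point at which it waits for the one completion response, mirroring claims 2 and 3.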

Title:	System and method for large memory transaction (LMT) stores
Author / Contributor:	Marvell Asia Pte Ltd
Link:	View record in USPTO Patent Grants (full text)
Published:	2024
Media type:	Patent
Indexed in:	USPTO Patent Grants
Language:	English
Patent number:	11,960,727
Publication date:	April 16, 2024
Application no.:	17/937128
Application filed:	September 30, 2022
Assignee:	Marvell Asia Pte Ltd (Singapore, SG)
Claims:

Claim 1. A system comprising: a processor associated with a data-processing width; and a processor accelerator configured to perform a large memory transaction (LMT) store of a data set to a coprocessor in response to an instruction from the processor targeting the coprocessor, the data set corresponding to the instruction, the LMT store including storing data from the data set, atomically, to the coprocessor based on a LMT line (LMTLINE), the LMTLINE wider than the data-processing width, the processor accelerator further configured to send, to the processor, a response to the instruction, the response based on completion of the LMT store of the data set in its entirety.

Claim 2. The system of claim 1, wherein the data set includes a plurality of LMT lines (LMTLINEs) of data and wherein the response is a single response to the instruction that is based on completion of the LMT store of the plurality of LMTLINEs of data of the data set.

Claim 3. The system of claim 1, wherein the processor is further configured to perform work in parallel with the LMT store performed by the processor accelerator.

Claim 4. The system of claim 1, wherein the LMT store of the data set, in its entirety, appears, from the perspective of the processor, as having been performed via a single instruction.

Claim 5. The system of claim 1, wherein the processor accelerator is employed by the processor, exclusively.

Claim 6. The system of claim 1, wherein the system is a system-on-a-chip (SoC) and wherein the SoC includes the processor, processor accelerator, and coprocessor.

Claim 7. The system of claim 1, wherein the storing includes iteratively storing data from the data set to the coprocessor, atomically, on a LMTLINE-by-LMTLINE basis.

Claim 8. The system of claim 1, further comprising memory and wherein the processor is configured to store the data set, in its entirety, to the memory prior to issuing the instruction.

Claim 9. The system of claim 1, further comprising memory and wherein the processor is configured to: map a contiguous region of the memory as cacheable memory; and store the data set, in its entirety, to the contiguous region of the memory.

Claim 10. The system of claim 1, further comprising memory, wherein the data set is associated with a pairing of a physical function (PF) and a virtual function (VF), wherein the VF is associated with the PF, wherein the processor is configured to store the data set in a contiguous region of the memory, and wherein the contiguous region of the memory is associated with the pairing of the PF and VF.

Claim 11. The system of claim 1, wherein the processor accelerator includes a LMT physical-address cache (LPC) and wherein the processor accelerator is further configured to store a physical address (PA) of a LMTLINE of data from the data set in the LPC in association with a PF identifier of a PF and a VF identifier of a VF, the VF associated with PF, the PF and VF associated with the data set.

Claim 12. The system of claim 1, wherein the data set includes a plurality of LMT lines (LMTLINEs) of data and wherein the processor accelerator is further configured to determine a total number of LMTLINEs of the plurality of LMTLINEs of the data set based on the instruction.

Claim 13. The system of claim 1, further comprising a last level cache (LLC) controller and an input/output bridge (IOB) and wherein the storing includes: issuing at least one read instruction to the LLC controller to request a LMTLINE of data from the data set; issuing a write command to the IOB; and sending the LMTLINE of data requested to the IOB for storing at the coprocessor, atomically, the sending based on a) receipt of the LMTLINE of data requested and b) receipt of an acknowledgement from the IOB to the write command issued.

Claim 14. The system of claim 13, wherein the processor accelerator is further configured to perform the issuing of the at least one read instruction, issuing of the write command, and sending of the LMTLINE of data, iteratively, on a LMTLINE-by-LMTLINE basis.

Claim 15. The system of claim 13, wherein the IOB includes a LMT store scheduling widget (LSW), wherein the LSW is configured to assign a LSW identifier (ID) to the LMT store for ordering the LMT store relative to another LMT store, wherein the acknowledgment includes the LSW ID, and wherein sending the LMTLINE of data to the IOB includes sending the LSW ID with the LMTLINE of data.

Claim 16. The system of claim 1, further comprising a non-coherent fully associative cache, a first memory, and a second memory, wherein the processor accelerator is further configured to obtain a physical address (PA) at which the data set is stored in the first memory, wherein the PA is obtained via the non-coherent fully associative cache or the second memory, wherein the processor accelerator is further configured to issue at least one read instruction to request a LMTLINE of data from the data set, and wherein the at least one read instruction includes an address of the LMTLINE, the address based on the PA obtained.

Claim 17. The system of claim 1, wherein the processor accelerator includes a plurality of engines, wherein an engine of the plurality of engines is configured to handle the LMT store and wherein another engine of the plurality of engines is configured to handle another LMT store of a different data set, different from the data set handled by the LMT store.

Claim 18. A method comprising: performing, by a processor accelerator, a large memory transaction (LMT) store of a data set to a coprocessor in response to an instruction from a processor targeting the coprocessor, the processor associated with a data-processing width, the data set corresponding to the instruction, the performing including storing data from the data set, atomically, to the coprocessor based on a LMT line (LMTLINE), the LMTLINE wider than the data-processing width; and sending, from the processor accelerator to the processor, a response to the instruction, the response based on completion of the LMT store of the data set in its entirety.

Claim 19. The method of claim 18, wherein the data set includes a plurality of LMT lines (LMTLINEs) of data and wherein the response is a single response to the instruction that is based on completion of the LMT store of the plurality of LMTLINEs of data of the data set.

Claim 20. The method of claim 18, further comprising performing work, by the processor, in parallel with performing the LMT store by the processor accelerator.

Claim 21. The method of claim 18, wherein the LMT store of the data set, in its entirety, appears, from the perspective of the processor, as having been performed via a single instruction.

Claim 22. The method of claim 18, wherein the processor accelerator is employed by the processor, exclusively.

Claim 23. The method of claim 18, wherein the processor, processor accelerator, and coprocessor are included in a system-on-a-chip (SoC).

Claim 24. The method of claim 18, wherein the storing includes iteratively storing data from the data set to the coprocessor, atomically, on a LMTLINE-by-LMTLINE basis.

Claim 25. The method of claim 18, further comprising, by the processor, storing the data set, in its entirety, to memory prior to issuing the instruction.

Claim 26. The method of claim 18, further comprising: mapping, by the processor, a contiguous region of the memory as cacheable memory; and storing, by the processor, the data set, in its entirety, to the contiguous region of the memory.

Claim 27. The method of claim 18, wherein the data set is associated with a pairing of a physical function (PF) and a virtual function (VF), wherein the VF is associated with the PF, wherein the method further comprises storing, by the processor, the data set in a contiguous region of memory, and wherein the contiguous region of the memory is associated with the pairing of the PF and VF.

Claim 28. The method of claim 18, wherein the processor accelerator includes a LMT physical-address cache (LPC) and the method further comprises storing, by the processor accelerator, a physical address (PA) of a LMTLINE of data from the data set in the LPC in association with a PF identifier of a PF and a VF identifier of a VF, the VF associated with the PF, the PF and VF associated with the data set.

Claim 29. The method of claim 18, wherein the data set includes a plurality of LMT lines (LMTLINEs) of data and wherein the method further comprises determining, by the processor accelerator, a total number of LMTLINEs of the plurality of LMTLINEs of the data set based on the instruction.

Claim 30. The method of claim 18, wherein the storing includes: issuing at least one read instruction to a last level cache (LLC) controller to request a LMTLINE of data from the data set; issuing a write command to an input/output bridge (IOB); and sending the LMTLINE of data requested to the IOB for storing at the coprocessor, atomically, the sending based on a) receipt of the LMTLINE of data requested and b) receipt of an acknowledgement from the IOB to the write command issued.

Claim 31. The method of claim 30, further comprising performing the issuing of the at least one read instruction, issuing of the write command, and sending of the LMTLINE of data, iteratively, on a LMTLINE-by-LMTLINE basis.

Claim 32. The method of claim 30, wherein the IOB includes a LMT store scheduling widget (LSW) and wherein the method further comprises assigning, by the LSW, a LSW identifier (ID) to the LMT store for ordering the LMT store relative to another LMT store, wherein the acknowledgment includes the LSW ID, and wherein sending the LMTLINE of data to the IOB includes sending the LSW ID with the LMTLINE of data.

Claim 33. The method of claim 18, further comprising: obtaining, by the processor accelerator, a PA at which the data set is stored in a first memory, the obtaining including obtaining the PA via a non-coherent fully associative cache or a second memory; and issuing, by the processor accelerator, at least one read instruction to request a LMTLINE of data from the data set, wherein the at least one read instruction includes an address of the LMTLINE, the address based on the PA obtained.

Claim 34. The method of claim 18, further comprising: handling the LMT store by an engine of the plurality of engines; and handling another LMT store of a different data set, different from the data set handled by the LMT store, by another engine of the plurality of engines.

Claim 35. An apparatus comprising: means for performing a large memory transaction (LMT) store of a data set to a coprocessor in response to an instruction from a processor targeting the coprocessor, the data set corresponding to the instruction, the performing including storing data from the data set, atomically, to the coprocessor based on a LMT line (LMTLINE), the LMTLINE wider than a data-processing width; and means for sending, to the processor, a response to the instruction, the response based on completion of the LMT store of the data set in its entirety.

Patent references cited:	9501243 (November 2016, Kessler); 20150100747 (April 2015, Kessler); 20150261535 (September 2015, Snyder, II); 20170286113 (October 2017, Shanbhogue)
Other references:	"AMBA® 5 CHI Architecture Specification," 2020, Arm Limited (cited by applicant)
Primary examiner:	Baughman, William E.
Attorney, agent or firm:	Hamilton, Brook, Smith & Reynolds, P.C.
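Claims 11 and 13 to 16 describe the per-line protocol: find the physical address (PA) of the data set, read each LMTLINE through the LLC controller, issue a write command to the IOB, and send the line only once both the data and the IOB's acknowledgement (carrying the LSW ID) are in hand. The Python sketch below models that sequence under explicit assumptions; it is not the patented implementation. It treats the LPC of claim 11 as the fully associative cache of claim 16 and gives it LRU replacement (the replacement policy is our assumption), uses a dict as the "second memory" and a `bytearray` as the "first memory", passes the LMTLINE count in directly instead of decoding it from an instruction, and assigns one increasing LSW ID per LMT store. All names (`LPC`, `LLCController`, `IOB`, `lmt_store`, `store_tag`) are hypothetical. Real hardware would overlap the read and the write command; the sketch issues them back to back but still sends each line only after both conditions hold.

```python
from collections import OrderedDict
from itertools import count

LMTLINE_SIZE = 128  # hypothetical LMTLINE size in bytes


class LPC:
    """Hypothetical LMT physical-address cache: fully associative, keyed by (PF, VF)."""

    def __init__(self, capacity: int = 4):
        self.capacity = capacity
        self.entries = OrderedDict()  # (pf, vf) -> base physical address

    def lookup(self, pf: int, vf: int, second_memory: dict) -> int:
        key = (pf, vf)
        if key in self.entries:               # hit: any entry may hold any key
            self.entries.move_to_end(key)
            return self.entries[key]
        pa = second_memory[key]               # miss: fetch the PA from the backing table
        self.entries[key] = pa
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used (assumed policy)
        return pa


class LLCController:
    """Stands in for the last level cache in front of the first memory."""

    def __init__(self, first_memory: bytearray):
        self.mem = first_memory

    def read(self, addr: int) -> bytes:
        return bytes(self.mem[addr:addr + LMTLINE_SIZE])


class IOB:
    """I/O bridge whose LSW assigns one ID per LMT store, in arrival order."""

    def __init__(self):
        self._next_id = count()
        self._ids = {}       # store tag -> LSW ID
        self.committed = {}  # LSW ID -> lines delivered to the coprocessor

    def write_command(self, store_tag: str) -> dict:
        if store_tag not in self._ids:
            self._ids[store_tag] = next(self._next_id)
        return {"ack": True, "lsw_id": self._ids[store_tag]}

    def send_line(self, lsw_id: int, line: bytes) -> None:
        # The whole line is appended in one step, modelling an atomic store.
        self.committed.setdefault(lsw_id, []).append(line)


def lmt_store(pf, vf, n_lines, lpc, second_memory, llc, iob, store_tag):
    """Hypothetical accelerator loop: one LMTLINE per iteration, one final response."""
    if n_lines < 1:
        raise ValueError("an LMT store needs at least one LMTLINE")
    base_pa = lpc.lookup(pf, vf, second_memory)
    for i in range(n_lines):
        line = llc.read(base_pa + i * LMTLINE_SIZE)  # (a) requested line received
        ack = iob.write_command(store_tag)           # (b) acknowledgement with LSW ID
        iob.send_line(ack["lsw_id"], line)           # send only once (a) and (b) hold
    return {"status": "done", "lsw_id": ack["lsw_id"], "lines": n_lines}


first_memory = bytearray(range(256)) * 4  # 1 KiB of sample data
second_memory = {(0, 1): 256, (1, 0): 0}  # (PF, VF) -> base PA
lpc, llc, iob = LPC(), LLCController(first_memory), IOB()

r1 = lmt_store(0, 1, 2, lpc, second_memory, llc, iob, store_tag="A")
r2 = lmt_store(1, 0, 1, lpc, second_memory, llc, iob, store_tag="B")
print(r1)  # {'status': 'done', 'lsw_id': 0, 'lines': 2}
print(r2)  # {'status': 'done', 'lsw_id': 1, 'lines': 1}
```

Keying the cache by the (PF, VF) pair follows claims 10 and 11, which tie each data set's contiguous memory region to one PF/VF pairing; the distinct LSW IDs returned for the two stores illustrate how claim 15's widget can order one LMT store relative to another.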
