Three-port memory cell and array for in-memory computing


Title:
Three-port memory cell and array for in-memory computing
Author / Contributor: Marvell Asia Pte, Ltd.
Published: 2021
Media type: Patent
Additional information:
  • Indexed in: USPTO Patent Grants
  • Language: English
  • Patent Number: 10,964,362
  • Publication Date: March 30, 2021
  • Appl. No: 16/393,997
  • Application Filed: April 25, 2019
  • Assignees: Marvell Asia Pte, Ltd. (Singapore, SG)
  • Claim: 1. A memory cell comprising: a first storage node configured to store a bit of data; a first read pass-gate transistor and a first read pull-down transistor connected in series between a common read bitline and a first voltage rail, wherein a gate of the first read pass-gate transistor is connected to a first read wordline, and wherein a gate of the first read pull-down transistor is connected to the first storage node; a second storage node configured to store a one's complement of the bit of data; a second read pass-gate transistor and a second read pull-down transistor connected in series between the common read bitline and the first voltage rail, wherein a gate of the second read pass-gate transistor is connected to a second read wordline, and wherein a gate of the second read pull-down transistor is connected to the second storage node; a capacitor connected to the common read bitline and controlling a discharge rate of the common read bitline to prevent a voltage level on the common read bitline from fully discharging to a ground reference voltage level during stepping down of the voltage level on the common read bitline; a first write pass-gate transistor connected to the first storage node; and a second write pass-gate transistor connected to the second storage node, wherein a gate of the first write pass-gate transistor and a gate of the second write pass-gate transistor are connected to a same write wordline for concurrent write access to the first storage node and the second storage node.
  • Claim: 2. The memory cell of claim 1, wherein: during a read operation to accomplish an XNOR operation, the common read bitline is pre-charged and a read pulse is applied selectively to only one of the first read wordline and the second read wordline; and during the read pulse applied, digital inputs for the XNOR operation performed by the memory cell are logic states of the first read wordline and the bit of data stored at the first storage node, and an output of the XNOR operation performed by the memory cell is determinable based on whether discharging of the voltage level on the common read bitline of the memory cell occurs.
  • Claim: 3. The memory cell of claim 2 , wherein: when both the first read wordline and the bit of data have a same logic state, the discharging of the voltage level on the common read bitline of the memory cell occurs and indicates the output of the XNOR operation is a high logic state; and when the first read wordline and the bit of data have different logic states, the discharging of the voltage level on the common read bitline of the memory cell is prevented and indicates the output of the XNOR operation is a low logic state.
  • Claim: 4. The memory cell of claim 2 , wherein: the memory cell is incorporated into a memory array configured for in-memory computing of binary neural network operations; and the logic states of the first read wordline and the bit of data correspond to a binary input value and a binary weight value.
  • Claim: 5. The memory cell of claim 1 , further comprising: the first voltage rail implemented as a ground rail or a negative voltage rail; a second voltage rail implemented as a positive voltage rail; a pair of cross-coupled inverters connected between the first voltage rail and the second voltage rail and comprising a first inverter and a second inverter; and a pair of complementary bitlines comprising a first bitline and a second bitline, wherein the first inverter comprises the first storage node, the second inverter comprises the second storage node, the first write pass-gate transistor is connected between the first storage node and the first bitline, and the second write pass-gate transistor is connected between the second storage node and the second bitline.
  • Claim: 6. A memory array comprising memory cells arranged in columns and rows of the memory array, wherein: each column of the memory array comprises a common read bitline, a voltage comparator connected to the common read bitline, and a capacitor connected to the common read bitline and controlling a discharge rate of the common read bitline to prevent an actual voltage level on the common read bitline from fully discharging to a ground reference voltage level during stepping down of the actual voltage level on the common read bitline; each row of the memory array comprises a first read wordline, and a second read wordline; and each of the memory cells comprises a first storage node configured to store a bit of data, a first read pass-gate transistor and a first read pull-down transistor connected in series between a first voltage rail and the common read bitline of one of the columns including the memory cell of the first storage node, a gate of the first read pass-gate transistor is connected to the first read wordline for a row including the memory cell of the first storage node, and a gate of the first read pull-down transistor is connected to the first storage node, a second storage node configured to store a one's complement of the bit of data, a second read pass-gate transistor and a second read pull-down transistor connected in series between the common read bitline of the one of the columns and the first voltage rail, a gate of the second read pass-gate transistor is connected to the second read wordline of the row, and a gate of the second read pull-down transistor is connected to the second storage node.
  • Claim: 7. The memory array of claim 6 , further comprising bitline drivers and wordline drivers, wherein: during concurrent read operations of all of the memory cells in a selected one of the columns to accomplish concurrent XNOR operations, a bitline driver pre-charges the common read bitline and the capacitor of the selected one of the columns; the wordline drivers selectively apply read pulses to only one of the first read wordline and the second read wordline in each of the rows; following the read pulses, the voltage comparator of the selected one of the columns compares the actual voltage level on the common read bitline for the selected one of the columns to a reference voltage level and generates a digital output; the actual voltage level on the common read bitline is indicative of a bit count of results of the concurrent XNOR operations performed by the memory cells in the selected one of the columns; and the reference voltage level corresponds to a bit count threshold such that the digital output indicates whether the bit count is below the bit count threshold.
  • Claim: 8. The memory array of claim 7, wherein, for an XNOR operation performed by each of the memory cells in the selected one of the columns: digital inputs are provided as logic states of the first read wordline connected to one of the memory cells, and the bit of data stored at the first storage node of the one of the memory cells; and an output of the XNOR operation performed by the one of the memory cells is determinable based on whether discharging of the actual voltage level on the common read bitline of the selected one of the columns through the one of the memory cells occurs.
  • Claim: 9. The memory array of claim 8 , wherein: when both the first read wordline connected to the one of the memory cells and the bit of data have a same logic state, the discharging of the actual voltage level on the common read bitline of the one of the memory cells occurs and indicates the output of the XNOR operation of the one of the memory cells is a high logic state; and when the first read wordline, connected to the one of the memory cells, and the bit of data have different logic states, the discharging of the actual voltage level on the common read bitline of the one of the memory cells is prevented and indicates the output of the XNOR operation of the one of the memory cells is a low logic state.
  • Claim: 10. The memory array of claim 6 , wherein: the memory array further comprises bitline drivers and wordline drivers; during concurrent read operations of the memory cells in the columns of the memory array and to accomplish concurrent XNOR operations, the bitline drivers pre-charge the common read bitlines and the capacitors, and the wordline drivers selectively apply read pulses to only one of the first read wordline and the second read wordline in each of the rows; following the read pulses, voltage comparators connected to different common read bitlines of different columns perform compare operations; each of the compare operations performed by one of the voltage comparators, connected to one of the common read bitlines in one of the columns, comprises comparing the actual voltage level on the one of the common read bitlines to a reference voltage level and outputting a digital output; the actual voltage level on the one of the common read bitlines is indicative of a bit count of results of the concurrent XNOR operations performed by the memory cells in the one of the columns; and the reference voltage level corresponds to a bit count threshold such that the digital output of the one of the voltage comparators for the one of the columns indicates whether the bit count is below the bit count threshold.
  • Claim: 11. The memory array of claim 10, wherein: the memory array is configured for in-memory parallel computing of binary neural network operations; and the concurrent XNOR operations performed by the memory cells in the columns of the memory array occur in one clock cycle with logic states of first read wordlines corresponding to binary input values from a first receptive field, with logic states of bits stored in the memory cells of the columns corresponding to binary weight values from different kernels associated with different features, respectively, and with digital outputs from the voltage comparators of the columns being inserted into feature maps for the different features at a same location corresponding to the first receptive field.
  • Claim: 12. The memory array of claim 11 , wherein the concurrent XNOR operations performed by the memory cells in the columns of the memory array are repeated during subsequent clock cycles such that: at each subsequent clock cycle logic states of the first read wordlines correspond to different binary input values from a second receptive field, wherein the second receptive field is different than the first receptive field; the logic states of the bits stored in the memory cells of the columns continue to correspond to binary weight values from the different kernels associated with the different features, respectively; and the digital outputs from the voltage comparators of the columns are inserted into the feature maps for the different features at a location corresponding to the second receptive field.
  • Claim: 13. The memory array of claim 6, wherein each of the memory cells further comprises: a pair of cross-coupled inverters connected between the first voltage rail and a second voltage rail and comprising a first inverter and a second inverter; a first write pass-gate transistor; and a second write pass-gate transistor, wherein the first voltage rail is implemented as a ground rail or a negative voltage rail; the second voltage rail is implemented as a positive voltage rail; the first inverter comprises the first storage node; the second inverter comprises the second storage node; the first write pass-gate transistor is connected between the first storage node and a first bitline of a pair of complementary bitlines for a corresponding one of the columns; the second write pass-gate transistor is connected between the second storage node and a second bitline of the pair of complementary bitlines; and a gate of the first write pass-gate transistor and a gate of the second write pass-gate transistor are both connected to a write wordline for a corresponding one of the rows.
  • Claim: 14. The memory array of claim 6, wherein: the memory cells are implemented in an active device layer on an integrated circuit chip; and the capacitors are connected respectively to the common read bitlines, wherein the capacitors comprise metal-on-metal capacitors.
  • Claim: 15. A method comprising: providing a memory array comprising memory cells arranged in columns and rows of the memory array, wherein each of the columns of the memory array comprises a common read bitline, a voltage comparator connected to the common read bitline, and a capacitor connected to the common read bitline, wherein each of the rows of the memory array comprises a first read wordline and a second read wordline, wherein each of the memory cells comprises a first storage node configured to store a bit of data, a first read pass-gate transistor and a first read pull-down transistor connected in series between the common read bitline for one of the columns including the memory cell and a first voltage rail, a second storage node configured to store a one's complement of the bit of data, and a second read pass-gate transistor and a second read pull-down transistor connected in series between the common read bitline for the one of the columns and the first voltage rail, wherein a gate of the first read pass-gate transistor and a gate of the first read pull-down transistor are connected to the first read wordline of one of the rows including the memory cell and to the first storage node, respectively, and wherein a gate of the second read pass-gate transistor and a gate of the second read pull-down transistor are connected to the second read wordline for the one of the rows and to the second storage node, respectively; pre-charging the common read bitline and the capacitor of at least one selected column and selectively applying read pulses to only one of the first read wordline and the second read wordline in each of the rows such that concurrent XNOR operations are performed by memory cells in the at least one selected column; controlling a discharge rate of the common read bitline via the capacitor to prevent an actual voltage level of the common read bitline from fully discharging to a ground reference voltage level during stepping down of the actual voltage level of the common read bitline; and performing a comparison of the actual voltage level on the common read bitline of the at least one selected column to a reference voltage level and outputting a digital output, wherein the actual voltage level on the common read bitline of the at least one selected column is indicative of a bit count of results of the concurrent XNOR operations performed by the memory cells in the at least one selected column, and wherein the reference voltage level corresponds to a bit count threshold such that the digital output indicates whether the bit count is below the bit count threshold.
  • Claim: 16. The method of claim 15 , wherein, for an XNOR operation performed by one of the memory cells in the at least one selected column, digital inputs are provided as logic states of the first read wordline connected to the one of the memory cells, and the bit of data stored at the first storage node of the one of the memory cells; and an output of the XNOR operation performed by the one of the memory cells in the at least one selected column is determinable based on whether discharging of the actual voltage level on the common read bitline of the at least one selected column through the one of the memory cells occurs.
  • Claim: 17. The method of claim 16 , wherein: when both the first read wordline connected to the one of the memory cells and the bit of data have a same logic state, the discharging of the actual voltage level on the common read bitline through the one of the memory cells occurs and indicates the output of the XNOR operation is a high logic state, and when the first read wordline connected to the one of the memory cells and the bit of data have different logic states, the discharging of the actual voltage level on the common read bitline through the one of the memory cells is prevented and indicates the output of the XNOR operation is a low logic state.
  • Claim: 18. The method of claim 15 , wherein: the pre-charging further comprises pre-charging common read bitlines and capacitors of the columns of the memory array such that the read pulses selectively applied to only one of the first read wordline and the second read wordline in each of the rows causes the concurrent XNOR operations to be performed by the memory cells in the columns of the memory array; and the method further comprises concurrently performing comparisons of actual voltage levels on the common read bitlines of the columns to the reference voltage level and outputting digital outputs for each of the columns, respectively.
  • Claim: 19. The method of claim 18, wherein: the method enables in-memory computing of binary neural network operations; and the concurrent XNOR operations performed by the memory cells in the columns of the memory array occur in one clock cycle with logic states of first read wordlines corresponding to binary input values from a first receptive field, with logic states of bits stored in the memory cells of the columns corresponding to binary weight values from different kernels associated with different features, respectively, and with digital outputs from voltage comparators of the columns being inserted into different feature maps for the different features at a same location corresponding to the first receptive field.
  • Claim: 20. The method of claim 19, wherein: the concurrent XNOR operations performed by the memory cells in the columns of the memory array are repeated during subsequent clock cycles such that at each subsequent clock cycle logic states of the first read wordlines correspond to different binary input values from a second receptive field different than the first receptive field; the logic states of the bits stored in the memory cells of the columns continue to correspond to binary weight values from the different kernels associated with the different features, respectively; and the digital outputs from the voltage comparators of the columns are inserted into the different feature maps for the different features at a location corresponding to the second receptive field.
  • Patent References Cited:
      • 7,009,871, March 2006, Kawasumi
      • 9,786,359, October 2017, Liaw
      • 10,521,229, December 2019, Shu
      • 2014/0078817, March 2014, Bentum
      • 2017/0286830, October 2017, El-Yaniv et al.
      • 2018/0039886, February 2018, Umuroglu et al.
      • 2018/0107925, April 2018, Choi et al.
      • 2018/0158520, June 2018, Shu
      • 2018/0315473, November 2018, Yu et al.
  • Other References:
      • Valavi et al., "A Mixed-Signal Binarized Convolutional-Neural-Network Accelerator Integrating Dense Weight Storage and Multiplication for Reduced Data Movement," IEEE Symposium on VLSI Circuits, 2018, pp. 1-2.
      • Biswas et al., "Conv-RAM: An Energy-Efficient SRAM with Embedded Convolution Computation for Low-Power CNN-Based Machine Learning Applications," IEEE International Solid-State Circuits Conference, 2018, pp. 488-497.
      • Khwa et al., "A 65nm 4Kb Algorithm-Dependent Computing-in-Memory SRAM Unit-Macro with 2.3ns and 55.8TOPS/W Fully Parallel Product-Sum Operation for Binary DNN Edge Processors," IEEE International Solid-State Circuits Conference, 2018, pp. 496-497.
      • Jiang et al., "XNOR-SRAM: In-Memory Computing SRAM Macro for Binary/Ternary Deep Neural Networks," IEEE Symposium on VLSI Technology, 2018, pp. 1-2.
      • Zhang et al., "A Machine-Learning Classifier Implemented in a Standard 6T SRAM Array," IEEE Symposium on VLSI Circuits (VLSI-Circuits), 2016, pp. 1-2.
      • Agrawal et al., "Xcel-RAM: Accelerating Binary Neural Networks in High-Throughput SRAM Compute Arrays," arXiv:1807.00343v2, 2018, pp. 1-10.
      • Conti et al., "XNOR Neural Engine: A Hardware Accelerator IP for 21.6 fJ/op Binary Neural Network Inference," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2018, pp. 1-11.
      • Courbariaux et al., "Binarized Neural Networks: Training Neural Networks with Weights and Activations Constrained to +1 or −1," arXiv:1602.02830v3, 2016, pp. 1-11.
      • Anonymous Authors, "Discovering Low-Precision Networks Close to Full-Precision Networks for Efficient Embedded Inference," ICLR, 2019, pp. 1-11.
      • Jia et al., "A Microprocessor Implemented in 65nm CMOS with Configurable and Bit-Scalable Accelerator for Programmable In-memory Computing," arXiv:1811.04047, 2018, pp. 1-10.
      • Kim et al., "Bitwise Neural Networks," arXiv:1601.06071v1, 2016, pp. 1-5.
      • Li et al., "Build a Compact Binary Neural Network through Bit-level Sensitivity and Data Pruning," arXiv:1802.00904, 2018, pp. 1-7.
      • Lin et al., "Towards Accurate Binary Convolutional Neural Network," arXiv:1711.11294v1, 2017, pp. 1-14.
      • Gonugondla et al., "A 42pJ/Decision 3.12TOPS/W Robust In-Memory Machine Learning Classifier with On-Chip Training," IEEE International Solid-State Circuits Conference, 2018, pp. 490-491.
      • Wu et al., "Brain-Inspired Computing Exploiting Carbon Nanotube FETs and Resistive RAM: Hyperdimensional Computing Case Study," IEEE International Solid-State Circuits Conference, 2018, pp. 492-494.
      • Chen et al., "A 65nm 1Mb Nonvolatile Computing-in-Memory ReRAM Macro with Sub-16ns Multiply-and-Accumulate for Binary DNN AI Edge Processors," IEEE International Solid-State Circuits Conference, 2018, pp. 494-496.
      • Rastegari et al., "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks," arXiv:1603.05279, 2016, pp. 1-17.
      • Rusci et al., "Design Automation for Binarized Neural Networks: A Quantum Leap Opportunity?" arXiv:1712.01743v1, 2017, pp. 1-5.
      • Tang et al., "How to Train a Compact Binary Neural Network with High Accuracy?" Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017, pp. 2625-2631.
      • Zhou et al., "DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients," arXiv:1606.06160v3, 2018, pp. 1-13.
      • Zhuang et al., "Rethinking Binary Neural Network for Accurate Image Classification and Semantic Segmentation," arXiv:1811.10413v1, 2018, pp. 1-12.
  • Primary Examiner: Yoha, Connie C
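The read mechanism recited in claims 1 through 3 amounts to a per-cell XNOR: a read pulse is applied to only one of the two read wordlines, and the shared read bitline discharges through the cell exactly when the input matches the stored bit. The following minimal Python sketch models that truth behavior; the function name and the convention that an input of 1 pulses the first read wordline (and 0 pulses the second) are illustrative assumptions, not taken from the patent.

```python
def xnor_cell_read(input_bit: int, stored_bit: int) -> tuple[bool, int]:
    """Model one read of the three-port cell (claims 1-3).

    Per claim 2, a read pulse goes to only one read wordline; here we
    assume RWL1 when the input is 1 and RWL2 when it is 0. Branch 1
    (pull-down gated by storage node Q) can discharge the bitline only
    when RWL1 is pulsed and Q = 1; branch 2 (gated by the complement
    node /Q) only when RWL2 is pulsed and Q = 0.
    """
    q, q_bar = stored_bit, 1 - stored_bit
    pulse_rwl1 = input_bit == 1
    pulse_rwl2 = input_bit == 0
    discharges = (pulse_rwl1 and q == 1) or (pulse_rwl2 and q_bar == 1)
    # Claim 3: discharge occurs exactly when the input and the stored
    # bit have the same logic state, and is read as a high XNOR output.
    xnor_out = 1 if discharges else 0
    assert xnor_out == (1 if input_bit == stored_bit else 0)
    return discharges, xnor_out
```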
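Claims 7, 10, and 15 describe the column readout: the common read bitline and its capacitor are pre-charged, all rows fire concurrently, each matching cell sinks charge, and a comparator tests the settled voltage against a reference encoding a bit-count threshold. Below is a hedged sketch under the simplifying assumption that each matching cell removes a fixed voltage increment, so the bitline falls linearly with the XNOR bit count; v_precharge, v_ref, and delta_v_per_match are illustrative constants, and keeping the per-cell step small is the role the claims assign to the capacitor.

```python
def column_readout(inputs, weights, v_precharge=1.0, v_ref=0.5,
                   delta_v_per_match=0.01):
    """Model one concurrent column read (claims 7, 10, and 15)."""
    # Bit count: cells whose XNOR output is 1, i.e. input == weight.
    bit_count = sum(1 for a, w in zip(inputs, weights) if a == w)
    # Settled bitline voltage after the simultaneous read pulses;
    # the capacitor keeps this from collapsing all the way to ground.
    v_rbl = v_precharge - bit_count * delta_v_per_match
    # The comparator's reference encodes a bit-count threshold: lower
    # voltage means more matches, so a 1 here means the bit count is
    # at or above the threshold (equivalently, not below it).
    digital_out = 1 if v_rbl < v_ref else 0
    return bit_count, v_rbl, digital_out
```

With these illustrative constants the comparator fires once more than (1.0 - 0.5) / 0.01 = 50 of the column's cells match, so in practice v_ref would be chosen relative to the row count and the desired bit-count threshold.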
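Claims 11, 12, 19, and 20 map this readout onto binarized-neural-network convolution: each clock cycle drives the read wordlines with one receptive field's binary inputs, each column stores one kernel's binary weights, and every column's thresholded output lands in the corresponding feature map at that receptive field's location. The sketch below illustrates that per-cycle mapping; the array shapes, constants, and function name are assumptions for illustration only.

```python
import numpy as np

def bnn_layer_in_memory(receptive_fields, kernels,
                        v_precharge=1.0, v_ref=0.5, delta_v=0.01):
    """Sketch of the per-cycle mapping in claims 11-12 and 19-20.

    receptive_fields: (num_fields, rows) array of 0/1 inputs, one
        receptive field applied to the read wordlines per clock cycle.
    kernels: (rows, cols) array of 0/1 weights, one binary kernel
        stored per column of the memory array.
    Returns: (num_fields, cols) array of thresholded feature-map bits.
    """
    num_fields, rows = receptive_fields.shape
    cols = kernels.shape[1]
    feature_maps = np.zeros((num_fields, cols), dtype=np.uint8)
    for t in range(num_fields):        # one receptive field per clock cycle
        field = receptive_fields[t]
        for c in range(cols):          # on-chip, all columns run in parallel
            matches = int(np.sum(field == kernels[:, c]))  # XNOR bit count
            v_rbl = v_precharge - matches * delta_v        # analog step-down
            feature_maps[t, c] = 1 if v_rbl < v_ref else 0  # comparator
    return feature_maps
```

The outer loop is the sequence of clock cycles in claims 12 and 20; the inner column loop exists only in this software model, since on the array every column's XNOR bit count and comparison happen concurrently within a single cycle.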
