TY - GEN
T1 - Architectural Design of 3D NAND Flash based Compute-in-Memory for Inference Engine
AU - Shim, Wonbo
AU - Jiang, Hongwu
AU - Peng, Xiaochen
AU - Yu, Shimeng
N1 - Publisher Copyright:
© 2020 ACM.
PY - 2020/9/28
Y1 - 2020/9/28
N2 - 3D NAND Flash memory has been proposed as an attractive candidate for deep neural network (DNN) inference engines owing to its ultra-high density and commercially mature fabrication technology. However, the peripheral circuits must be modified to enable compute-in-memory (CIM), and the chip architecture needs to be redesigned for an optimized dataflow. In this work, we present a 3D NAND-CIM accelerator design based on the macro parameters of an industry-grade prototype chip. The DNN inference performance is evaluated using the DNN+NeuroSim framework. To exploit the ultra-high density of 3D NAND Flash, both input and weight duplication strategies are introduced to improve throughput. Benchmarking on a variety of VGG and ResNet networks was performed across CIM technology candidates including SRAM, RRAM, and 3D NAND. The results show that, compared to similar SRAM- or RRAM-based designs, the 3D NAND based CIM design occupies only 17-24% of the chip area while achieving 1.9-2.7 times higher energy efficiency for 8-bit precision inference.
AB - 3D NAND Flash memory has been proposed as an attractive candidate for deep neural network (DNN) inference engines owing to its ultra-high density and commercially mature fabrication technology. However, the peripheral circuits must be modified to enable compute-in-memory (CIM), and the chip architecture needs to be redesigned for an optimized dataflow. In this work, we present a 3D NAND-CIM accelerator design based on the macro parameters of an industry-grade prototype chip. The DNN inference performance is evaluated using the DNN+NeuroSim framework. To exploit the ultra-high density of 3D NAND Flash, both input and weight duplication strategies are introduced to improve throughput. Benchmarking on a variety of VGG and ResNet networks was performed across CIM technology candidates including SRAM, RRAM, and 3D NAND. The results show that, compared to similar SRAM- or RRAM-based designs, the 3D NAND based CIM design occupies only 17-24% of the chip area while achieving 1.9-2.7 times higher energy efficiency for 8-bit precision inference.
KW - 3D NAND Flash
KW - Deep neural network
KW - compute-in-memory
KW - hardware accelerator
UR - https://www.scopus.com/pages/publications/85100325239
U2 - 10.1145/3422575.3422779
DO - 10.1145/3422575.3422779
M3 - Conference contribution
AN - SCOPUS:85100325239
T3 - ACM International Conference Proceeding Series
SP - 77
EP - 85
BT - MEMSYS 2020 - Proceedings of the International Symposium on Memory Systems
PB - Association for Computing Machinery
T2 - 2020 International Symposium on Memory Systems, MEMSYS 2020
Y2 - 28 September 2020 through 1 October 2020
ER -