TY - GEN
T1 - Effective zero compression on ReRAM-based sparse DNN accelerators
AU - Shin, Hoon
AU - Park, Rihae
AU - Lee, Seung Yul
AU - Park, Yeonhong
AU - Lee, Hyunseung
AU - Lee, Jae W.
N1 - Publisher Copyright:
© 2022 ACM.
PY - 2022/7/10
Y1 - 2022/7/10
N2 - For efficient DNN inference Resistive RAM (ReRAM) crossbars have emerged as a promising building block to compute matrix multiplication in an area- and power-efficient manner. To improve inference throughput sparse models can be deployed on the ReRAM-based DNN accelerator. While unstructured pruning maintains both high accuracy and high sparsity, it performs poorly on the crossbar architecture due to the irregular locations of pruned weights. Meanwhile, due to the non-ideality of ReRAM cells and the high cost of ADCs, matrix multiplication is usually performed at a fine granularity, called Operation Unit (OU), along both wordline and bitline dimensions. While fine-grained, OU- based row compression (ORC) has recently been proposed to increase weight compression ratio, significant performance potentials are still left on the table due to sub-optimal weight mappings. Thus, we propose a novel weight mapping scheme that effectively clusters zero weights via OU-level filter reordering, hence improving the effective weight compression ratio. We also introduce a weight recovery scheme to further improve accuracy or compression ratio, or both. Our evaluation with three popular DNNs demonstrates that the proposed scheme effectively eliminates redundant weights in the crossbar array and hence ineffectual computation to achieve 3.27 - 4.26× of array compression ratio with negligible accuracy loss over the baseline ReRAM-based DNN accelerator.
AB - For efficient DNN inference Resistive RAM (ReRAM) crossbars have emerged as a promising building block to compute matrix multiplication in an area- and power-efficient manner. To improve inference throughput sparse models can be deployed on the ReRAM-based DNN accelerator. While unstructured pruning maintains both high accuracy and high sparsity, it performs poorly on the crossbar architecture due to the irregular locations of pruned weights. Meanwhile, due to the non-ideality of ReRAM cells and the high cost of ADCs, matrix multiplication is usually performed at a fine granularity, called Operation Unit (OU), along both wordline and bitline dimensions. While fine-grained, OU- based row compression (ORC) has recently been proposed to increase weight compression ratio, significant performance potentials are still left on the table due to sub-optimal weight mappings. Thus, we propose a novel weight mapping scheme that effectively clusters zero weights via OU-level filter reordering, hence improving the effective weight compression ratio. We also introduce a weight recovery scheme to further improve accuracy or compression ratio, or both. Our evaluation with three popular DNNs demonstrates that the proposed scheme effectively eliminates redundant weights in the crossbar array and hence ineffectual computation to achieve 3.27 - 4.26× of array compression ratio with negligible accuracy loss over the baseline ReRAM-based DNN accelerator.
UR - http://www.scopus.com/inward/record.url?scp=85137419572&partnerID=8YFLogxK
U2 - 10.1145/3489517.3530564
DO - 10.1145/3489517.3530564
M3 - Conference contribution
AN - SCOPUS:85137419572
T3 - Proceedings - Design Automation Conference
SP - 949
EP - 954
BT - Proceedings of the 59th ACM/IEEE Design Automation Conference, DAC 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 59th ACM/IEEE Design Automation Conference, DAC 2022
Y2 - 10 July 2022 through 14 July 2022
ER -