TY - JOUR
T1 - MCM-SR: Multiple Constant Multiplication-Based CNN Streaming Hardware Architecture for Super-Resolution
AU - Bae, Seung Hwan
AU - Lee, Hyuk Jae
AU - Kim, Hyun
N1 - Publisher Copyright:
© 1993-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - Convolutional neural network (CNN)-based super-resolution (SR) methods have become prevalent in display devices due to their superior image quality. However, the significant computational demands of CNN-based SR require hardware accelerators for real-time processing. Among the hardware architectures, the streaming architecture can significantly reduce latency and power consumption by minimizing external dynamic random access memory (DRAM) access. Nevertheless, this architecture necessitates a considerable hardware area, as each layer needs a dedicated processing engine. Furthermore, achieving high hardware utilization in this architecture requires substantial design expertise. In this article, we propose methods to reduce the hardware resources of CNN-based SR accelerators by applying the multiple constant multiplication (MCM) algorithm. We propose a loop interchange method for the convolution (CONV) operation to reduce the logic area by 23% and an adaptive loop interchange method for each layer that considers both the static random access memory (SRAM) and logic area simultaneously to reduce the SRAM size by 15%. In addition, we improve the MCM graph exploration speed by 5.4× while maintaining the SR quality through beam search when CONV weights are approximated to reduce the hardware resources.
KW - Convolutional neural network (CNN)
KW - hardware accelerator
KW - multiple constant multiplication (MCM)
KW - streaming hardware architecture
KW - super-resolution (SR)
UR - http://www.scopus.com/inward/record.url?scp=85211472646&partnerID=8YFLogxK
U2 - 10.1109/TVLSI.2024.3504513
DO - 10.1109/TVLSI.2024.3504513
M3 - Article
AN - SCOPUS:85211472646
SN - 1063-8210
VL - 33
SP - 75
EP - 87
JO - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
JF - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
IS - 1
ER -