TY - JOUR
T1 - RADAR: An Efficient FPGA-based ResNet Accelerator with Data-aware Reordering of Processing Sequences
AU - Park, Juntae
AU - Choi, Dahun
AU - Kim, Hyun
N1 - Publisher Copyright:
© 2025, Institute of Electronics Engineers of Korea. All rights reserved.
PY - 2025
Y1 - 2025
AB - The deployment of compact convolutional neural network (CNN) models with skip connections on edge devices through dedicated hardware accelerators is increasingly prevalent. However, optimizing the use of limited on-chip memory (OCM) across multiple CNN layers, especially those with skip connections, remains a challenge. In this paper, we propose a novel CNN accelerator technique that reorders the computation sequence for each layer to maximize data reuse within the OCM, thereby minimizing DRAM access and improving the utilization of both the OCM and the convolution processor. Additionally, we introduce a shared buffer design that efficiently manages OCM usage across different layers, particularly those involving skip connections. Finally, we present a ResNet-18 accelerator IP, RADAR, implemented with the proposed technique on a Xilinx ZCU102 FPGA. RADAR achieves 64.9 GOPS/W and 446.9 GOPS while maintaining high accuracy, demonstrating significant improvements over prior works in terms of the trade-off between throughput, hardware resource efficiency, and model accuracy.
KW - Convolutional neural network (CNN)
KW - ResNet
KW - data reordering
KW - field-programmable gate array (FPGA)
KW - hardware accelerator
KW - skip connection
UR - https://www.scopus.com/pages/publications/105014538969
U2 - 10.5573/JSTS.2025.25.4.451
DO - 10.5573/JSTS.2025.25.4.451
M3 - Article
AN - SCOPUS:105014538969
SN - 1598-1657
VL - 25
SP - 451
EP - 458
JO - Journal of Semiconductor Technology and Science
JF - Journal of Semiconductor Technology and Science
IS - 4
ER -