TY - JOUR
T1 - Mobile-X: Dedicated FPGA Implementation of the MobileNet Accelerator Optimizing Depthwise Separable Convolution
AU - Hong, Hyeonseok
AU - Choi, Dahun
AU - Kim, Namjoon
AU - Kim, Hyun
N1 - Publisher Copyright: © 2004-2012 IEEE.
PY - 2024
Y1 - 2024
N2 - MobileNet proposed depthwise separable convolution (DSC) as a replacement for standard convolution (SC), achieving significant reductions in parameters and computational complexity compared with traditional convolutional neural network (CNN) models. Recently, there has been a growing trend of deploying MobileNet on various edge devices by implementing accelerators. However, the distinctive computational patterns of depthwise convolution (DWC) and pointwise convolution (PWC) in MobileNet pose challenges for FPGA and ASIC accelerator implementations. In this brief, we propose DSC-dedicated processing engine (PE) designs specialized for DWC and PWC operations and an SC reordering module used only for the first convolution layer. In addition, we introduce pipelined DSC processing, called pipelining separable convolution (PSC), and tiled-convolution (TC) techniques that consider the computational load of PWC. Our proposed 8-bit quantization in the accelerator causes only a negligible accuracy drop (i.e., 0.68%) compared with full precision, yet it enables hardware-friendly operations with only a single fixed-point multiplication. On the ZCU-102 platform, the proposed accelerator achieves 190.9 FPS and 108.3 GOPS using minimal hardware resources. Consequently, we achieve 18.20 GOPS/W, showing 3.7× the power efficiency of the A-100 GPU.
AB - MobileNet proposed depthwise separable convolution (DSC) as a replacement for standard convolution (SC), achieving significant reductions in parameters and computational complexity compared with traditional convolutional neural network (CNN) models. Recently, there has been a growing trend of deploying MobileNet on various edge devices by implementing accelerators. However, the distinctive computational patterns of depthwise convolution (DWC) and pointwise convolution (PWC) in MobileNet pose challenges for FPGA and ASIC accelerator implementations. In this brief, we propose DSC-dedicated processing engine (PE) designs specialized for DWC and PWC operations and an SC reordering module used only for the first convolution layer. In addition, we introduce pipelined DSC processing, called pipelining separable convolution (PSC), and tiled-convolution (TC) techniques that consider the computational load of PWC. Our proposed 8-bit quantization in the accelerator causes only a negligible accuracy drop (i.e., 0.68%) compared with full precision, yet it enables hardware-friendly operations with only a single fixed-point multiplication. On the ZCU-102 platform, the proposed accelerator achieves 190.9 FPS and 108.3 GOPS using minimal hardware resources. Consequently, we achieve 18.20 GOPS/W, showing 3.7× the power efficiency of the A-100 GPU.
KW - Convolutional neural network (CNN)
KW - FPGA
KW - MobileNet
KW - hardware accelerator
UR - http://www.scopus.com/inward/record.url?scp=85200815271&partnerID=8YFLogxK
U2 - 10.1109/TCSII.2024.3440884
DO - 10.1109/TCSII.2024.3440884
M3 - Article
AN - SCOPUS:85200815271
SN - 1549-7747
VL - 71
SP - 4668
EP - 4672
JO - IEEE Transactions on Circuits and Systems II: Express Briefs
JF - IEEE Transactions on Circuits and Systems II: Express Briefs
IS - 11
ER -