TY - JOUR
T1 - Dedicated FPGA Implementation of the Gaussian TinyYOLOv3 Accelerator
AU - Ki, Subin
AU - Park, Juntae
AU - Kim, Hyun
N1 - Publisher Copyright: © 2004-2012 IEEE.
PY - 2023/10/1
Y1 - 2023/10/1
N2 - This brief presents a dedicated FPGA implementation of the Gaussian TinyYOLOv3 accelerator using a streamline architecture for object detection in mobile and edge devices. The proposed accelerator employs a hardware-friendly shift-based floating-fixed MAC operator and shift-based quantization method that significantly reduces hardware resources and minimizes accuracy degradation. The pipelined streamline architecture maximizes hardware utilization and stores all parameters in on-chip memory to minimize external memory access. Moreover, the Gaussian modeling-based performance enhancement technique is effectively processed in the programmable system to address the low accuracy issue in lightweight models. The proposed IP implemented on Xilinx XCVU9P achieves a processing speed of 62.9 FPS and an accuracy of 34.01% on the COCO2014 dataset, which demonstrates the superiority of the proposed accelerator over prior research in terms of the trade-off between throughput, hardware resources, and model accuracy.
KW - Convolutional neural network (CNN)
KW - field-programmable gate array (FPGA)
KW - hardware accelerator
KW - object detection
KW - streamline architecture
KW - TinyYOLOv3
UR - http://www.scopus.com/inward/record.url?scp=85163455640&partnerID=8YFLogxK
U2 - 10.1109/TCSII.2023.3289514
DO - 10.1109/TCSII.2023.3289514
M3 - Article
AN - SCOPUS:85163455640
SN - 1549-7747
VL - 70
SP - 3882
EP - 3886
JO - IEEE Transactions on Circuits and Systems II: Express Briefs
JF - IEEE Transactions on Circuits and Systems II: Express Briefs
IS - 10
ER -