Dedicated FPGA Implementation of the Gaussian TinyYOLOv3 Accelerator

Subin Ki, Juntae Park, Hyun Kim

Research output: Contribution to journal › Article › peer-review

25 Scopus citations

Abstract

This brief presents a dedicated FPGA implementation of the Gaussian TinyYOLOv3 accelerator using a streamline architecture for object detection in mobile and edge devices. The proposed accelerator employs a hardware-friendly shift-based floating-fixed MAC operator and a shift-based quantization method, which together significantly reduce hardware resources while minimizing accuracy degradation. The pipelined streamline architecture maximizes hardware utilization and stores all parameters in on-chip memory to minimize external memory access. Moreover, the Gaussian modeling-based performance enhancement technique is processed in the programmable system to address the low accuracy of lightweight models. The proposed IP, implemented on a Xilinx XCVU9P, achieves a processing speed of 62.9 FPS and an accuracy of 34.01% on the COCO2014 dataset, demonstrating the superiority of the proposed accelerator over prior research in the trade-off between throughput, hardware resources, and model accuracy.
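
The abstract does not spell out the quantization scheme. As a rough illustration of what shift-based quantization generally means (scale factors restricted to powers of two, so that scaling reduces to a bit shift), here is a minimal Python sketch. The function names, the 8-bit width, and the rule for choosing the shift are assumptions made for illustration and are not taken from the paper.

```python
import math

def choose_shift(values, bits=8):
    """Pick the largest n such that max|v| * 2**n still fits in a signed `bits`-bit integer."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return 0
    qmax = 2 ** (bits - 1) - 1
    return math.floor(math.log2(qmax / max_abs))

def quantize(values, shift, bits=8):
    """Scale by 2**shift, round to the nearest integer, and clamp to the signed range."""
    qmax = 2 ** (bits - 1) - 1
    qmin = -(2 ** (bits - 1))
    return [max(qmin, min(qmax, round(v * 2 ** shift))) for v in values]

def dequantize(quantized, shift):
    """Undo the power-of-two scaling; in hardware this would be a shift rather than a divide."""
    return [q / 2 ** shift for q in quantized]

# Hypothetical example weights, chosen only to exercise the functions.
weights = [0.5, -0.25, 0.125, 0.9]
shift = choose_shift(weights)
q_weights = quantize(weights, shift)
restored = dequantize(q_weights, shift)
```

Because the scale is a power of two, no multiplier is needed to rescale values, which is the general reason such schemes are considered hardware-friendly.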

Original language: English
Pages (from-to): 3882-3886
Number of pages: 5
Journal: IEEE Transactions on Circuits and Systems II: Express Briefs
Volume: 70
Issue number: 10
State: Published - 1 Oct 2023

Keywords

  • Convolutional neural network (CNN)
  • field-programmable gate array (FPGA)
  • hardware accelerator
  • object detection
  • streamline architecture
  • TinyYOLOv3
