Abstract
This brief presents a dedicated FPGA implementation of a Gaussian TinyYOLOv3 accelerator using a streamline architecture for object detection on mobile and edge devices. The proposed accelerator employs a hardware-friendly shift-based floating-fixed MAC operator and a shift-based quantization method, which significantly reduce hardware resources while minimizing accuracy degradation. The pipelined streamline architecture maximizes hardware utilization and stores all parameters in on-chip memory to minimize external memory access. Moreover, the Gaussian modeling-based performance enhancement technique is efficiently executed on the programmable system to address the low accuracy of lightweight models. The proposed IP, implemented on a Xilinx XCVU9P, achieves a processing speed of 62.9 FPS and an accuracy of 34.01% on the COCO2014 dataset, demonstrating a better trade-off between throughput, hardware resources, and model accuracy than prior work.
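The core hardware-friendly idea is that multiplying by a power of two reduces to a bit-shift, so a multiplier-free MAC can be built from shifters and adders. The sketch below illustrates generic power-of-two (shift-based) quantization and a shift-based accumulate step; the function names, bit widths, and rounding choice are illustrative assumptions, not the exact scheme of the brief.

```python
import math

def shift_quantize(w, num_bits=4):
    """Quantize a real weight to the nearest signed power of two.

    Returns (quantized_value, exponent). Multiplying by 2**e in hardware
    is a bit-shift by |e|. This is a generic power-of-two quantization
    sketch; the brief's actual quantizer may differ.
    """
    if w == 0.0:
        return 0.0, 0
    sign = 1.0 if w > 0 else -1.0
    # Round the base-2 logarithm of the magnitude to the nearest integer.
    e = round(math.log2(abs(w)))
    # Clamp the exponent to the range representable with num_bits (assumed).
    e_min, e_max = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    e = max(e_min, min(e_max, e))
    return sign * (2.0 ** e), e

def shift_mac(acc, x_fixed, e, sign):
    """Accumulate x_fixed * (sign * 2**e) using shifts instead of a multiplier."""
    term = (x_fixed << e) if e >= 0 else (x_fixed >> -e)
    return acc + (term if sign >= 0 else -term)
```

For example, a weight of 0.26 quantizes to 0.25 (exponent -2), so multiplying a fixed-point activation by it becomes a right shift by two positions, which is far cheaper in FPGA logic than a full multiplier.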
| Original language | English |
|---|---|
| Pages (from-to) | 3882-3886 |
| Number of pages | 5 |
| Journal | IEEE Transactions on Circuits and Systems II: Express Briefs |
| Volume | 70 |
| Issue number | 10 |
| DOIs | |
| State | Published - 1 Oct 2023 |
Keywords
- Convolutional neural network (CNN)
- field-programmable gate array (FPGA)
- hardware accelerator
- object detection
- streamline architecture
- TinyYOLOv3