A high-throughput and power-efficient FPGA implementation of YOLO CNN for object detection

Duy Thanh Nguyen, Tuan Nghia Nguyen, Hyun Kim, Hyuk Jae Lee

Research output: Contribution to journal › Article › peer-review

415 Scopus citations

Abstract

Convolutional neural networks (CNNs) require numerous computations and external memory accesses. Frequent accesses to off-chip memory cause slow processing and large power dissipation. For real-time object detection with high throughput and power efficiency, this paper presents a Tera-OPS streaming hardware accelerator implementing a you-only-look-once (YOLO) CNN. The parameters of the YOLO CNN are retrained and quantized with the PASCAL VOC data set using binary weights and flexible low-bit activations. The binary weights enable storing the entire network model in the block RAMs of a field-programmable gate array (FPGA), aggressively reducing off-chip accesses and thereby achieving a significant performance enhancement. In the proposed design, all convolutional layers are fully pipelined for enhanced hardware utilization. The input image is delivered to the accelerator line-by-line, and similarly, the output of each layer is transmitted to the next layer line-by-line. The intermediate data are fully reused across layers, eliminating external memory accesses. The decreased dynamic random access memory (DRAM) accesses reduce DRAM power consumption. Furthermore, as the convolutional layers are fully parameterized, it is easy to scale up the network. In this streaming design, each convolution layer is mapped to a dedicated hardware block, so it outperforms 'one-size-fits-all' designs in both performance and power efficiency. Implemented on a VC707 FPGA, this CNN achieves a throughput of 1.877 tera operations per second (TOPS) at 200 MHz with batch processing while consuming 18.29 W of on-chip power, the best power efficiency among previous work. As for object detection accuracy, it achieves a mean average precision (mAP) of 64.16% on the PASCAL VOC 2007 data set, only 2.63% lower than the mAP of the same YOLO network with full precision.
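The quantization scheme and efficiency figures mentioned in the abstract can be illustrated with a short sketch. The binarization below follows the common binary-weight-network recipe (sign of each weight scaled by the mean absolute value per filter); the abstract does not detail the authors' exact retraining procedure, so both helper functions are assumptions for illustration only, and the efficiency calculation simply restates the reported numbers.

```python
import numpy as np

def binarize_weights(w):
    """Approximate a filter's weights as alpha * sign(w) -- the standard
    binary-weight-network approximation (an assumption; the paper's exact
    retraining recipe is not given in the abstract)."""
    alpha = np.mean(np.abs(w))        # per-filter scaling factor
    return alpha * np.sign(w), alpha

def quantize_activation(x, bits):
    """Uniform low-bit quantization of non-negative activations
    (illustrative stand-in for the paper's 'flexible low-bit' scheme)."""
    levels = 2 ** bits - 1
    peak = np.max(x)
    scale = peak / levels if peak > 0 else 1.0
    return np.round(x / scale) * scale

# Reported figures: 1.877 TOPS throughput at 18.29 W of on-chip power.
tops, watts = 1.877, 18.29
efficiency_gops_per_w = tops * 1000 / watts
print(f"{efficiency_gops_per_w:.1f} GOPS/W")  # ~102.6 GOPS/W
```

The per-filter scaling factor is what lets a 1-bit weight memory fit in on-chip block RAM while keeping the convolution output close to its full-precision value.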

Original language: English
Article number: 8678682
Pages (from-to): 1861-1873
Number of pages: 13
Journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Volume: 27
Issue number: 8
DOIs
State: Published - Aug 2019

Keywords

  • Binary weight
  • low-precision quantization
  • object detection
  • streaming architecture
  • you-only-look-once (YOLO)

