FAB: FPGA-Accelerated Fully-Pipelined Bottleneck Architecture With Batching for High-Performance MobileNetv2 Inference

Young Chan Kim, Nam Joon Kim, Hyun Kim

Research output: Contribution to journal › Article › peer-review

Abstract

Lightweight neural networks (LWNNs) primarily employ the bottleneck block (BB) introduced in MobileNetv2 or similar architectural structures. However, the channel expansion-reduction process in BB imposes substantial activation memory overhead, a challenge that has not been adequately addressed in prior studies on LWNN accelerators incorporating BB. To overcome this limitation, we propose a fully-pipelined bottleneck architecture (FPB) optimized for the efficient hardware deployment of BB. FPB eliminates the need for intermediate off-chip memory access, effectively addressing deployment challenges associated with BB and enabling an end-to-end accelerator architecture. To enhance hardware efficiency, each FPB core utilizes 2-LUT DSP, Fused-ReLU6, and Q-Residual, optimizing computational performance while minimizing resource consumption. Furthermore, we introduce a batching technique that maximizes the benefits of FPB by ensuring high hardware utilization across FPB cores while enabling the concurrent processing of multiple images. To mitigate the off-chip memory access latency inherently incurred by batching, we propose a stem layer latency hiding technique, which effectively prevents performance degradation. We evaluate the performance of our proposed MobileNetv2 accelerator on the VCU118 board, achieving an energy efficiency of 120.7 GOPS/W at a batch size of 4. This represents an improvement of 1.5x to 10.5x over prior work. Depending on the batch size configuration, our FAB accelerator achieves a throughput performance ranging from 204.2 GOPS to 772.7 GOPS, demonstrating its high computational efficiency.
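The activation overhead the abstract refers to comes from the inverted-residual structure of the MobileNetv2 bottleneck block: a 1x1 convolution expands the channels by a factor t (t = 6 in MobileNetv2) before the depthwise convolution, and a second 1x1 convolution projects them back down. A minimal sketch (illustrative only, not from the paper) of the resulting intermediate activation sizes:

```python
# Illustrative sketch, not code from the paper: element counts of the
# activations inside one MobileNetv2 bottleneck block (stride 1), showing
# why the expanded intermediate tensors dominate activation memory.

def bottleneck_activation_elems(h, w, c_in, c_out, t=6):
    """Return (input, expanded, projected) activation element counts.

    h, w   : spatial size of the input feature map
    c_in   : input channels
    c_out  : output channels after the projection conv
    t      : channel expansion factor (6 in MobileNetv2)
    """
    inp = h * w * c_in               # block input
    expanded = h * w * (c_in * t)    # after the 1x1 expansion conv
    projected = h * w * c_out        # after the 1x1 projection conv
    return inp, expanded, projected

# Example: a 56x56x24 input expands to 56x56x144 before projection,
# a 6x blow-up that a fully-pipelined design keeps on-chip instead of
# spilling to off-chip memory between the three convolutions.
inp, expanded, projected = bottleneck_activation_elems(56, 56, 24, 24)
print(inp, expanded, expanded // inp)
```

Keeping these expanded intermediates entirely on-chip between the three convolutions is the deployment challenge that the fully-pipelined bottleneck architecture described above targets.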

Original language: English
Pages (from-to): 6615-6628
Number of pages: 14
Journal: IEEE Transactions on Circuits and Systems I: Regular Papers
Volume: 72
Issue number: 11
DOIs
State: Published - 2025

Keywords

  • Lightweight convolution neural network
  • batching
  • bottleneck block
  • fully-pipelined bottleneck architecture
  • high throughput
