Abstract
Lightweight neural networks (LWNNs) primarily employ the bottleneck block (BB) introduced in MobileNetv2, or similar architectural structures. However, the channel expansion-reduction process in the BB imposes substantial activation-memory overhead, a challenge that prior studies on LWNN accelerators incorporating the BB have not adequately addressed. To overcome this limitation, we propose a fully-pipelined bottleneck architecture (FPB) optimized for efficient hardware deployment of the BB. FPB eliminates intermediate off-chip memory accesses, addressing the deployment challenges associated with the BB and enabling an end-to-end accelerator architecture. To enhance hardware efficiency, each FPB core employs a 2-LUT DSP, Fused-ReLU6, and Q-Residual, maximizing computational performance while minimizing resource consumption. Furthermore, we introduce a batching technique that maximizes the benefits of FPB, ensuring high utilization across FPB cores while processing multiple images concurrently. To hide the off-chip memory access latency that batching inherently incurs, we propose a stem-layer latency-hiding technique that prevents performance degradation. Evaluated on the VCU118 board, our MobileNetv2 accelerator achieves an energy efficiency of 120.7 GOPS/W at a batch size of 4, a 1.5x to 10.5x improvement over prior work. Depending on the batch-size configuration, the FAB accelerator delivers a throughput of 204.2 GOPS to 772.7 GOPS, demonstrating its high computational efficiency.
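To make the activation-memory overhead claim concrete, the sketch below computes the size of the intermediate tensor produced by a MobileNetv2-style bottleneck block (1x1 expansion, depthwise 3x3, 1x1 projection). The expansion factor t = 6 follows the original MobileNetv2 design; the chosen spatial/channel dimensions and the 8-bit activation assumption are illustrative, not figures from this paper.

```python
# Minimal sketch: activation footprint of a bottleneck block (BB).
# The 1x1 expansion multiplies the channel count by t, so the
# intermediate activation tensor is t times larger than the block's
# input -- the traffic that FPB avoids spilling to off-chip memory.

def bottleneck_activation_bytes(h, w, c_in, c_out, t=6, bytes_per_elem=1):
    """Return (input, expanded intermediate, output) activation sizes in bytes.

    Assumes stride 1, so the depthwise stage preserves H x W.
    bytes_per_elem=1 models 8-bit quantized activations (illustrative).
    """
    inp = h * w * c_in * bytes_per_elem            # block input
    expanded = h * w * (c_in * t) * bytes_per_elem  # after 1x1 expansion
    out = h * w * c_out * bytes_per_elem            # after 1x1 projection
    return inp, expanded, out

# Example shape (hypothetical, typical of an early MobileNetv2 stage):
inp, mid, out = bottleneck_activation_bytes(56, 56, 24, 24)
print(inp, mid, out)  # intermediate is 6x the input activation size
```

With t = 6 the intermediate tensor dwarfs both the input and output, which is why keeping it on-chip in a fused expansion-depthwise-projection pipeline, as FPB does, removes the dominant off-chip access cost of the BB.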
| Original language | English |
|---|---|
| Pages (from-to) | 6615-6628 |
| Number of pages | 14 |
| Journal | IEEE Transactions on Circuits and Systems I: Regular Papers |
| Volume | 72 |
| Issue number | 11 |
| DOIs | |
| State | Published - 2025 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs):
- SDG 7: Affordable and Clean Energy
Keywords
- Lightweight convolutional neural network
- batching
- bottleneck block
- fully-pipelined bottleneck architecture
- high throughput
Cite this
FAB: FPGA-Accelerated Fully-Pipelined Bottleneck Architecture With Batching for High-Performance MobileNetv2 Inference. IEEE Transactions on Circuits and Systems I: Regular Papers, 72(11), 6615-6628.