A survey of FPGA and ASIC designs for transformer inference acceleration and optimization

Beom Jin Kang, Hae In Lee, Seok Kyu Yoon, Young Chan Kim, Sang Beom Jeong, Seong Jun O, Hyun Kim

Research output: Contribution to journal, Review article, peer-reviewed

3 Scopus citations

Abstract

Recently, transformer-based models have achieved remarkable success in various fields, such as computer vision, speech recognition, and natural language processing. However, transformer models require substantially more parameters and computational operations than conventional neural networks (e.g., recurrent neural networks, long short-term memory networks, and convolutional neural networks). Transformer models are typically run on graphics processing unit (GPU) platforms, which are specialized for high-performance memory access and parallel processing. However, the high power consumption of GPUs poses significant challenges for deployment on edge devices with limited battery capacity. To address these issues, research is underway on using field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs) to run transformer models with low power consumption. FPGAs offer a high level of flexibility, whereas ASICs are beneficial for optimizing throughput and power. Both platforms are therefore well suited to efficiently accelerating the matrix multiplication operations that constitute a significant portion of transformer models. In addition, FPGAs and ASICs consume less power than GPUs, making them attractive energy-efficient platforms. This study surveys and analyzes the model compression methods, optimization techniques, and accelerator architectures used in FPGA- and ASIC-based transformer designs. We expect this study to serve as a valuable guide for hardware research in the transformer field.

Original language: English
Article number: 103247
Journal: Journal of Systems Architecture
Volume: 155
State: Published - Oct 2024

Keywords

  • Application-specific integrated circuit (ASIC)
  • Field-programmable gate array (FPGA)
  • Hardware accelerator
  • Model compression
  • Pruning
  • Quantization
  • Transformer
  • Vision Transformer (ViT)
