TY - JOUR
T1 - A survey of FPGA and ASIC designs for transformer inference acceleration and optimization
AU - Kang, Beom Jin
AU - Lee, Hae In
AU - Yoon, Seok Kyu
AU - Kim, Young Chan
AU - Jeong, Sang Beom
AU - O, Seong Jun
AU - Kim, Hyun
N1 - Publisher Copyright:
© 2024 Elsevier B.V.
PY - 2024/10
Y1 - 2024/10
AB - Recently, transformer-based models have achieved remarkable success in various fields, such as computer vision, speech recognition, and natural language processing. However, transformer models require substantially more parameters and computational operations than conventional neural networks (e.g., recurrent neural networks, long short-term memory, and convolutional neural networks). Transformer models are typically processed on graphics processing unit (GPU) platforms specialized for high-performance memory and parallel processing. However, the high power consumption of GPUs poses significant challenges for their deployment in edge device environments with limited battery capacity. To address these issues, research is underway on using field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs) to run transformer models at low power. FPGAs offer a high level of flexibility, whereas ASICs are beneficial for optimizing throughput and power. Both platforms are therefore well suited to efficiently optimizing matrix multiplication, which accounts for a significant portion of transformer computation. In addition, FPGAs and ASICs consume less power than GPUs, making them ideal energy-efficient platforms. This study investigates and analyzes the model compression methods, optimization techniques, and accelerator architectures of FPGA- and ASIC-based transformer designs. We expect this study to serve as a valuable guide for hardware research in the transformer field.
KW - Application-specific integrated circuit (ASIC)
KW - Field-programmable gate array (FPGA)
KW - Hardware accelerator
KW - Model compression
KW - Pruning
KW - Quantization
KW - Transformer
KW - Vision Transformer (ViT)
UR - http://www.scopus.com/inward/record.url?scp=85200629766&partnerID=8YFLogxK
U2 - 10.1016/j.sysarc.2024.103247
DO - 10.1016/j.sysarc.2024.103247
M3 - Review article
AN - SCOPUS:85200629766
SN - 1383-7621
VL - 155
JO - Journal of Systems Architecture
JF - Journal of Systems Architecture
M1 - 103247
ER -