TY - GEN
T1 - Hardware-friendly Activation Functions for HybridViT Models
AU - Kang, Beom Jin
AU - Kim, Nam Joon
AU - Lee, Jong Ho
AU - Kim, Hyun
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - In recent years, CNN+ViT hybrid models have shown promising performance in computer vision tasks. To deploy CNN+ViT hybrid models on resource-limited devices, various studies have addressed their parameter size and computational complexity through quantization, aiming to enable hardware-friendly low-bit integer operations. However, the activation functions commonly used in ViTs (e.g., GeLU, Swish) inevitably require floating-point operations. To address this problem, several studies have approximated these functions with alternatives that allow integer operations. Inspired by the Shift-GeLU approach, which approximates the GeLU function to enable integer operations, we propose the Shift-Swish function and evaluate it on the MobileViT model at both the software and hardware levels. Experimental results show that the hardware-level RTL design of the proposed method reduces LUT usage by 63.25%, FF usage by 87.69%, and power consumption by 46.57%, with a minimal accuracy drop of 0.6% compared to the baseline.
AB - In recent years, CNN+ViT hybrid models have shown promising performance in computer vision tasks. To deploy CNN+ViT hybrid models on resource-limited devices, various studies have addressed their parameter size and computational complexity through quantization, aiming to enable hardware-friendly low-bit integer operations. However, the activation functions commonly used in ViTs (e.g., GeLU, Swish) inevitably require floating-point operations. To address this problem, several studies have approximated these functions with alternatives that allow integer operations. Inspired by the Shift-GeLU approach, which approximates the GeLU function to enable integer operations, we propose the Shift-Swish function and evaluate it on the MobileViT model at both the software and hardware levels. Experimental results show that the hardware-level RTL design of the proposed method reduces LUT usage by 63.25%, FF usage by 87.69%, and power consumption by 46.57%, with a minimal accuracy drop of 0.6% compared to the baseline.
KW - Activation function
KW - Convolutional neural network
KW - Quantization
KW - Vision Transformer
UR - http://www.scopus.com/inward/record.url?scp=85184828632&partnerID=8YFLogxK
U2 - 10.1109/ISOCC59558.2023.10396294
DO - 10.1109/ISOCC59558.2023.10396294
M3 - Conference contribution
AN - SCOPUS:85184828632
T3 - Proceedings - International SoC Design Conference 2023, ISOCC 2023
SP - 147
EP - 148
BT - Proceedings - International SoC Design Conference 2023, ISOCC 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 20th International SoC Design Conference, ISOCC 2023
Y2 - 25 October 2023 through 28 October 2023
ER -