Mixed Precision Quantization with Hardware-Friendly Activation Functions for Hybrid ViT Models

Beom Jin Kang, Da Hun Choi, Hyun Kim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

As hardware devices have advanced in recent years, various artificial intelligence tasks, including those based on convolutional neural networks (CNNs), have achieved high accuracy. In computer vision tasks especially, vision transformer (ViT)-based models have made unprecedented progress, and CNN + ViT hybrid models that combine the advantages of both architectures have also been proposed. However, the numerous parameters of hybrid ViTs make them unsuitable for resource-constrained mobile/edge environments. In addition, the nonlinear activation functions in hybrid ViTs (e.g., GeLU and Swish) require more resources and computational cost than integer-friendly functions (e.g., ReLU) on dedicated hardware accelerators. To address these issues, we propose a technique to efficiently compress a prominent hybrid ViT model, MobileViT, by applying mixed-precision quantization and the Shift-Swish activation function. Compressing the MobileViT-s, MobileViT-xs, and MobileViT-xxs models with the proposed method on the ImageNet dataset resulted in minimal accuracy drops of 0.41%, 0.18%, and 0.86%, respectively, while achieving effective quantization and activation function approximation at an average 7.9-bit level.
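The abstract does not give the exact Shift-Swish formulation, but the general idea of replacing Swish's sigmoid with shift-friendly arithmetic can be sketched as follows. The piecewise-linear surrogate `clip((x + 4) / 8, 0, 1)` below is an illustrative assumption, not the paper's definition; the point is that dividing by 8 reduces to a 3-bit right shift in integer hardware, unlike the exponential in the true sigmoid.

```python
import numpy as np

def swish(x):
    # Reference Swish: x * sigmoid(x). The exponential makes this
    # costly on integer-only accelerators.
    return x / (1.0 + np.exp(-x))

def shift_friendly_swish(x):
    # Hypothetical shift-based approximation (NOT the paper's exact
    # Shift-Swish): sigmoid(x) is replaced by clip((x + 4) / 8, 0, 1).
    # In fixed-point hardware, "/ 8" is a right shift by 3 bits, so the
    # whole function needs only an add, a shift, a clamp, and a multiply.
    return x * np.clip((x + 4.0) / 8.0, 0.0, 1.0)

# Compare the two over a range of inputs.
x = np.linspace(-6.0, 6.0, 13)
err = np.max(np.abs(swish(x) - shift_friendly_swish(x)))
```

For large positive inputs the approximation matches Swish's identity-like behavior, and for large negative inputs both saturate to zero; the worst-case gap over the range above stays well under one unit.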

Original language: English
Title of host publication: 2024 International Conference on Electronics, Information, and Communication, ICEIC 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350371888
State: Published - 2024
Event: 2024 International Conference on Electronics, Information, and Communication, ICEIC 2024 - Taipei, Taiwan, Province of China
Duration: 28 Jan 2024 – 31 Jan 2024

Publication series

Name: 2024 International Conference on Electronics, Information, and Communication, ICEIC 2024

Conference

Conference: 2024 International Conference on Electronics, Information, and Communication, ICEIC 2024
Country/Territory: Taiwan, Province of China
City: Taipei
Period: 28/01/24 – 31/01/24

Keywords

  • Activation function
  • Deep learning
  • Mixed precision
  • Quantization
  • Vision Transformer
