HyQ: Hardware-Friendly Post-Training Quantization for CNN-Transformer Hybrid Networks

Nam Joon Kim, Jongho Lee, Hyun Kim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

Hybrid models that combine convolutional neural networks (CNNs) and vision transformers (ViTs) have recently emerged as state-of-the-art computer vision models. To deploy these hybrid models efficiently on resource-constrained mobile/edge devices, quantization is emerging as a promising solution. However, post-training quantization (PTQ), which requires neither retraining nor labeled data, has not been extensively studied for hybrid models. In this study, we propose a novel PTQ technique specialized for CNN-transformer hybrid models, designed with the hardware implementation of hybrid models on AI accelerators such as GPUs and FPGAs in mind. First, we introduce quantization-aware distribution scaling to address the large outliers caused by inter-channel variance in convolution layers. Second, in the transformer block, we propose approximating the integer-only softmax with a linear function, which avoids costly FP32/INT32 multiplications and yields more efficient computation. Experimental results show that the proposed quantization method at INT8 precision incurs only a 0.39% accuracy drop relative to the FP32 baseline on MobileViT-s with the ImageNet-1k dataset. Furthermore, when implemented on an FPGA platform, the proposed linear softmax achieved significant resource savings, reducing look-up table and flip-flop usage by 1.8∼2.1× and 1.3∼1.9×, respectively, compared with the existing second-order polynomial approximation. The code is available at https://github.com/IDSL-SeoulTech/HyQ.
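The two techniques above can be made concrete with short sketches. First, a minimal NumPy sketch of per-channel distribution scaling, assuming an equalization-style formulation: the abstract does not give HyQ's exact scaling rule, so the range-to-mean rule and the function name below are hypothetical stand-ins. Outlier output channels are shrunk before quantization, and the inverse scale is folded into the following layer's matching input channels, which leaves the FP32 output unchanged when the intermediate activation is linear or ReLU-like (positively homogeneous).

```python
import numpy as np

def distribution_scale(conv_w, next_w, eps=1e-8):
    """Hypothetical per-channel distribution scaling (illustrative only).

    conv_w: (C_out, C_in, kH, kW) weights of the first convolution
    next_w: (C_next, C_out, kH, kW) weights of the following convolution
    """
    # Per-output-channel dynamic range of the first layer.
    r = np.abs(conv_w).reshape(conv_w.shape[0], -1).max(axis=1)
    # Assumed rule: pull each channel's range toward the mean range,
    # so outlier channels shrink before per-tensor quantization.
    s = np.maximum(r, eps) / r.mean()
    conv_w_scaled = conv_w / s[:, None, None, None]   # bias, if any, scales by 1/s too
    next_w_scaled = next_w * s[None, :, None, None]   # fold the inverse scale forward
    return conv_w_scaled, next_w_scaled
```

Second, a sketch of an integer-only linear softmax. The paper's actual linear function is not given in the abstract; the ReLU-style surrogate max(0, x_i - x_max + c) below is an assumed example, chosen to show the property the abstract claims: each attention row then needs only integer adds, compares, a shift, and one integer division, with no FP32/INT32 multiplications.

```python
import numpy as np

def linear_softmax_int(x_q, out_bits=8, c=16):
    """Hypothetical integer-only linear softmax over the last axis.

    x_q: int32 attention logits, already quantized.
    Returns fixed-point probabilities; each row sums to ~2**out_bits.
    """
    x = x_q.astype(np.int64)
    shifted = x - x.max(axis=-1, keepdims=True)           # every entry <= 0
    p = np.maximum(shifted + c, 0)                        # linear stand-in for exp
    denom = np.maximum(p.sum(axis=-1, keepdims=True), 1)  # guard all-zero rows
    # Fixed-point normalization: the left shift replaces an FP32 multiply.
    return ((p << out_bits) // denom).astype(np.int32)
```

For example, `linear_softmax_int(np.random.randint(-128, 128, (4, 16)).astype(np.int32))` returns rows of 8-bit fixed-point weights summing to roughly 256, which downstream integer matrix multiplies can consume directly; this row-wise shift-and-divide pattern is also what makes such an operator cheap in FPGA LUTs/FFs relative to polynomial approximations of the exponential.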

Original language: English
Title of host publication: Proceedings of the 33rd International Joint Conference on Artificial Intelligence, IJCAI 2024
Editors: Kate Larson
Publisher: International Joint Conferences on Artificial Intelligence
Pages: 4291-4299
Number of pages: 9
ISBN (Electronic): 9781956792041
State: Published - 2024
Event: 33rd International Joint Conference on Artificial Intelligence, IJCAI 2024 - Jeju, Korea, Republic of
Duration: 3 Aug 2024 - 9 Aug 2024

Publication series

Name: IJCAI International Joint Conference on Artificial Intelligence
ISSN (Print): 1045-0823

Conference

Conference: 33rd International Joint Conference on Artificial Intelligence, IJCAI 2024
Country/Territory: Korea, Republic of
City: Jeju
Period: 3/08/24 - 9/08/24
