GradQ-ViT: Robust and Efficient Gradient Quantization for Vision Transformers

Dahun Choi, Hyun Kim

Research output: Contribution to journal › Conference article › peer-review

Abstract

Advancements in hardware accelerators, such as graphics processing units and neural processing units, have significantly propelled computer vision research. The vision transformer (ViT), leveraging the multi-head self-attention (MHSA) mechanism, has surpassed convolutional neural networks (CNNs) in accuracy but faces challenges in mobile and edge deployment due to its large size and computational demands. In addition, as privacy concerns push for on-device training, research on quantization methods for ViTs, particularly gradient quantization, has gained attention. Unlike CNNs, ViTs are difficult to quantize because of gradient outliers and a complex loss landscape. To address this, we propose a gradient quantization framework that stabilizes training by adapting quantization points based on interquartile ranges and constructing an outlier-robust loss function. Additionally, we employ a scaling method to align quantized gradients with original gradients and adaptively assign the learning rate based on quantization error analysis. When quantizing weights, activations, and gradients to INT8, our method improves performance by 0.52% and 0.21% over DeiT-Base and Swin-Base, respectively, and achieves near parity with MobileViT-S with only a 0.09% accuracy drop. Furthermore, applying our framework to MobileViT yields a 2.06× speedup in a CUDA 11.8 environment.
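As a rough illustration of the approach the abstract describes, the sketch below (our own, not the authors' released code) clips gradient outliers to an interquartile-range window before symmetric INT8 quantization, rescales the dequantized gradient toward the original gradient's magnitude, and derives a step size from the relative quantization error. Function names, the 1.5× whisker factor, the L2-norm alignment, and the learning-rate rule are all our assumptions.

import torch

def iqr_quantize_grad(grad: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Hypothetical sketch: clip gradient outliers to an IQR window,
    quantize symmetrically to signed INT8, then rescale the dequantized
    gradient toward the original gradient's L2 norm."""
    g = grad.float()
    q1, q3 = torch.quantile(g, torch.tensor([0.25, 0.75], device=g.device))
    iqr = q3 - q1
    # Outlier-robust clipping window (1.5x whisker factor is our assumption).
    lo, hi = (q1 - 1.5 * iqr).item(), (q3 + 1.5 * iqr).item()
    clipped = g.clamp(lo, hi)

    qmax = 2 ** (num_bits - 1) - 1                    # 127 for INT8
    scale = clipped.abs().max().clamp(min=1e-8) / qmax
    q = torch.round(clipped / scale).clamp(-qmax, qmax).to(torch.int8)

    dequant = q.float() * scale
    # Align the quantized gradient with the original in L2 norm, mirroring
    # the "scaling method" the abstract describes.
    align = g.norm() / dequant.norm().clamp(min=1e-8)
    return dequant * align

def error_aware_lr(base_lr: float, grad: torch.Tensor, qgrad: torch.Tensor) -> float:
    """Hypothetical rule: shrink the step size as the relative
    quantization error grows."""
    err = ((grad.float() - qgrad).norm() / grad.float().norm().clamp(min=1e-8)).item()
    return base_lr / (1.0 + err)

For example, on a gradient with an injected outlier (g = torch.randn(1000); g[0] = 50.0), iqr_quantize_grad(g) clips the spike before quantization so the INT8 grid is not stretched by a single extreme value, and error_aware_lr(1e-3, g, iqr_quantize_grad(g)) then damps the update in proportion to the remaining quantization error.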

Original language: English
Pages (from-to): 16019-16027
Number of pages: 9
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Volume: 39
Issue number: 15
State: Published - 11 Apr 2025
Event: 39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025 - Philadelphia, United States
Duration: 25 Feb 2025 - 4 Mar 2025
