
LRA-QViT: Integrating Low-Rank Approximation and Quantization for Robust and Efficient Vision Transformers

  • Seoul National University of Science and Technology (SNUST)

Research output: Contribution to journal › Conference article › peer-review

1 Scopus citation

Abstract

Recently, transformer-based models have demonstrated state-of-the-art performance across various computer vision tasks, including image classification, detection, and segmentation. However, their substantial parameter count poses significant challenges for deployment in resource-constrained environments such as edge or mobile devices. Low-rank approximation (LRA) has emerged as a promising model compression technique, effectively reducing the number of parameters in transformer models by decomposing high-dimensional weight matrices into low-rank representations. Nevertheless, matrix decomposition inherently introduces information loss, often leading to a decline in model accuracy. Furthermore, existing studies on LRA largely overlook the quantization process, which is a critical step in deploying practical vision transformer (ViT) models. To address these challenges, we propose a robust LRA framework that preserves weight information after matrix decomposition and incorporates quantization tailored to LRA characteristics. First, we introduce a reparameterizable branch-based low-rank approximation (RB-LRA) method coupled with weight reconstruction to minimize information loss during matrix decomposition. Subsequently, we enhance model accuracy by integrating RB-LRA with knowledge distillation techniques. Lastly, we present an LRA-aware quantization method designed to mitigate the large outliers generated by LRA, thereby improving the robustness of the quantized model.
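To make the two core ideas in the abstract concrete, below is a minimal Python sketch (not the authors' RB-LRA implementation) of how a transformer weight matrix can be factorized into a low-rank pair via truncated SVD, and how the resulting factors might then be quantized with a percentile-clipping step to tame large outliers. All function names, the rank choice, and the clipping heuristic are illustrative assumptions, not the paper's method.

```python
import numpy as np

def low_rank_factorize(W: np.ndarray, rank: int):
    """Truncated-SVD factorization: W (d_out x d_in) -> A (d_out x r) @ B (r x d_in).

    Parameter count drops from d_out * d_in to r * (d_out + d_in).
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # absorb singular values into the left factor
    B = Vt[:rank, :]
    return A, B

def quantize_symmetric(W: np.ndarray, n_bits: int = 8, clip_pct: float = 99.9):
    """Uniform symmetric quantization with percentile clipping.

    Clipping the most extreme values (a common outlier-mitigation
    heuristic, used here purely for illustration) keeps the quantization
    step small even when the factorized weights contain large outliers.
    """
    clip = np.percentile(np.abs(W), clip_pct)
    scale = clip / (2 ** (n_bits - 1) - 1)
    q = np.clip(np.round(W / scale), -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1)
    return q.astype(np.int8), scale

# Toy usage: factorize a random stand-in for a projection weight, quantize
# both factors, then measure the combined reconstruction error.
rng = np.random.default_rng(0)
W = rng.normal(size=(768, 768))
A, B = low_rank_factorize(W, rank=64)
qA, sA = quantize_symmetric(A)
qB, sB = quantize_symmetric(B)
W_hat = (qA.astype(np.float32) * sA) @ (qB.astype(np.float32) * sB)
print("relative reconstruction error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))
```

The reconstruction error printed here combines both loss sources the paper targets: the rank truncation itself and the quantization of the factors, which is why an LRA-aware quantizer that accounts for decomposition-induced outliers matters in practice.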

Original language: English
Pages (from-to): 28943-28958
Number of pages: 16
Journal: Proceedings of Machine Learning Research
Volume: 267
State: Published - 2025
Event: 42nd International Conference on Machine Learning, ICML 2025 - Vancouver, Canada
Duration: 13 Jul 2025 - 19 Jul 2025
