LAMP-Q: Layer Sensitivity-Aware Mixed-Precision Quantization for MobileNetV3

Seokkyu Yoon, Namjoon Kim, Hyun Kim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

Quantization is an effective technique for reducing memory usage and power consumption in deep neural networks (DNNs) by decreasing parameter size. However, conventional quantization methods often lead to significant accuracy loss when applied to compact architectures such as MobileNet. In particular, quantizing MobileNetV3 causes accuracy degradation due to the presence of large outliers. To address this challenge, we propose a hardware-friendly mixed-precision quantization approach. Unlike existing methods, which suffer from low memory and computational efficiency because they use diverse bit-widths that do not align with memory address space sizes, our approach applies 8-bit quantization to activations and selectively quantizes weights to 4-, 8-, or 16-bit precision, depending on the sensitivity of each layer. This strategy not only enhances memory and computational efficiency but also minimizes accuracy degradation. When evaluated on the ImageNet-1k dataset, our proposed method reduces the parameter size of MobileNetV3-small and MobileNetV3-large by 78.31% and 75.61%, respectively, while incurring accuracy drops of only 0.90% and 0.81%.
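To make the layer-wise assignment concrete, the sketch below illustrates one way a sensitivity-driven choice among 4-, 8-, and 16-bit weight precision could look in practice. It is a minimal illustration under stated assumptions, not the authors' method: the error-based sensitivity proxy, the thresholds, and the names quantize_weights, layer_sensitivity, and assign_bit_widths are hypothetical and are not taken from the paper, whose actual sensitivity metric is not described in this record.

    # Hypothetical sketch of sensitivity-aware mixed-precision weight assignment.
    # Sensitivity proxy, thresholds, and names are illustrative, not from the paper.
    import numpy as np

    def quantize_weights(w: np.ndarray, bits: int) -> np.ndarray:
        """Uniform symmetric quantization of a weight tensor to the given bit-width."""
        qmax = 2 ** (bits - 1) - 1
        max_abs = np.abs(w).max()
        scale = max_abs / qmax if max_abs > 0 else 1.0
        return np.clip(np.round(w / scale), -qmax, qmax) * scale

    def layer_sensitivity(w: np.ndarray, bits: int) -> float:
        """Proxy sensitivity: relative weight error introduced at the candidate bit-width."""
        err = np.linalg.norm(w - quantize_weights(w, bits))
        return err / (np.linalg.norm(w) + 1e-12)

    def assign_bit_widths(layers: dict, low_thr: float = 0.01, high_thr: float = 0.05) -> dict:
        """Pick 4-, 8-, or 16-bit weights per layer; activations stay at 8 bits throughout."""
        plan = {}
        for name, w in layers.items():
            s = layer_sensitivity(w, bits=4)  # how much this layer suffers at 4 bits
            if s < low_thr:
                plan[name] = 4                # robust layer: aggressive 4-bit weights
            elif s < high_thr:
                plan[name] = 8                # moderately sensitive: 8-bit weights
            else:
                plan[name] = 16               # outlier-heavy layer: keep 16-bit weights
        return plan

    # Example usage with random stand-in weights for two layers.
    rng = np.random.default_rng(0)
    layers = {"features.0.conv": rng.normal(size=(16, 3, 3, 3)),
              "features.12.conv": rng.normal(size=(96, 96, 1, 1)) * 10.0}
    print(assign_bit_widths(layers))

Restricting the candidate precisions to 4, 8, and 16 bits keeps every weight aligned with byte-addressable memory, which is the hardware-friendliness argument made in the abstract.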

Original language: English
Title of host publication: 2025 International Conference on Electronics, Information, and Communication, ICEIC 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798331510756
DOIs
State: Published - 2025
Event: 2025 International Conference on Electronics, Information, and Communication, ICEIC 2025 - Osaka, Japan
Duration: 19 Jan 2025 - 22 Jan 2025

Publication series

Name: 2025 International Conference on Electronics, Information, and Communication, ICEIC 2025

Conference

Conference: 2025 International Conference on Electronics, Information, and Communication, ICEIC 2025
Country/Territory: Japan
City: Osaka
Period: 19/01/25 - 22/01/25

Keywords

  • Convolutional Neural Network
  • Mixed Precision Quantization
  • MobileNetV3
  • Sensitivity
