DC-MPQ: Distributional Clipping-based Mixed-Precision Quantization for Convolutional Neural Networks

Seungjin Lee, Hyun Kim

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

1 Scopus citation

Abstract

Quantization is a representative network compression technique that reduces the number of computational operations and memory accesses in convolutional neural networks (CNNs). With naïve quantization, the number of quantization points corresponding to near-zero values decreases as the precision decreases; as a result, the quantization error increases. Recent studies have suggested various solutions to this problem, but few have considered the characteristics of hardware accelerator implementation. This study therefore proposes using standard deviation values, which are simple statistics of each layer's distribution, as clipping points and setting a scale factor based on the clipping point to quantize the weights into a mixed-precision 4-bit/8-bit integer format. The proposed technique can be applied to any network without additional training, and only biasing and mapping based on the pre-stored standard deviation values are performed; the computational complexity is therefore low, which makes the method hardware-friendly. Experimental results indicate that the proposed mixed-precision quantization of the weights of ResNet-18 on ImageNet reduces the weight capacity by 84% with a 0.34% Top-1 accuracy drop compared to full precision. For YOLACT, an instance segmentation model with a ResNet-50 backbone, on MS COCO, a weight capacity reduction of 81.7% is achieved with drops of only 0.27% and 0.19% in box mean average precision (mAP) and mask mAP, respectively.
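The abstract gives no formulas, so the following is only a minimal sketch of what standard-deviation-based clipping could look like for one layer, not the authors' implementation. It assumes symmetric uniform quantization, a clipping point at k times the layer's standard deviation (the multiplier `k`, the function names, and the synthetic Gaussian weights are all hypothetical), and it leaves out the "biasing and mapping" step and the rule for assigning 4-bit or 8-bit precision to a layer, neither of which the abstract describes.

```python
import numpy as np

def quantize_std_clip(weights, n_bits, k=3.0):
    """Symmetric uniform quantization with the clipping point at k * std.

    The multiplier k is an illustrative assumption; the abstract only says
    that standard deviation values serve as clipping points.
    """
    clip = k * float(np.std(weights))      # clipping point from layer statistics
    qmax = 2 ** (n_bits - 1) - 1           # 7 for 4-bit, 127 for 8-bit
    if clip == 0.0:                        # degenerate layer: all weights equal
        return np.zeros(weights.shape, dtype=np.int8), 1.0
    scale = clip / qmax                    # scale factor based on the clipping point
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map integer codes back to floating point to measure the error."""
    return q.astype(np.float32) * scale

# Synthetic stand-in for one layer's weights (assumed roughly Gaussian).
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=10_000).astype(np.float32)

q4, s4 = quantize_std_clip(w, n_bits=4)
q8, s8 = quantize_std_clip(w, n_bits=8)

err4 = float(np.mean((w - dequantize(q4, s4)) ** 2))
err8 = float(np.mean((w - dequantize(q8, s8)) ** 2))
print(f"4-bit MSE: {err4:.3e}, 8-bit MSE: {err8:.3e}")
```

Because the only per-layer quantity is a standard deviation that can be computed once and stored, the runtime work in this sketch reduces to a divide, a round, and a clamp, which is consistent with the abstract's claim of low computational complexity.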

Original language: English
Title of host publication: Proceeding - IEEE International Conference on Artificial Intelligence Circuits and Systems, AICAS 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 130-133
Number of pages: 4
ISBN (Electronic): 9781665409964
State: Published - 2022
Event: 4th IEEE International Conference on Artificial Intelligence Circuits and Systems, AICAS 2022 - Incheon, Korea, Republic of
Duration: 13 Jun 2022 to 15 Jun 2022

Publication series

Name: Proceeding - IEEE International Conference on Artificial Intelligence Circuits and Systems, AICAS 2022

Conference

Conference: 4th IEEE International Conference on Artificial Intelligence Circuits and Systems, AICAS 2022
Country/Territory: Korea, Republic of
City: Incheon
Period: 13/06/22 to 15/06/22

Keywords

  • convolutional neural network
  • distributional clipping
  • hardware-aware quantization
  • mixed-precision quantization
  • segmentation
