TY - GEN
T1 - Activation Distribution-based Layer-wise Quantization for Convolutional Neural Networks
AU - Ki, Subin
AU - Kim, Hyun
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - As convolutional neural network (CNN) research accelerates with advances in GPUs, the accuracy of CNN models has continuously improved. However, the computational cost of CNN models grows in proportion to these accuracy gains, making it difficult to deploy CNN models in practice on mobile/embedded platforms. To address this problem, optimization and weight-reduction methods for CNN models have been actively studied. This paper proposes a new scale factor for layer-wise quantization that considers the activation distribution of CNNs. The proposed method makes it possible to minimize the accuracy drop in each layer and is friendly to hardware accelerator design. As a result, the proposed quantization method achieves much higher accuracy than quantization studies based on conventional accelerator designs while maintaining low hardware resource usage.
AB - As convolutional neural network (CNN) research accelerates with advances in GPUs, the accuracy of CNN models has continuously improved. However, the computational cost of CNN models grows in proportion to these accuracy gains, making it difficult to deploy CNN models in practice on mobile/embedded platforms. To address this problem, optimization and weight-reduction methods for CNN models have been actively studied. This paper proposes a new scale factor for layer-wise quantization that considers the activation distribution of CNNs. The proposed method makes it possible to minimize the accuracy drop in each layer and is friendly to hardware accelerator design. As a result, the proposed quantization method achieves much higher accuracy than quantization studies based on conventional accelerator designs while maintaining low hardware resource usage.
KW - Activation quantization
KW - convolutional neural networks
KW - layer-wise
KW - scale factor
UR - https://www.scopus.com/pages/publications/85128851543
U2 - 10.1109/ICEIC54506.2022.9748745
DO - 10.1109/ICEIC54506.2022.9748745
M3 - Conference contribution
AN - SCOPUS:85128851543
T3 - 2022 International Conference on Electronics, Information, and Communication, ICEIC 2022
BT - 2022 International Conference on Electronics, Information, and Communication, ICEIC 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 International Conference on Electronics, Information, and Communication, ICEIC 2022
Y2 - 6 February 2022 through 9 February 2022
ER -