TY - GEN
T1 - AGT
T2 - 2023 International Conference on Electronics, Information, and Communication, ICEIC 2023
AU - Kim, Nam Joon
AU - Kim, Hyun
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Channel pruning is a widely used approach that can efficiently reduce inference time and memory footprint by removing unnecessary channels in convolutional neural networks. In previous studies, channel pruning based on sparsity training was performed by imposing ℓ1 regularization on the scaling factors of batch normalization and thereafter removing channels/filters below a predefined threshold. However, because channel pruning based on sparsity training imposes an ℓ1 penalty on all scaling factors and trains with the resulting deformed gradients, an accuracy drop is inevitable. To address this problem, we propose a new sparsity training method referred to as adaptive gradient training (AGT). The proposed AGT can create a compact network without performance degradation by using the original gradient as much as possible and avoiding the ℓ1 penalty. The proposed AGT can reduce the FLOPs of MobileNetV1 by 71.7% on the CIFAR-10 dataset while achieving an accuracy improvement of 0.04%. Consequently, the proposed method outperformed existing channel pruning methods across all evaluated datasets and models.
AB - Channel pruning is a widely used approach that can efficiently reduce inference time and memory footprint by removing unnecessary channels in convolutional neural networks. In previous studies, channel pruning based on sparsity training was performed by imposing ℓ1 regularization on the scaling factors of batch normalization and thereafter removing channels/filters below a predefined threshold. However, because channel pruning based on sparsity training imposes an ℓ1 penalty on all scaling factors and trains with the resulting deformed gradients, an accuracy drop is inevitable. To address this problem, we propose a new sparsity training method referred to as adaptive gradient training (AGT). The proposed AGT can create a compact network without performance degradation by using the original gradient as much as possible and avoiding the ℓ1 penalty. The proposed AGT can reduce the FLOPs of MobileNetV1 by 71.7% on the CIFAR-10 dataset while achieving an accuracy improvement of 0.04%. Consequently, the proposed method outperformed existing channel pruning methods across all evaluated datasets and models.
KW - Adaptive Gradient Training
KW - Channel Pruning
KW - Convolutional Neural Network
KW - Pruning
UR - https://www.scopus.com/pages/publications/85150465448
U2 - 10.1109/ICEIC57457.2023.10049943
DO - 10.1109/ICEIC57457.2023.10049943
M3 - Conference contribution
AN - SCOPUS:85150465448
T3 - 2023 International Conference on Electronics, Information, and Communication, ICEIC 2023
BT - 2023 International Conference on Electronics, Information, and Communication, ICEIC 2023
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 5 February 2023 through 8 February 2023
ER -
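
For context, the abstract above refers to the conventional sparsity-training baseline that AGT improves upon: an ℓ1 penalty applied to the batch-normalization scaling factors (gamma) during training, followed by removal of channels whose scaling factor falls below a predefined threshold. The sketch below illustrates only that baseline, not AGT itself (whose mechanism is not detailed in this record); the PyTorch framing, function names, and hyperparameter values (l1_lambda, threshold) are illustrative assumptions.

```python
# Minimal sketch (assumed PyTorch API) of ℓ1 sparsity training on BN scaling
# factors with threshold-based channel selection, as described in the abstract.
import torch
import torch.nn as nn


def bn_l1_penalty(model: nn.Module, l1_lambda: float = 1e-4):
    """Return l1_lambda * sum of |gamma| over all BatchNorm2d layers.

    Added to the task loss so that training drives unimportant channels'
    scaling factors toward zero. l1_lambda is an assumed hyperparameter.
    """
    penalty = 0.0
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            penalty = penalty + m.weight.abs().sum()
    return l1_lambda * penalty


def channels_to_prune(model: nn.Module, threshold: float = 1e-2):
    """Return, per BatchNorm2d layer, a boolean mask of channels whose
    |gamma| is below the (assumed) threshold, i.e. candidates for removal."""
    masks = {}
    for name, m in model.named_modules():
        if isinstance(m, nn.BatchNorm2d):
            masks[name] = m.weight.detach().abs() < threshold
    return masks


# Usage inside a standard training step (loss terms are illustrative):
#   loss = criterion(model(x), y) + bn_l1_penalty(model)
#   loss.backward(); optimizer.step()
# After sparsity training, channels_to_prune(model) identifies which
# channels/filters to physically remove before fine-tuning.
```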