TY - GEN
T1 - ACC
T2 - 6th IEEE International Conference on AI Circuits and Systems, AICAS 2024
AU - Lee, Gilha
AU - Lee, Seungil
AU - Kim, Hyun
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Network compression methods such as pruning and quantization have been studied in various forms to enable the deployment of large-scale convolutional neural networks (CNNs) in resource-constrained environments. However, several challenges remain in utilizing these compressed CNNs on resource-constrained platforms (e.g., on-device environments). In particular, previous studies on network compression have mostly focused on inference, and pruning and quantization have been applied separately. To address these issues, this paper proposes a new compression technique, the adaptive compression framework for CNN training (ACC), which combines the advantages of conventional compression techniques. ACC addresses the memory bottleneck by reducing the resolution of the activations/gradients in the early layers of the CNN, with weights, activations, and gradients all quantized to 8 bits, and by pruning a large number of filters in the subsequent layers. In addition, the large kernel convolution compression (LKCC) included in ACC helps minimize information loss and effectively reduces memory and computation by applying a 2×2 average pooling filter to the activations/gradients of the early layers. Fine-tuning experiments with the ResNet18 model on the CIFAR-100 dataset show that the proposed ACC framework enables efficient CNN training on mobile/edge devices, reducing memory consumption and FLOPs by 85% and 37%, respectively, with a negligible performance degradation of only 0.17% compared with the baseline.
KW - CNN Training
KW - Network Compression
KW - On-device Training
KW - Pruning
KW - Quantization
UR - http://www.scopus.com/inward/record.url?scp=85199908133&partnerID=8YFLogxK
U2 - 10.1109/AICAS59952.2024.10595863
DO - 10.1109/AICAS59952.2024.10595863
M3 - Conference contribution
AN - SCOPUS:85199908133
T3 - 2024 IEEE 6th International Conference on AI Circuits and Systems, AICAS 2024 - Proceedings
SP - 472
EP - 476
BT - 2024 IEEE 6th International Conference on AI Circuits and Systems, AICAS 2024 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 22 April 2024 through 25 April 2024
ER -
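The abstract states that LKCC reduces early-layer memory by applying a 2×2 average pooling filter to the activations/gradients. Below is a minimal sketch of that downsampling step, assuming a PyTorch-style training setup; the function name lkcc_downsample and the tensor shapes are illustrative only, and the paper's full framework (8-bit quantization and later-layer filter pruning) is not reproduced here.

```python
# Illustrative sketch only, not the authors' released code: the abstract says LKCC
# applies a 2x2 average-pooling filter to early-layer activations/gradients to cut
# training memory. This shows the forward-activation case in PyTorch; the name
# lkcc_downsample and the shapes below are hypothetical.
import torch
import torch.nn.functional as F


def lkcc_downsample(activation: torch.Tensor) -> torch.Tensor:
    """Halve the spatial resolution of an early-layer activation with 2x2 average pooling."""
    return F.avg_pool2d(activation, kernel_size=2, stride=2)


if __name__ == "__main__":
    x = torch.randn(8, 64, 32, 32)   # batch of early-layer feature maps (N, C, H, W)
    y = lkcc_downsample(x)           # spatial size 32x32 -> 16x16
    print(tuple(x.shape), "->", tuple(y.shape))
    # The pooled tensor holds 75% fewer activation values; the paper combines this
    # with 8-bit quantization and filter pruning for its reported overall savings.
```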