TY - GEN
T1 - GASQ
T2 - 2025 International Conference on Electronics, Information, and Communication, ICEIC 2025
AU - Jeong, Sangbeom
AU - Koo, Kwanghyun
AU - Kim, Hyun
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
AB - As the demand for personalized AI models continues to rise and the importance of privacy protection grows, there has been increasing interest in efficiently training convolutional neural networks (CNNs) on mobile and edge devices. Since backward propagation (BP) requires significantly more computation and memory than forward propagation, low-bit quantization presents greater potential for improving the efficiency of CNN training. However, the variability and specificity of the gradient distribution during BP make gradient quantization particularly challenging. Existing studies attempt to mitigate this issue through additional computations, but they often lead to increased hardware complexity. To address this, we propose a hardware-efficient INT8 quantization method, gradient distribution-aware split quantization (GASQ), which is robust to gradient quantization errors. GASQ employs distinct scale factors for small and large magnitude gradients, effectively capturing the gradient distribution, which is predominantly centered around zero yet spans a broad range. This approach maintains low hardware complexity while achieving minimal quantization error. The proposed method demonstrates an average 0.27% performance improvement over full-precision models on the ImageNet dataset for classification tasks.
KW - Convolutional neural networks
KW - gradient quantization
KW - low-bit training
KW - on-device AI
UR - http://www.scopus.com/inward/record.url?scp=86000019500&partnerID=8YFLogxK
U2 - 10.1109/ICEIC64972.2025.10879630
DO - 10.1109/ICEIC64972.2025.10879630
M3 - Conference contribution
AN - SCOPUS:86000019500
T3 - 2025 International Conference on Electronics, Information, and Communication, ICEIC 2025
BT - 2025 International Conference on Electronics, Information, and Communication, ICEIC 2025
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 19 January 2025 through 22 January 2025
ER -
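
For illustration, a minimal Python sketch of the split-quantization idea summarized in the abstract: gradients are partitioned by magnitude and each group receives its own symmetric INT8 scale factor. The partitioning threshold, the max-abs scaling rule, and all names below are assumptions made for the sketch, not the authors' GASQ implementation.

import numpy as np

# Illustrative sketch (not the paper's GASQ): quantize a gradient tensor to
# INT8 using two scale factors -- one for the many small-magnitude values
# clustered near zero, one for the sparse large-magnitude tail.
# The magnitude threshold and per-group max-abs scaling are assumptions.

def split_quantize_int8(grad, threshold_ratio=0.1):
    """Quantize `grad` with separate symmetric INT8 scales for small and large values."""
    abs_grad = np.abs(grad)
    threshold = threshold_ratio * abs_grad.max()      # assumed split rule
    small_mask = abs_grad <= threshold

    def quantize(values):
        # Symmetric max-abs scaling into the INT8 range [-127, 127].
        scale = np.max(np.abs(values)) / 127.0 if values.size else 1.0
        scale = scale if scale > 0 else 1.0
        q = np.clip(np.round(values / scale), -127, 127).astype(np.int8)
        return q, scale

    q_small, s_small = quantize(grad[small_mask])
    q_large, s_large = quantize(grad[~small_mask])
    return (q_small, s_small), (q_large, s_large), small_mask

def dequantize(parts, mask, shape):
    """Reassemble a float32 approximation of the original gradient."""
    (q_small, s_small), (q_large, s_large) = parts
    out = np.empty(shape, dtype=np.float32)
    out[mask] = q_small.astype(np.float32) * s_small
    out[~mask] = q_large.astype(np.float32) * s_large
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Heavy-tailed toy gradients: mostly near zero with a few large outliers.
    grad = rng.laplace(scale=1e-3, size=10_000).astype(np.float32)
    small_part, large_part, mask = split_quantize_int8(grad)
    approx = dequantize((small_part, large_part), mask, grad.shape)
    print("mean abs quantization error:", np.abs(grad - approx).mean())

Using two scales lets the near-zero bulk of the gradient distribution keep fine resolution while the sparse large-magnitude tail still fits in INT8 without forcing a single shared scale to cover the entire range.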