TY - JOUR
T1 - VFT: A versatile fine-tuning scheme based on feature distribution-aware knowledge distillation for lightweight convolutional neural networks
AU - Hong, Hyeonseok
AU - Kim, Hyun
N1 - Publisher Copyright:
© 2025 Elsevier Ltd
PY - 2025/11/1
Y1 - 2025/11/1
AB - Network compression techniques such as pruning and quantization are being actively researched to lighten convolutional neural networks (CNNs), whose structures have grown increasingly deep and complex in pursuit of higher accuracy. Since most compression techniques cause a drop in accuracy, fine-tuning is essential to recover the performance of the lightweight model; however, fine-tuning has received far less research attention than the compression techniques themselves, leaving significant room for improvement in performance recovery. In this paper, we analyze the shortcomings of existing fine-tuning methods in terms of the loss landscape and introduce a knowledge distillation (KD)-based fine-tuning approach that addresses them. In particular, to overcome the limitation that KD can be adversely affected by the capacity gap between the teacher and student models or by how the transferred knowledge is defined, we propose a feature distribution-aware knowledge distillation (FDKD) method, which defines supervision in the form of feature distributions to transfer semantic information from the teacher model. Moreover, we propose a layer-wise FDKD method that exploits a property unique to this setting: the baseline (i.e., teacher) and compressed (i.e., student) models share the same architecture. Experiments on classification tasks demonstrate the superiority of the proposed method over existing fine-tuning methods, with accuracy improvements of up to 1.99% and 3.83% for pruned and quantized models, respectively. The source code is available at https://github.com/IDSL-SeoulTech/VFT.
KW - Convolutional neural network
KW - Fine-tuning
KW - Knowledge distillation
KW - Loss landscape
KW - Network compression
UR - https://www.scopus.com/pages/publications/105009704663
U2 - 10.1016/j.engappai.2025.111597
DO - 10.1016/j.engappai.2025.111597
M3 - Article
AN - SCOPUS:105009704663
SN - 0952-1976
VL - 159
JO - Engineering Applications of Artificial Intelligence
JF - Engineering Applications of Artificial Intelligence
M1 - 111597
ER -
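
The abstract describes FDKD only at a high level, so the following is a minimal sketch, assuming per-channel mean and standard deviation as the feature-distribution statistics, of what layer-wise distribution matching between a teacher and a same-architecture student could look like in PyTorch. The choice of statistics and the names channel_stats and fdkd_loss are illustrative assumptions, not the paper's actual formulation.

import torch
import torch.nn.functional as F

def channel_stats(feat: torch.Tensor):
    # Per-channel mean/std over the batch and spatial dims of an (N, C, H, W) map.
    return feat.mean(dim=(0, 2, 3)), feat.std(dim=(0, 2, 3))

def fdkd_loss(teacher_feats, student_feats):
    # Sum distribution-matching terms over corresponding layers; pairing
    # features one-to-one is possible because the baseline (teacher) and
    # compressed (student) models share the same architecture.
    loss = torch.zeros(())
    for t, s in zip(teacher_feats, student_feats):
        t_mean, t_std = channel_stats(t.detach())  # no gradient to the teacher
        s_mean, s_std = channel_stats(s)
        loss = loss + F.mse_loss(s_mean, t_mean) + F.mse_loss(s_std, t_std)
    return loss

In practice, the per-layer features would be collected with forward hooks on matching modules of the two models, and a term like this would be combined with the ordinary task loss during fine-tuning.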