TY - GEN
T1 - Comparison of Out-of-Distribution Detection Performance of CLIP-based Fine-Tuning Methods
AU - Kim, Jeonghyeon
AU - Kim, Jihyo
AU - Hwang, Sangheum
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - In recent years, large-scale vision-language models such as CLIP have shown remarkable performance on various zero-shot classification tasks. Inspired by these pre-trained models, many studies have proposed effective fine-tuning methods to exploit the models' pre-trained knowledge. One common approach is to fine-tune the entire model to transfer the pre-trained knowledge to target tasks. On the other hand, given the high cost of fine-tuning large-scale models, parameter-efficient fine-tuning methods have also been explored. While performance comparisons of existing fine-tuning methods exist, they often focus solely on classification performance. Such a singular focus is insufficient for a comprehensive assessment of the quality of the transferred pre-trained knowledge. For a more rigorous evaluation, other metrics, such as the detection of out-of-distribution samples, should be considered, given their importance for model reliability. However, the comparison of fine-tuning methods with respect to model reliability has been less explored. Therefore, we aim to fill this gap by offering a comprehensive comparative analysis of the out-of-distribution detection performance of CLIP-based fine-tuning methods alongside their in-distribution classification performance. Our experimental results on the OpenOOD v1.5 benchmark suggest that fine-tuning the entire model provides superior performance in both classification and out-of-distribution detection in a few-shot setting.
AB - In recent years, large-scale vision-language models such as CLIP have shown remarkable performance on various zero-shot classification tasks. Inspired by these pre-trained models, many studies have proposed effective fine-tuning methods to exploit the models' pre-trained knowledge. One common approach is to fine-tune the entire model to transfer the pre-trained knowledge to target tasks. On the other hand, given the high cost of fine-tuning large-scale models, parameter-efficient fine-tuning methods have also been explored. While performance comparisons of existing fine-tuning methods exist, they often focus solely on classification performance. Such a singular focus is insufficient for a comprehensive assessment of the quality of the transferred pre-trained knowledge. For a more rigorous evaluation, other metrics, such as the detection of out-of-distribution samples, should be considered, given their importance for model reliability. However, the comparison of fine-tuning methods with respect to model reliability has been less explored. Therefore, we aim to fill this gap by offering a comprehensive comparative analysis of the out-of-distribution detection performance of CLIP-based fine-tuning methods alongside their in-distribution classification performance. Our experimental results on the OpenOOD v1.5 benchmark suggest that fine-tuning the entire model provides superior performance in both classification and out-of-distribution detection in a few-shot setting.
KW - CLIP
KW - Fine-tuning
KW - Multi-modal foundation models
KW - Out-of-distribution detection
UR - https://www.scopus.com/pages/publications/85189239211
U2 - 10.1109/ICEIC61013.2024.10457104
DO - 10.1109/ICEIC61013.2024.10457104
M3 - Conference contribution
AN - SCOPUS:85189239211
T3 - 2024 International Conference on Electronics, Information, and Communication, ICEIC 2024
BT - 2024 International Conference on Electronics, Information, and Communication, ICEIC 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 International Conference on Electronics, Information, and Communication, ICEIC 2024
Y2 - 28 January 2024 through 31 January 2024
ER -