TY - JOUR
T1 - Discrete Cosine Transformed Images Are Easy to Recognize in Vision Transformers
AU - Lee, Jongho
AU - Kim, Hyun
N1 - Publisher Copyright: Copyrights © 2023 The Institute of Electronics and Information Engineers.
PY - 2023
Y1 - 2023
AB - Deep learning models for image classification with sufficient parameters classify well because they extract the features of input images effectively. However, a model that interprets images using only spatial information is limited, because an image is a signal with high spatial redundancy. In this study, the discrete cosine transform (DCT) is therefore applied to the input image in N×N blocks so that the model can use both frequency and spatial information. The proposed method was implemented and verified using a vision transformer with 16×16 non-overlapping patches as the baseline, trained from scratch (without pre-trained weights) on the CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets. The experimental results show that top-1 accuracy improves by approximately 3-5% on every dataset with little increase in computational cost.
KW - Computer vision
KW - Deep learning
KW - Discrete cosine transform (DCT)
KW - Image classification
KW - Vision transformer
UR - https://www.scopus.com/pages/publications/85154069937
U2 - 10.5573/IEIESPC.2023.12.1.48
DO - 10.5573/IEIESPC.2023.12.1.48
M3 - Article
AN - SCOPUS:85154069937
SN - 2287-5255
VL - 12
SP - 48
EP - 54
JO - IEIE Transactions on Smart Processing and Computing
JF - IEIE Transactions on Smart Processing and Computing
IS - 1
ER -