Discrete Cosine Transformed Images Are Easy to Recognize in Vision Transformers

Jongho Lee, Hyun Kim

Research output: Contribution to journal › Article › peer-review


Abstract

Deep learning models for image classification with sufficient parameters achieve excellent performance because they can effectively extract the features of input images. However, because an image is a signal with high spatial redundancy, models that rely on spatial information alone are limited in how well they can interpret images. In this study, the discrete cosine transform (DCT) was therefore applied to the input image in non-overlapping N×N blocks so that the deep learning model can exploit both frequency and spatial information. The proposed method was implemented and verified using a vision transformer with 16×16 non-overlapping patches as the baseline, trained from scratch (without pre-trained weights) on CIFAR-10, CIFAR-100, and Tiny-ImageNet. The experimental results showed that top-1 accuracy improved by approximately 3-5% on every dataset, with little increase in computational cost.
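The blockwise DCT preprocessing described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes a single-channel image, an orthonormal 2-D DCT-II, and a block size matching the ViT's 16×16 patches; the paper's normalization and channel handling may differ.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix of size n x n.
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0] *= np.sqrt(0.5)  # rescale the DC row so that d @ d.T == I
    return d

def blockwise_dct(img, n=16):
    # Apply a 2-D DCT to each non-overlapping n x n block of a (H, W) image,
    # giving the model a frequency-domain view of every patch.
    h, w = img.shape
    assert h % n == 0 and w % n == 0, "image must tile into n x n blocks"
    d = dct_matrix(n)
    out = np.empty((h, w), dtype=np.float64)
    for r in range(0, h, n):
        for c in range(0, w, n):
            out[r:r+n, c:c+n] = d @ img[r:r+n, c:c+n] @ d.T
    return out
```

Because the DCT matrix is orthonormal, the transform is invertible (`d.T @ coeffs @ d` recovers each block), so no image information is lost; the coefficients simply concentrate energy in the low frequencies.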

Original language: English
Pages (from-to): 48-54
Number of pages: 7
Journal: IEIE Transactions on Smart Processing and Computing
Volume: 12
Issue number: 1
DOIs
State: Published - 2023

Keywords

  • Computer vision
  • Deep learning
  • Discrete cosine transform (DCT)
  • Image classification
  • Vision transformer
