Abstract
Deep learning models for image classification with a sufficient number of parameters show excellent classification performance because they can effectively extract features from input images. However, because an image is a signal with high spatial redundancy, there is a limit to how well such models can interpret images using spatial information alone. Therefore, in this study, the discrete cosine transform (DCT) was applied to the input image in N×N blocks so that the deep learning model could exploit both frequency and spatial information. The proposed method was implemented and verified using a vision transformer with 16×16 non-overlapping patches as the baseline, trained from scratch (without pre-trained weights) on CIFAR-10, CIFAR-100, and Tiny-ImageNet. The experimental results show that top-1 accuracy improves by approximately 3-5% on every dataset with little increase in computational cost.
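The abstract describes applying a 2-D DCT to the input image in non-overlapping N×N blocks before feeding it to the model. As a minimal sketch of that preprocessing step (assuming N = 16 to match the ViT patch size, and using `scipy.fft.dctn` with orthonormal scaling; the paper's exact normalization and channel handling are not specified here):

```python
import numpy as np
from scipy.fft import dctn

def blockwise_dct(image: np.ndarray, block: int = 16) -> np.ndarray:
    """Apply a 2-D DCT to each non-overlapping block x block tile of a
    single-channel image, yielding a frequency-domain representation
    of the same shape. Block size 16 is assumed to match the ViT patch."""
    h, w = image.shape
    assert h % block == 0 and w % block == 0, "image must tile evenly"
    out = np.empty((h, w), dtype=np.float64)
    for i in range(0, h, block):
        for j in range(0, w, block):
            out[i:i + block, j:j + block] = dctn(
                image[i:i + block, j:j + block], norm="ortho")
    return out

# Example: a 32x32 image split into four 16x16 blocks
img = np.random.rand(32, 32)
freq = blockwise_dct(img)
print(freq.shape)  # (32, 32)
```

Each transformed block concentrates most of the image energy in its low-frequency (top-left) coefficients, which is the frequency information the paper argues complements the spatial signal the transformer already sees.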
| Original language | English |
|---|---|
| Pages (from-to) | 48-54 |
| Number of pages | 7 |
| Journal | IEIE Transactions on Smart Processing and Computing |
| Volume | 12 |
| Issue number | 1 |
| DOIs | |
| State | Published - 2023 |
Keywords
- Computer vision
- Deep learning
- Discrete cosine transform (DCT)
- Image classification
- Vision transformer