TY - JOUR
T1 - Improved Image Classification With Token Fusion
AU - Choi, Keong Hun
AU - Kim, Jin Woo
AU - Wang, Yao
AU - Ha, Jong Eun
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2023
Y1 - 2023
AB - In this paper, we propose a method to improve image classification performance by fusing CNN and transformer structures. A CNN extracts information about local image regions well, but its ability to extract global information is limited. A transformer, on the other hand, is good at extracting global information but requires much more memory than a CNN. We apply a CNN to an image and treat the feature vector at each pixel of the resulting feature map as a token. At the same time, the image is divided into patches, and each patch is treated as a token, as in a transformer. The CNN tokens and the transformer tokens have advantages in capturing local and global information, respectively. We hypothesize that combining the two types of tokens yields improved characteristics, and we show this through experiments. We propose three methods for fusing tokens with different characteristics: (1) late token fusion with a parallel structure, (2) early token fusion, and (3) layer-by-layer token fusion. In experiments on ImageNet-1K, the proposed method shows the best classification performance.
KW - convolutional neural networks
KW - deep learning
KW - image classification
KW - transformer
UR - http://www.scopus.com/inward/record.url?scp=85164374174&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2023.3291597
DO - 10.1109/ACCESS.2023.3291597
M3 - Article
AN - SCOPUS:85164374174
SN - 2169-3536
VL - 11
SP - 67460
EP - 67467
JO - IEEE Access
JF - IEEE Access
ER -