Abstract
Since the Transformer and ViT, the use of attention in vision has been increasing, and its performance has surpassed that of the existing convolution layer. Attention is built on linear layers by default, so it can be seen as an improvement of the linear layer. Looking at the work studied so far, linear and convolution layers have been improved in similar directions; the biggest difference between the two is the size of the receptive field. In this paper, various types of FlipConv blocks are presented. In particular, FlipConv block type 4 shows performance similar to the ConvNeXt block with only a small increase in parameters, and it performs about 8% better on data presented in a form not used during training.
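The abstract does not specify how a FlipConv block is constructed. Purely as an illustrative sketch of the kind of design the names suggest, the code below pairs a standard ConvNeXt-style block (the baseline named in the abstract) with a horizontally flipped view of the feature map; the class names, the use of `torch.flip`, and the fusion layer are assumptions, not the paper's definition.

```python
# Hypothetical sketch only: the paper's actual FlipConv block is not
# described in this abstract. This combines a ConvNeXt-style block with a
# flipped copy of the feature map as one plausible reading of "Flip".
import torch
import torch.nn as nn


class ConvNeXtStyleBlock(nn.Module):
    """ConvNeXt-style block: depthwise 7x7 conv, LayerNorm, pointwise MLP."""
    def __init__(self, dim: int):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)            # applied over channels (NHWC)
        self.pwconv1 = nn.Linear(dim, 4 * dim)   # pointwise expand
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)   # pointwise project

    def forward(self, x):
        shortcut = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)                # NCHW -> NHWC
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = x.permute(0, 3, 1, 2)                # NHWC -> NCHW
        return shortcut + x


class FlipConvBlockSketch(nn.Module):
    """Hypothetical 'FlipConv'-style block: the convolution also sees a
    horizontally flipped view of the feature map (an assumption)."""
    def __init__(self, dim: int):
        super().__init__()
        self.block = ConvNeXtStyleBlock(dim)
        # Small 1x1 conv to fuse the flipped branch, keeping the
        # parameter increase over the plain block modest.
        self.fuse = nn.Conv2d(2 * dim, dim, kernel_size=1)

    def forward(self, x):
        flipped = torch.flip(x, dims=[-1])       # flip along the width axis
        x = self.fuse(torch.cat([x, flipped], dim=1))
        return self.block(x)


if __name__ == "__main__":
    x = torch.randn(1, 64, 32, 32)
    print(FlipConvBlockSketch(64)(x).shape)      # torch.Size([1, 64, 32, 32])
```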
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 903-909 |
| Number of pages | 7 |
| Journal | Journal of Institute of Control, Robotics and Systems |
| Volume | 28 |
| Issue number | 10 |
| DOIs | |
| State | Published - 2022 |
Keywords
- Backbone
- Classification
- Deep learning
- Feature map
- Flip