Scene Text Recognition with Multi-Encoders

Yao Wang, Jong Eun Ha

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

2 Scopus citations

Abstract

Although text recognition has evolved significantly over the years, current models still struggle with irregular text images, which involve complex backgrounds, curved text, diverse fonts, distortions, and similar difficulties. CNN-based text recognition networks perform well but still face these challenges. Recently, transformer-based feature extractors have shown clear advantages for global feature extraction on images. For irregular text images in particular, self-attention can establish connections between all parts of the image, which reduces the influence of the irregular distribution of characters. Therefore, this paper proposes MESTR (Multi-Encoders Scene Text Recognition), which combines a CNN-based [1] [2] [6] feature extractor with a transformer-based feature extractor. MESTR extracts local and global features of text images at the same time and then integrates the global features into the local features. During training, we use CTC [6] to guide training in the decoder part, as a compensation training strategy for the attentional decoder. Experimental results demonstrate that the proposed MESTR achieves competitive results on all seven benchmarks. We also provide ablation experiments to show the effectiveness of each improved part of the text recognition model.
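As a rough illustration of what CTC-guided training of an attentional decoder can look like, the sketch below computes a CTC negative log-likelihood with the standard forward algorithm and adds it to the attention decoder's cross-entropy as a weighted sum. The abstract does not say how the two losses are combined, so the weighted-sum form, the `ctc_weight` parameter, and all function names here are assumptions for illustration, not the authors' implementation.

```python
import math


def log_sum_exp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf entries."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_nll(log_probs, target, blank=0):
    """CTC negative log-likelihood of `target` via the forward algorithm.

    log_probs: T frames, each a list of per-class log-probabilities.
    target:    list of class indices, without blanks.
    """
    # Extended label sequence with blanks interleaved: [b, y1, b, y2, ..., b]
    ext = [blank]
    for c in target:
        ext += [c, blank]
    S = len(ext)

    # alpha[s] = log-probability of all partial alignments ending in ext[s]
    alpha = [-math.inf] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for frame in log_probs[1:]:
        new = [-math.inf] * S
        for s in range(S):
            terms = [alpha[s]]                      # stay on the same symbol
            if s >= 1:
                terms.append(alpha[s - 1])          # advance by one symbol
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])
            new[s] = log_sum_exp(*terms) + frame[ext[s]]
        alpha = new

    # Valid alignments end on the last label or on the trailing blank.
    final = log_sum_exp(alpha[-1], alpha[-2]) if S > 1 else alpha[-1]
    return -final


def attention_nll(step_log_probs, target):
    """Cross-entropy of an attentional decoder: one distribution per character."""
    return -sum(lp[c] for lp, c in zip(step_log_probs, target))


def joint_loss(ctc_log_probs, attn_log_probs, target, ctc_weight=0.5):
    """Weighted sum of both losses; `ctc_weight` is a hypothetical hyperparameter."""
    return (ctc_weight * ctc_nll(ctc_log_probs, target)
            + (1.0 - ctc_weight) * attention_nll(attn_log_probs, target))
```

In a sketch like this, the CTC term would typically be computed on the encoder's frame-level outputs while the attention term comes from the decoder's per-character outputs, so the CTC branch acts only as an auxiliary training signal.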

Original language: English
Title of host publication: 2022 22nd International Conference on Control, Automation and Systems, ICCAS 2022
Publisher: IEEE Computer Society
Pages: 1615-1620
Number of pages: 6
ISBN (Electronic): 9788993215243
State: Published - 2022
Event: 22nd International Conference on Control, Automation and Systems, ICCAS 2022 - Busan, Korea, Republic of
Duration: 27 Nov 2022 - 1 Dec 2022

Publication series

Name: International Conference on Control, Automation and Systems
Volume: 2022-November
ISSN (Print): 1598-7833

Conference

Conference: 22nd International Conference on Control, Automation and Systems, ICCAS 2022
Country/Territory: Korea, Republic of
City: Busan
Period: 27/11/22 - 1/12/22

Keywords

  • Convolutional neural network
  • Deep learning
  • Scene text recognition
  • Transformer
