TY - GEN
T1 - Scene Text Recognition with Multi-decoders
AU - Wang, Yao
AU - Ha, Jong Eun
N1 - Publisher Copyright:
© 2021 ICROS.
PY - 2021
Y1 - 2021
N2 - In this article, we focus on the scene text recognition problem, one of the challenging sub-fields of computer vision because of the arbitrary appearance of scene text. Recently, scene text recognition has achieved state-of-the-art performance thanks to advances in deep learning. At present, the encoder-decoder architecture is widely used for scene text recognition tasks, consisting of a feature extractor and a sequence module. Specifically, at the decoder, connectionist temporal classification (CTC), the attention mechanism, and the transformer (self-attention) are the three main approaches used in recent research. The CTC decoder is flexible and can handle sequences with large variations in length because it aligns sequence features with labels in a frame-wise manner. The attention decoder can learn better and deeper feature representations and obtain more accurate position information for each character, yielding more robust and accurate performance on both regular and irregular scene text. Moreover, a novel decoder mechanism is introduced in our study. The proposed architecture has several advantages: the model can be trained end-to-end with multiple decoders, and it can handle sequences of arbitrary length and images of arbitrary shape. Extensive experiments on standard benchmarks demonstrate that our model improves performance for both regular and irregular text recognition.
AB - In this article, we focus on the scene text recognition problem, one of the challenging sub-fields of computer vision because of the arbitrary appearance of scene text. Recently, scene text recognition has achieved state-of-the-art performance thanks to advances in deep learning. At present, the encoder-decoder architecture is widely used for scene text recognition tasks, consisting of a feature extractor and a sequence module. Specifically, at the decoder, connectionist temporal classification (CTC), the attention mechanism, and the transformer (self-attention) are the three main approaches used in recent research. The CTC decoder is flexible and can handle sequences with large variations in length because it aligns sequence features with labels in a frame-wise manner. The attention decoder can learn better and deeper feature representations and obtain more accurate position information for each character, yielding more robust and accurate performance on both regular and irregular scene text. Moreover, a novel decoder mechanism is introduced in our study. The proposed architecture has several advantages: the model can be trained end-to-end with multiple decoders, and it can handle sequences of arbitrary length and images of arbitrary shape. Extensive experiments on standard benchmarks demonstrate that our model improves performance for both regular and irregular text recognition.
KW - Attention decoder module
KW - CTC decoder module
KW - End-to-end framework
KW - Scene text recognition
UR - https://www.scopus.com/pages/publications/85124250278
U2 - 10.23919/ICCAS52745.2021.9649998
DO - 10.23919/ICCAS52745.2021.9649998
M3 - Conference contribution
AN - SCOPUS:85124250278
T3 - International Conference on Control, Automation and Systems
SP - 1523
EP - 1528
BT - 2021 21st International Conference on Control, Automation and Systems, ICCAS 2021
PB - IEEE Computer Society
T2 - 21st International Conference on Control, Automation and Systems, ICCAS 2021
Y2 - 12 October 2021 through 15 October 2021
ER -