Exploring the effectiveness of data-centric AI approaches to developing a prescription recognition system

Research output: Contribution to journal › Article › peer-review

Abstract

Optical character recognition (OCR) is in high demand across a wide range of fields and has evolved rapidly since the introduction of deep learning. Mainstream OCR research focuses on model-centric approaches that improve performance by designing novel model architectures or learning algorithms. However, for industrial practitioners, such model-centric approaches are not particularly useful for constructing application-specific OCR systems. In this study, we investigate the effectiveness of a data-centric approach to developing a Korean prescription recognition system. The proposed approach uses domain-specific synthetic data that reflect the visual properties and contextual priors of the target domain, allowing a model to learn domain-specific knowledge from the training data. Specifically, it constructs a domain-specific word dictionary to capture domain priors and generates synthetic training images that carry the visual properties of the target domain. For text recognition in prescription documents, where specialized knowledge is required, we demonstrate that the proposed data-centric approach is much more effective than model-centric approaches. Training on the domain-specific synthetic data generated by this approach facilitates precise predictions for texts requiring domain-specific knowledge.
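
The two steps named in the abstract (building a domain word dictionary, then generating synthetic training samples from it) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `build_dictionary` and `sample_specs`, the sample prescription lines, and the style parameters (font size, rotation, blur) are all hypothetical, and the sketch stops at producing text labels with rendering settings; actual image rendering is left out.

```python
import random
from collections import Counter


def build_dictionary(corpus_lines):
    """Count whitespace-separated tokens across domain text lines."""
    counts = Counter()
    for line in corpus_lines:
        counts.update(line.split())
    return counts


def sample_specs(dictionary, n, seed=0):
    """Draw n (text, style) pairs; words are sampled in proportion to frequency."""
    rng = random.Random(seed)
    words = list(dictionary)
    weights = [dictionary[w] for w in words]
    specs = []
    for _ in range(n):
        text = rng.choices(words, weights=weights, k=1)[0]
        # Hypothetical visual settings standing in for target-domain properties.
        style = {
            "font_size": rng.randint(18, 36),
            "rotation_deg": rng.uniform(-3.0, 3.0),
            "blur_radius": rng.choice([0.0, 0.5, 1.0]),
        }
        specs.append((text, style))
    return specs


# Made-up example lines; a real system would draw on actual domain text.
corpus = ["아목시실린 500mg 1일 3회", "타이레놀 500mg 1일 2회"]
dictionary = build_dictionary(corpus)
specs = sample_specs(dictionary, 5)
```

Each entry in `specs` pairs a ground-truth label with the settings a renderer would use to draw it, so the label distribution follows the domain dictionary rather than a general-purpose vocabulary.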

Original language: English
Article number: 103544
Journal: International Journal on Document Analysis and Recognition
State: Accepted/In press - 2025

Keywords

  • Data-centric AI
  • Optical character recognition
  • Scene text recognition
  • Synthetic data generation
