Text augmentation method with adjustable manipulation intensity based on in-context learning

Yuho Cha, Younghoon Lee

Research output: Contribution to journal › Article › peer-review

Abstract

Text augmentation, which generates new samples by applying various combinations, noise, and manipulations to small datasets, is an essential technique in natural language processing research. By enhancing data diversity, it enables robust models to be built during training. However, determining the appropriate manipulation level remains a significant challenge. When the manipulation intensity is too low, the augmented data lack diversity, leading to suboptimal augmentation effects. Conversely, excessive manipulation can compromise label reliability and degrade model performance. To address this “manipulation level” challenge, we propose a text augmentation technique whose manipulation intensity can be adjusted systematically. In particular, we introduce a method for flexibly resetting the range of the candidate pool for manipulations, ensuring an optimal level of randomness during the augmentation process. We also introduce an advanced sentence embedding that supports reliable pseudo-labeling across different manipulation levels. Additionally, we use a ChatGPT model in the final stage to enhance the coherence and expressiveness of the generated text, thereby improving output quality. To evaluate the effectiveness of our approach, we compared it with existing text augmentation approaches. The experimental results show significant performance improvements on almost all test datasets.
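The abstract describes the method only at a high level. The toy Python sketch below is a hedged illustration of two of its ideas under stated assumptions: a candidate-pool size acting as a manipulation-intensity knob, and an embedding-based rule that assigns pseudo-labels to augmented text. It is not the authors' implementation. The 2-D embedding table, the functions candidate_pool, augment, sentence_embedding, and pseudo_label, and the parameters k (pool size) and p (replacement probability) are all hypothetical, the mean-of-word-vectors embedding is a simple stand-in for the paper's advanced sentence embedding, and the ChatGPT refinement stage is omitted.

```python
import math
import random

# HYPOTHETICAL toy word embeddings (2-D for readability); a real system
# would use learned embeddings. None of these values come from the paper.
EMBEDDINGS = {
    "good": (0.9, 0.1),
    "great": (0.85, 0.15),
    "fine": (0.7, 0.3),
    "okay": (0.6, 0.4),
    "bad": (0.1, 0.9),
    "movie": (0.5, 0.5),
    "film": (0.52, 0.48),
    "show": (0.45, 0.55),
}


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def candidate_pool(word, k):
    """Hypothetical: the k words most similar to `word`, excluding itself.

    A larger k admits more distant substitutes, i.e. stronger manipulation.
    """
    if word not in EMBEDDINGS:
        return []
    others = [w for w in EMBEDDINGS if w != word]
    others.sort(key=lambda w: cosine(EMBEDDINGS[word], EMBEDDINGS[w]), reverse=True)
    return others[:k]


def augment(tokens, k, p, rng):
    """Replace each token with probability p by a random member of its top-k pool."""
    out = []
    for tok in tokens:
        pool = candidate_pool(tok, k)
        if pool and rng.random() < p:
            out.append(rng.choice(pool))
        else:
            out.append(tok)  # unknown words and unsampled positions are kept
    return out


def sentence_embedding(tokens):
    """Mean of known word vectors (a simple stand-in, not the paper's embedding)."""
    dim = len(next(iter(EMBEDDINGS.values())))
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vecs:
        return (0.0,) * dim
    return tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(dim))


def pseudo_label(tokens, centroids):
    """Assign the label whose centroid is most cosine-similar to the sentence."""
    emb = sentence_embedding(tokens)
    return max(centroids, key=lambda label: cosine(emb, centroids[label]))


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = ["good", "movie"]
    centroids = {"pos": (0.9, 0.1), "neg": (0.1, 0.9)}  # hypothetical class centroids
    for k in (1, 3, 7):
        aug = augment(sentence, k=k, p=1.0, rng=rng)
        print(k, aug, pseudo_label(aug, centroids))
```

Sweeping k from small to large illustrates the trade-off the abstract describes: a small pool keeps substitutes close to the original word, while a large pool can admit a word of opposite sentiment (here "bad" for "good"), which is where a label-checking step such as pseudo_label becomes useful.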

Original language: English
Pages (from-to): 5901-5923
Number of pages: 23
Journal: Knowledge and Information Systems
Volume: 67
Issue number: 7
State: Published - Jul 2025

Keywords

  • Adjustable manipulation intensity
  • Advanced sentence embedding
  • In-context learning
  • Reliable pseudo-labels
  • Text augmentation
