Regulating the level of manipulation in text augmentation with systematic adjustment and advanced sentence embedding

Yuho Cha, Younghoon Lee

Research output: Contribution to journal › Article › peer-review

Abstract

Text augmentation, a method for generating samples by applying combinations, noise, and other manipulations to a small dataset, is a crucial technique in natural language processing research. It introduces diversity into the training process, thereby enabling the construction of robust models. The level of manipulation is the most important issue in text augmentation: low-level manipulation generates data similar to the original, resulting in inefficient augmentation because it cannot ensure diversity, whereas high-level manipulation causes reliability issues for labels and degrades the model’s performance. Therefore, we propose a systematically adjustable text augmentation technique to address the “level of manipulation” issue. Specifically, we propose a method for systematically adjusting the data candidate pool for manipulation to provide an appropriate level of randomness during the augmentation process. Furthermore, we propose an advanced sentence embedding methodology to achieve robust pseudo-labeling at the manipulation level. In other words, we leverage a combined sentence embedding, which incorporates sentence embedding, document embedding, and XAI information from the original data, to assign reliable pseudo-labels. We conducted performance comparisons with existing text augmentation approaches to validate the effectiveness of the proposed methodology. The experimental results demonstrate that the proposed method achieves the highest performance improvement across all the experimental datasets.
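
The two ideas in the abstract, an adjustable candidate pool and pseudo-labeling from a combined embedding, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the pool is adjusted or how the embeddings are combined, so the `level` fraction, the nearest-first ordering, the concatenation in `combine`, the nearest-centroid rule in `pseudo_label`, and all function names are assumptions made purely for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def candidate_pool(anchor, vectors, level):
    """Hypothetical pool control: indices of the other samples, nearest first,
    cut to a fraction `level` of them. A small level keeps only near neighbours
    (mild manipulation); level=1.0 admits every other sample (strong manipulation)."""
    if not 0.0 < level <= 1.0:
        raise ValueError("level must be in (0, 1]")
    others = [i for i in range(len(vectors)) if i != anchor]
    others.sort(key=lambda i: cosine(vectors[anchor], vectors[i]), reverse=True)
    size = max(1, math.ceil(level * len(others)))
    return others[:size]


def combine(sentence_vec, document_vec, xai_vec):
    """Assumed combination: plain concatenation of the three information sources."""
    return list(sentence_vec) + list(document_vec) + list(xai_vec)


def pseudo_label(vector, centroids):
    """Assumed labeling rule: the class whose centroid is most similar to `vector`."""
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))
```

Under these assumptions, an augmented sample would be built by mixing the anchor with a partner drawn at random from `candidate_pool(...)`, and its label would come from `pseudo_label(combine(...), centroids)` rather than being copied from the anchor.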

Original language: English
Article number: 107732
Pages (from-to): 3473-3487
Number of pages: 15
Journal: Neural Computing and Applications
Volume: 37
Issue number: 5
State: Published - Feb 2025

Keywords

  • Advanced sentence embedding
  • Reliable pseudo-labels
  • Text augmentation
  • The level of manipulation
