Systematical Randomness Assignment for the Level of Manipulation in Text Augmentation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Text augmentation, which generates new texts by applying noise, combination, and other mixing operations to a scarce dataset, is an important technique in natural language processing (NLP). It introduces diversity into the training process, resulting in more robust models. However, the level of manipulation in text augmentation can cause the following problems: 'low-level' manipulation generates data similar to the original, so it cannot guarantee diversity and can lead to inefficient augmentation, while 'high-level' manipulation produces unreliable labels and degrades model accuracy. Therefore, this paper proposes a text augmentation technique that systematically assigns randomness to solve the 'level of manipulation' problem. Additionally, we generate an advanced sentence embedding that can assign robust pseudo-labels at a high manipulation level. Specifically, this embedding is built from information extracted from the original data, namely sentence embeddings, document embeddings, and eXplainable Artificial Intelligence (XAI) information. We verify the effectiveness of the proposed methodology by comparing sentiment classification accuracy against existing text augmentation approaches, and show that it achieves large improvements in sentiment classification accuracy on most experimental datasets.
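
The abstract does not spell out how randomness is assigned, so the following is only a minimal illustrative sketch of the general idea of a controllable manipulation level, not the authors' method. It assumes a token-level augmenter in which a single `level` value in [0, 1] governs how many random swaps and deletions are applied, and a simple evenly spaced schedule of levels across augmented copies. The names `assign_levels` and `augment`, the default bounds 0.1 and 0.9, and the choice of swap and deletion as operations are all hypothetical.

```python
import random


def assign_levels(n_copies, low=0.1, high=0.9):
    """Spread manipulation levels evenly between low and high (assumed schedule)."""
    if n_copies < 1:
        raise ValueError("n_copies must be at least 1")
    if n_copies == 1:
        return [(low + high) / 2]
    step = (high - low) / (n_copies - 1)
    return [low + k * step for k in range(n_copies)]


def augment(tokens, level, rng):
    """Return a perturbed copy of tokens; level in [0, 1] is the manipulation level."""
    if not 0.0 <= level <= 1.0:
        raise ValueError("level must be in [0, 1]")
    out = list(tokens)
    # The number of random position swaps grows with the manipulation level.
    n_swaps = round(level * len(out))
    for _ in range(n_swaps):
        if len(out) < 2:
            break
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    # Each token is dropped independently with probability level / 2.
    kept = [t for t in out if rng.random() >= level / 2]
    # Fall back to the first token so a non-empty input never becomes empty.
    return kept if kept else out[:1]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
sentence = "the movie was surprisingly good".split()
for level in assign_levels(3):
    print(round(level, 2), augment(sentence, level, rng))
```

At level 0 the input is returned unchanged, which corresponds to the low-diversity end described in the abstract. At levels near 1 the output can diverge enough that the original label may no longer hold, which is the situation the abstract addresses with pseudo-labels from the advanced sentence embedding.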

Original language: English
Title of host publication: Proceedings - 2024 International Conference on Machine Learning and Applications, ICMLA 2024
Editors: M. Arif Wani, Plamen Angelov, Feng Luo, Mitsunori Ogihara, Xintao Wu, Radu-Emil Precup, Ramin Ramezani, Xiaowei Gu
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1633-1638
Number of pages: 6
ISBN (Electronic): 9798350374889
DOIs
State: Published - 2024
Event: 23rd IEEE International Conference on Machine Learning and Applications, ICMLA 2024 - Miami, United States
Duration: 18 Dec 2024 - 20 Dec 2024

Publication series

Name: Proceedings - 2024 International Conference on Machine Learning and Applications, ICMLA 2024

Conference

Conference: 23rd IEEE International Conference on Machine Learning and Applications, ICMLA 2024
Country/Territory: United States
City: Miami
Period: 18/12/24 - 20/12/24

Keywords

  • advanced sentence embedding
  • reliable pseudo-labels
  • text augmentation
  • the level of manipulation
