TY - GEN
T1 - Systematical Randomness Assignment for the Level of Manipulation in Text Augmentation
AU - Cha, Youhoo
AU - Lee, Younghoon
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Text augmentation, a method for generating new texts by using noise, combinations, and other mixings to a scarce dataset, is an important skill in natural language processing (NLP). This allows for the introduction of diversity into the training process, resulting in more robust models. However, in the field of text augmentation, the level of manipulation can cause the following problems: When manipulation is 'low-level', it cannot guarantee diversity by generating data similar to the original but can lead to inefficient augmentation, while 'high-level' manipulation causes unreliable label issues and degrades model accuracy. Therefore, in this paper, a text augmentation technique is proposed by systematically assigning randomness to solve the 'level of manipulation' problem. Additionally, we generate an advanced sentence embedding that can assign robust pseudo-labels at a high manipulation level. That is, advanced sentence embeddings capable of assigning reliable pseudo-labels are generated by extracting information from the original data, namely sentence embeddings, document embeddings, and eXplainable Artificial Intelligence (XAI) information. We verify the effectiveness of the proposed methodology through sentiment classification accuracy comparisons with existing text augmentation approaches, and show that the proposed methodology achieves high sentiment classification accuracy improvements on most experimental datasets.
AB - Text augmentation, a method for generating new texts by using noise, combinations, and other mixings to a scarce dataset, is an important skill in natural language processing (NLP). This allows for the introduction of diversity into the training process, resulting in more robust models. However, in the field of text augmentation, the level of manipulation can cause the following problems: When manipulation is 'low-level', it cannot guarantee diversity by generating data similar to the original but can lead to inefficient augmentation, while 'high-level' manipulation causes unreliable label issues and degrades model accuracy. Therefore, in this paper, a text augmentation technique is proposed by systematically assigning randomness to solve the 'level of manipulation' problem. Additionally, we generate an advanced sentence embedding that can assign robust pseudo-labels at a high manipulation level. That is, advanced sentence embeddings capable of assigning reliable pseudo-labels are generated by extracting information from the original data, namely sentence embeddings, document embeddings, and eXplainable Artificial Intelligence (XAI) information. We verify the effectiveness of the proposed methodology through sentiment classification accuracy comparisons with existing text augmentation approaches, and show that the proposed methodology achieves high sentiment classification accuracy improvements on most experimental datasets.
KW - advanced sentence embedding
KW - reliable pseudo-labels
KW - text augmentation
KW - the level of manipulation
UR - https://www.scopus.com/pages/publications/105000839540
U2 - 10.1109/ICMLA61862.2024.00252
DO - 10.1109/ICMLA61862.2024.00252
M3 - Conference contribution
AN - SCOPUS:105000839540
T3 - Proceedings - 2024 International Conference on Machine Learning and Applications, ICMLA 2024
SP - 1633
EP - 1638
BT - Proceedings - 2024 International Conference on Machine Learning and Applications, ICMLA 2024
A2 - Wani, M. Arif
A2 - Angelov, Plamen
A2 - Luo, Feng
A2 - Ogihara, Mitsunori
A2 - Wu, Xintao
A2 - Precup, Radu-Emil
A2 - Ramezani, Ramin
A2 - Gu, Xiaowei
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 23rd IEEE International Conference on Machine Learning and Applications, ICMLA 2024
Y2 - 18 December 2024 through 20 December 2024
ER -