TY - JOUR
T1 - Explainability-Based Mix-Up Approach for Text Data Augmentation
AU - Kwon, Soonki
AU - Lee, Younghoon
N1 - Publisher Copyright: © 2023 Association for Computing Machinery.
PY - 2023/2/20
Y1 - 2023/2/20
AB - Text augmentation increases the diversity of training examples without explicitly collecting new data. Because it is efficient and effective, numerous augmentation methods have been proposed. Among them, modification-based methods, particularly the mix-up method of swapping words between two or more sentences, are widely used because they are simple to apply and perform well. However, existing mix-up approaches are limited in that they do not reflect the importance of the manipulated word: even when a word that critically affects the classification result is manipulated, this is not taken into account when labeling the augmented data. In this study, we therefore propose a text augmentation technique that explicitly derives the importance of each manipulated word, that is, its explainability, and reflects this importance in the labeling of the augmented data. Experimental results confirmed that reflecting the importance of the manipulated word in the labeling yielded significantly higher performance than existing methods.
KW - Text augmentation
KW - XAI
KW - mix-up approach
KW - soft-labeling
KW - word-explainability
UR - http://www.scopus.com/inward/record.url?scp=85150166742&partnerID=8YFLogxK
U2 - 10.1145/3533048
DO - 10.1145/3533048
M3 - Article
AN - SCOPUS:85150166742
SN - 1556-4681
VL - 17
JO - ACM Transactions on Knowledge Discovery from Data
JF - ACM Transactions on Knowledge Discovery from Data
IS - 1
M1 - 13
ER -