Abstract
The classification of customer-voice data is an important task in real business, because customer-voice data must be delivered to the relevant departments and responsible individuals. However, customer-voice data typically contains many novel words, such as typos, informal terms, and words too general to discriminate between categories of customer-voice data. Such noisy data often has a negative effect on the classification task. In this study, an advanced novelty detection method is proposed that utilizes class vectors, which have high cosine similarity with the words of their class, to discriminate effectively between classes. The class vector is taken to be the centroid, or mean, of each class's word-vector distribution as derived from a neural embedding model; a novelty score is then calculated for each word so that novel words can be detected effectively. The novelty scores are calculated with improved versions of the Gaussian mixture model (GMM) and k-means clustering (KMC) methods that utilize the class vector. Qualitative observations verify the validity of the proposed method, and quantitative experiments verify its effectiveness for representing and classifying customer-voice data. The experimental results indicate that applying the proposed novelty detection method improves the classification performance on customer-voice data.
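The core idea of a class vector can be illustrated with a small sketch. Following the abstract, each class vector is the mean of the word vectors observed in that class, and a word is treated as more novel the less similar it is to every class vector. The scoring rule used here (one minus the highest cosine similarity to any class vector), the threshold, the function names, and the toy 2-D vectors are all hypothetical assumptions for illustration; this sketch does not reproduce the paper's improved GMM- or KMC-based novelty scores.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of equal-length vectors (the class centroid)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def class_vectors(word_vecs: Dict[str, Dict[str, Vector]]) -> Dict[str, List[float]]:
    """Map each class label to the mean of the word vectors seen in that class."""
    return {label: mean_vector(list(words.values())) for label, words in word_vecs.items()}


def novelty_score(vec: Vector, centroids: Dict[str, List[float]]) -> float:
    """Hypothetical score: 1 - highest cosine similarity to any class vector."""
    return 1.0 - max(cosine(vec, c) for c in centroids.values())


def detect_novel_words(vocab: Dict[str, Vector],
                       centroids: Dict[str, List[float]],
                       threshold: float) -> List[str]:
    """Return (sorted) words whose novelty score exceeds the threshold."""
    return sorted(w for w, v in vocab.items() if novelty_score(v, centroids) > threshold)


# Toy embeddings (hypothetical): two classes of customer-voice data.
word_vecs = {
    "billing": {"refund": [1.0, 0.0], "invoice": [0.9, 0.1]},
    "delivery": {"courier": [0.0, 1.0], "parcel": [0.1, 0.9]},
}
centroids = class_vectors(word_vecs)

# "thx" stands in for an informal word that sits between both classes.
vocab = {"refund": [1.0, 0.0], "parcel": [0.1, 0.9], "thx": [0.7, 0.7]}
print(detect_novel_words(vocab, centroids, threshold=0.1))
```

With these toy vectors only "thx" scores well above the threshold (about 0.26), while words close to a class vector score near zero. Words flagged this way would then be candidates for removal when de-noising documents before classification.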
Original language | English |
---|---|
Pages (from-to) | 717-733 |
Number of pages | 17 |
Journal | Intelligent Data Analysis |
Volume | 22 |
Issue number | 4 |
State | Published - 2018 |
Keywords
- class vector
- customer-voice
- de-noising documents
- novelty detection