Document representation based on probabilistic word clustering in customer-voice classification

Younghoon Lee, Seokmin Song, Sungzoon Cho, Jinhae Choi

Research output: Contribution to journal › Article › peer-review

6 Scopus citations

Abstract

Customer-voice data play an important role in fields including marketing, product planning, and quality assurance. However, because the classification of customer-voice data relies on manual processes, it is subject to problems. This study builds automatic classifiers for customer-voice data using newly proposed document representation methods based on neural embedding and probabilistic word clustering, in which semantically similar terms are grouped into a common cluster. Word vectors generated by neural embedding are clustered, and each word is characterized by its membership strength in each cluster, as derived from a probabilistic clustering method such as fuzzy C-means clustering or a Gaussian mixture model. By taking this membership strength into account, the proposed method is expected to be well suited to classifying customer-voice data that consist of unstructured text. The results show that the proposed method achieved an accuracy of 89.24% in terms of representational effectiveness and an accuracy of 87.76% in classifying customer-voice data consisting of 12 classes. Furthermore, the method provides an intuitive interpretation of the generated representation.
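
The abstract describes the representation only at a high level. The following is a minimal sketch, not the authors' implementation, assuming word vectors are already available (toy 2-D vectors stand in for neural embeddings). It clusters the vectors with plain fuzzy C-means, where the membership of a vector in cluster j is 1 / Σ_k (d_j / d_k)^(2/(m−1)) for fuzzifier m > 1, and then represents a document as the average of its words' membership vectors. The averaging step, the farthest-point initialization, and the function names (`memberships`, `fuzzy_c_means`, `document_vector`) are illustrative assumptions. To use a Gaussian mixture model instead, the membership step would be replaced by posterior cluster probabilities (for example, scikit-learn's `GaussianMixture.predict_proba`).

```python
import math


def _dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def memberships(point, centers, m=2.0):
    """Fuzzy C-means membership of one vector in each cluster (values sum to 1).

    u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)), where d_j is the distance
    to center j and m > 1 is the fuzzifier.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    dists = [_dist(point, c) for c in centers]
    on_center = [j for j, d in enumerate(dists) if d == 0.0]
    if on_center:
        # The point coincides with one or more centers: split membership among them.
        share = 1.0 / len(on_center)
        return [share if j in on_center else 0.0 for j in range(len(centers))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** p for dk in dists) for dj in dists]


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=50):
    """Plain fuzzy C-means with a deterministic farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < n_clusters:
        # Assumed init choice: next center = point farthest from current centers.
        far = max(points, key=lambda x: min(_dist(x, c) for c in centers))
        centers.append(list(far))
    dim = len(points[0])
    for _ in range(n_iter):
        U = [memberships(x, centers, m) for x in points]
        new_centers = []
        for j in range(n_clusters):
            w = [u[j] ** m for u in U]  # membership-weighted contributions
            total = sum(w)
            new_centers.append(
                [sum(wi * x[d] for wi, x in zip(w, points)) / total for d in range(dim)]
            )
        centers = new_centers
    return centers


def document_vector(tokens, embeddings, centers, m=2.0):
    """Represent a document as the mean membership vector of its known words.

    Averaging is an assumed aggregation; out-of-vocabulary tokens are skipped.
    """
    rows = [memberships(embeddings[t], centers, m) for t in tokens if t in embeddings]
    if not rows:
        return [0.0] * len(centers)
    return [sum(col) / len(rows) for col in zip(*rows)]


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for neural word vectors.
    embeddings = {
        "refund": [0.0, 0.1], "money": [0.1, 0.0],
        "screen": [5.0, 5.1], "display": [5.1, 5.0],
    }
    centers = fuzzy_c_means(list(embeddings.values()), n_clusters=2)
    print(document_vector(["refund", "money", "unknown"], embeddings, centers))
```

Because each word contributes a full membership vector rather than a single hard cluster label, a word that sits between two clusters influences both dimensions of the document representation. This is one plausible reading of the "membership strength" idea in the abstract.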

Original language: English
Pages (from-to): 221-232
Number of pages: 12
Journal: Pattern Analysis and Applications
Volume: 22
Issue number: 1
State: Published, 5 Feb 2019

Keywords

  • Classification
  • Customer-voice
  • Document representation
  • Probabilistic word clustering
