TY - JOUR
T1 - Systematic Homonym Detection and Replacement Based on Contextual Word Embedding
AU - Lee, Younghoon
N1 - Publisher Copyright: © 2020, Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2021/2
Y1 - 2021/2
N2 - Homonyms are words that share their spelling but differ in meaning and are a common feature in most languages. Homonyms are a source of noise in most text analyses and are difficult to detect; numerous studies have been conducted in this regard. However, extant methods typically detect homonyms using a rule-based or statistical-based approach, which requires an answer set, with little regard to the semantic meaning of the word. Therefore, we propose a novel approach for the detection of homonyms based on contextual word embedding that allows a word to be understood based on the context in which it appears. In this study, we extracted all contextual word embedding vectors of individual words and clustered those vectors using a spherical k-means clustering to detect pairs of homonyms. In addition, we developed a homonym replacement method to increase the performance of a document embedding technique, based on the word vector value. We replaced the embedding vectors of homonyms with a representative vector based on the respective meaning using the proposed homonym detection method. Experimental results indicate that the proposed method effectively detects homonyms and significantly improves the performance of document embedding.
AB - Homonyms are words that share their spelling but differ in meaning and are a common feature in most languages. Homonyms are a source of noise in most text analyses and are difficult to detect; numerous studies have been conducted in this regard. However, extant methods typically detect homonyms using a rule-based or statistical-based approach, which requires an answer set, with little regard to the semantic meaning of the word. Therefore, we propose a novel approach for the detection of homonyms based on contextual word embedding that allows a word to be understood based on the context in which it appears. In this study, we extracted all contextual word embedding vectors of individual words and clustered those vectors using a spherical k-means clustering to detect pairs of homonyms. In addition, we developed a homonym replacement method to increase the performance of a document embedding technique, based on the word vector value. We replaced the embedding vectors of homonyms with a representative vector based on the respective meaning using the proposed homonym detection method. Experimental results indicate that the proposed method effectively detects homonyms and significantly improves the performance of document embedding.
KW - Contextual word embedding
KW - ELMo
KW - Homonym detection
KW - Spherical k-means clustering
KW - Word-clustering based document embedding
UR - https://www.scopus.com/pages/publications/85092921527
U2 - 10.1007/s11063-020-10376-8
DO - 10.1007/s11063-020-10376-8
M3 - Article
AN - SCOPUS:85092921527
SN - 1370-4621
VL - 53
SP - 17
EP - 36
JO - Neural Processing Letters
JF - Neural Processing Letters
IS - 1
ER -