Systematic Homonym Detection and Replacement Based on Contextual Word Embedding

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

Homonyms are words that share their spelling but differ in meaning, and they are a common feature of most languages. Homonyms are a source of noise in most text analyses and are difficult to detect; numerous studies have been conducted in this regard. However, extant methods typically detect homonyms using a rule-based or statistical approach, which requires an answer set and pays little regard to the semantic meaning of the word. Therefore, we propose a novel approach for the detection of homonyms based on contextual word embedding, which allows a word to be understood based on the context in which it appears. In this study, we extracted all contextual word embedding vectors of individual words and clustered those vectors using spherical k-means clustering to detect pairs of homonyms. In addition, we developed a homonym replacement method, based on the word vector value, to increase the performance of a document embedding technique. Using the proposed homonym detection method, we replaced the embedding vectors of homonyms with a representative vector for the respective meaning. Experimental results indicate that the proposed method effectively detects homonyms and significantly improves the performance of document embedding.
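The clustering and replacement steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`spherical_kmeans`, `replace_with_sense_vectors`), the farthest-point initialization, the choice of k = 2, and the synthetic vectors standing in for ELMo embeddings are all assumptions made for illustration.

```python
import numpy as np


def spherical_kmeans(X, k, n_iter=50):
    """Cluster the rows of X by cosine similarity.

    Returns (labels, centroids), where centroids have unit length.
    Initialization is a simple farthest-point heuristic (an assumption,
    chosen here so the sketch is deterministic).
    """
    # Project every vector onto the unit sphere.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)

    # Farthest-point initialization: start from the first vector, then
    # repeatedly add the vector least similar to the centroids chosen so far.
    chosen = [X[0]]
    for _ in range(1, k):
        sims = np.max(X @ np.stack(chosen).T, axis=1)
        chosen.append(X[np.argmin(sims)])
    centroids = np.stack(chosen)

    for _ in range(n_iter):
        # Assign each vector to the centroid with the highest cosine similarity.
        labels = np.argmax(X @ centroids.T, axis=1)
        updated = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                mean_dir = members.sum(axis=0)
                # Re-normalize so the centroid stays on the unit sphere.
                updated[j] = mean_dir / np.linalg.norm(mean_dir)
        if np.allclose(updated, centroids):
            break
        centroids = updated

    labels = np.argmax(X @ centroids.T, axis=1)
    return labels, centroids


def replace_with_sense_vectors(labels, centroids):
    """Replace each occurrence vector with the centroid of its sense cluster."""
    return centroids[labels]


# Synthetic stand-in for the contextual vectors of one word form:
# 20 occurrences near one direction and 20 near an orthogonal one.
rng = np.random.default_rng(0)
dim = 8
sense_a = np.eye(dim)[0] + 0.05 * rng.standard_normal((20, dim))
sense_b = np.eye(dim)[1] + 0.05 * rng.standard_normal((20, dim))
vectors = np.vstack([sense_a, sense_b])

labels, centroids = spherical_kmeans(vectors, k=2)
replaced = replace_with_sense_vectors(labels, centroids)
print(labels)
```

In this sketch, a word form whose occurrence vectors split into well-separated clusters would be treated as a homonym candidate, and each occurrence is then represented by the centroid of its cluster. How the number of clusters is chosen and how a split is judged significant are not specified in the abstract and are left out here.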

Original language: English
Pages (from-to): 17-36
Number of pages: 20
Journal: Neural Processing Letters
Volume: 53
Issue number: 1
State: Published - Feb 2021

Keywords

  • Contextual word embedding
  • ELMo
  • Homonym detection
  • Spherical k-means clustering
  • Word-clustering based document embedding
