토픽모델링을 이용한 약어 중의성 해소 (Abbreviation Disambiguation using Topic Modeling)

Translated title of the contribution: Abbreviation Disambiguation using Topic Modeling

Research output: Contribution to journal › Article › peer-review

Abstract

Recently, many studies have used text analysis to analyze trends, including research trends. When documents for such an analysis are collected by searching with an abbreviation as the keyword, the abbreviation must be disambiguated. In many studies, researchers classify the documents manually, reading them one by one to find the data the study needs. Most existing work on abbreviation disambiguation aims to determine the meaning of individual words and relies on supervised learning. Such methods are not suited to classifying documents retrieved by an abbreviation search in order to find research data, and little related work exists. This paper proposes a method that semi-automatically classifies documents collected via an abbreviation by performing topic modeling with Non-negative Matrix Factorization (NMF), an unsupervised learning method, during the data preprocessing step. To verify the method, papers were collected from an academic database using the abbreviation 'MSA'. Among 1,401 papers, the proposed method identified 316 papers related to Micro Services Architecture. The document classification accuracy of the proposed method was 92.36%. The proposed method is expected to reduce the time and cost researchers spend on manual classification.
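The abstract states only that NMF topic modeling is applied during preprocessing to separate the documents that match the intended expansion of an abbreviation. The following is a minimal sketch of how such a step could work, not the authors' implementation. It assumes plain term-count features, uses Lee and Seung multiplicative updates as a stand-in for whatever NMF solver the paper used, and replaces the researcher's manual inspection of topic keywords with a hypothetical `seed_terms` list. The function names (`build_term_matrix`, `nmf`, `select_documents`, `top_terms`) are illustrative only.

```python
import re

import numpy as np


def build_term_matrix(docs, min_len=2):
    """Bag-of-words count matrix X (documents x vocabulary)."""
    tokenized = [[t for t in re.findall(r"[a-z]+", d.lower()) if len(t) >= min_len]
                 for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for row, toks in enumerate(tokenized):
        for t in toks:
            X[row, index[t]] += 1.0
    return X, vocab


def nmf(X, k, iters=500, seed=0, eps=1e-9):
    """Factor X ~= W @ H with W, H >= 0 (Lee-Seung updates for the Frobenius loss)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps  # document-topic weights
    H = rng.random((k, m)) + eps  # topic-term weights
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def top_terms(H, vocab, n=5):
    """Highest-weighted terms per topic, for a researcher to inspect."""
    return [[vocab[j] for j in np.argsort(row)[::-1][:n]] for row in H]


def select_documents(docs, k, seed_terms, restarts=5):
    """Keep documents whose dominant topic is the one tied to seed_terms (hypothetical)."""
    X, vocab = build_term_matrix(docs)
    best = None
    for seed in range(restarts):  # several random starts; keep the best fit
        W, H = nmf(X, k, seed=seed)
        err = np.linalg.norm(X - W @ H)
        if best is None or err < best[0]:
            best = (err, W, H)
    _, W, H = best
    cols = [vocab.index(t) for t in seed_terms if t in vocab]
    if not cols:
        raise ValueError("none of the seed terms occur in the corpus")
    target = int(np.argmax(H[:, cols].sum(axis=1)))  # topic most tied to the seeds
    labels = W.argmax(axis=1)  # dominant topic of each document
    selected = [i for i, lab in enumerate(labels) if lab == target]
    return selected, W, H, vocab


docs = [
    "MSA microservice architecture service container deployment",
    "microservice service api container kubernetes",
    "MSA multiple sequence alignment protein sequence",
    "sequence alignment protein genome",
]
selected, W, H, vocab = select_documents(docs, k=2, seed_terms=["microservice"])
print(selected)
print(top_terms(H, vocab, n=3))
```

The four documents are invented toy data: "MSA" also abbreviates multiple sequence alignment in bioinformatics, so a keyword search mixes both senses, and the factorization separates them into two topics. In a semi-automatic workflow the researcher would read the `top_terms` output and choose the relevant topic by hand rather than rely on a fixed seed list.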
Original language: Korean
Pages (from-to): 35-44
Number of pages: 10
Journal: 한국시뮬레이션학회 논문지 (Journal of the Korea Society for Simulation)
Volume: 32
Issue number: 1
DOIs
State: Published - Mar 2023
