Distributed Classification Model of Streaming Tweets based on Dynamic Model Update

Min Seon Kim, Hyuk Yoon Kwon

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

1 Scopus citation

Abstract

In this study, we propose a distributed architecture that dynamically updates its model for classifying tweet streams generated in real time. The architecture ingests data streams through Apache Kafka and classifies them with Apache Spark Streaming. To reflect changes in the input stream, we design a classification model that is updated dynamically by updating its tokenizer and classifier on new tweet streams. The dynamic update lets the architecture classify data streams effectively, and parallel processing in a distributed environment lets it process them efficiently. Through experiments on cyberattack-related tweets, we show that the classification performance of our model, measured by F1-score, gradually improves from 0.8869 with the initial 50,000 tweets to 0.9094 once 200,000 tweets have accumulated.
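The update loop described in the abstract can be illustrated with a minimal, single-process sketch. It is not the authors' implementation: the stream is simulated with plain Python lists rather than Kafka topics and Spark Streaming micro-batches, and both components are assumptions, since the abstract does not name the models used. `VocabTokenizer` is a hypothetical vocabulary builder, and `NaiveBayes` is a multinomial Naive Bayes stand-in for the classifier. The sketch also assumes that each update refits both components on all labeled tweets accumulated so far, in the spirit of the 50,000 to 200,000 tweet accumulation reported above.

```python
from collections import Counter
import math
import re


class VocabTokenizer:
    """Hypothetical tokenizer: its vocabulary is rebuilt from the accumulated corpus on every update."""

    def __init__(self):
        self.vocab = {}

    @staticmethod
    def split(text):
        return re.findall(r"[a-z0-9#@]+", text.lower())

    def fit(self, texts):
        words = sorted({w for t in texts for w in self.split(t)})
        self.vocab = {w: i for i, w in enumerate(words)}

    def encode(self, text):
        # Words not seen at the last refit are dropped until the next update.
        return [w for w in self.split(text) if w in self.vocab]


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing (a stand-in, not the paper's classifier)."""

    def fit(self, docs, labels, vocab_size):
        self.classes = sorted(set(labels))
        self.vocab_size = vocab_size
        self.prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc)
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}

    def predict(self, doc):
        def log_score(c):
            denom = self.totals[c] + self.vocab_size
            return self.prior[c] + sum(math.log((self.counts[c][w] + 1) / denom) for w in doc)
        return max(self.classes, key=log_score)


class DynamicModel:
    """Assumed update policy: refit tokenizer and classifier on all labeled tweets seen so far."""

    def __init__(self):
        self.tokenizer = VocabTokenizer()
        self.classifier = NaiveBayes()
        self.texts, self.labels = [], []

    def update(self, texts, labels):
        # One call stands in for one labeled micro-batch arriving from the stream.
        self.texts += texts
        self.labels += labels
        self.tokenizer.fit(self.texts)
        docs = [self.tokenizer.encode(t) for t in self.texts]
        self.classifier.fit(docs, self.labels, len(self.tokenizer.vocab))

    def classify(self, texts):
        return [self.classifier.predict(self.tokenizer.encode(t)) for t in texts]


model = DynamicModel()
model.update(["ddos attack on bank servers", "great coffee this morning"], ["attack", "other"])
model.update(["ransomware hits hospital network", "lovely sunny weekend"], ["attack", "other"])
print(model.classify(["lovely weekend", "bank servers down"]))  # ['other', 'attack']
```

Refitting the tokenizer is what lets newly appearing vocabulary, such as "ransomware" in the second batch, influence classification; before that update such words are simply out of vocabulary. In a distributed deployment like the one described above, the per-batch classification would run in parallel on the cluster, whereas this sketch runs everything in one process.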

Original language: English
Title of host publication: Proceedings - 2022 IEEE International Conference on Big Data and Smart Computing, BigComp 2022
Editors: Herwig Unger, Young-Kuk Kim, Eenjun Hwang, Sung-Bae Cho, Stephan Pareigis, Kyamakya Kyandoghere, Young-Guk Ha, Jinho Kim, Atsuyuki Morishima, Christian Wagner, Hyuk-Yoon Kwon, Yang-Sae Moon, Carson Leung
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 47-51
Number of pages: 5
ISBN (Electronic): 9781665421973
State: Published - 2022
Event: 2022 IEEE International Conference on Big Data and Smart Computing, BigComp 2022, Daegu, Korea, Republic of
Duration: 17 Jan 2022 to 20 Jan 2022

Publication series

Name: Proceedings - 2022 IEEE International Conference on Big Data and Smart Computing, BigComp 2022

Conference

Conference: 2022 IEEE International Conference on Big Data and Smart Computing, BigComp 2022
Country/Territory: Korea, Republic of
City: Daegu
Period: 17/01/22 to 20/01/22

Keywords

  • Data Ingestion
  • Distributed Processing
  • Dynamic Model Update
  • Event Classification
  • Streaming Tweets

