TY - GEN
T1 - Distributed Classification Model of Streaming Tweets based on Dynamic Model Update
AU - Kim, Min Seon
AU - Kwon, Hyuk Yoon
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - In this study, we propose a distributed architecture that dynamically updates the model used to classify tweet streams generated in real time. The architecture ingests data streams through Apache Kafka and classifies them with Apache Spark Streaming. To reflect changes in the input stream, we design the classification model so that its tokenizer and classifier can be updated for new tweet streams. The dynamic update keeps classification effective as the stream changes, and parallel processing in a distributed environment keeps it efficient. In experiments on cyberattack-related tweets, the F1-score of our classification model gradually improves from 0.8869 with the initial 50,000 tweets to 0.9094 once 200,000 tweets have been accumulated.
KW - Data Ingestion
KW - Distributed Processing
KW - Dynamic Model Update
KW - Event Classification
KW - Streaming Tweets
UR - http://www.scopus.com/inward/record.url?scp=85127607751&partnerID=8YFLogxK
U2 - 10.1109/BigComp54360.2022.00019
DO - 10.1109/BigComp54360.2022.00019
M3 - Conference contribution
AN - SCOPUS:85127607751
T3 - Proceedings - 2022 IEEE International Conference on Big Data and Smart Computing, BigComp 2022
SP - 47
EP - 51
BT - Proceedings - 2022 IEEE International Conference on Big Data and Smart Computing, BigComp 2022
A2 - Unger, Herwig
A2 - Kim, Young-Kuk
A2 - Hwang, Eenjun
A2 - Cho, Sung-Bae
A2 - Pareigis, Stephan
A2 - Kyamakya, Kyandoghere
A2 - Ha, Young-Guk
A2 - Kim, Jinho
A2 - Morishima, Atsuyuki
A2 - Wagner, Christian
A2 - Kwon, Hyuk-Yoon
A2 - Moon, Yang-Sae
A2 - Leung, Carson
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 IEEE International Conference on Big Data and Smart Computing, BigComp 2022
Y2 - 17 January 2022 through 20 January 2022
ER -