Abstract
Multimodal emotion recognition is robust and reliable because it draws on multiple data modalities for a more comprehensive representation of emotion. Data fusion is a key step in multimodal emotion recognition, since recognition accuracy depends largely on how the different modalities are combined. The goal of this paper is to compare the performance of deep learning (DL) based models for data fusion and multimodal emotion recognition. The contributions of this paper are twofold: 1) we introduce three DL models for multimodal fusion and classification: early fusion, hybrid fusion, and multi-task learning; 2) we systematically compare the performance of these models on three multimodal datasets. Our experimental results show that multi-task learning achieves the best results across all modality combinations: 75.41%, 68.33%, and 78.75% accuracy for classifying three emotional states from audio-visual, EEG-audio, and EEG-visual data, respectively.
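The three fusion strategies differ mainly in where modality features are combined. As an illustration only (not the authors' implementation), an early-fusion classifier concatenates per-modality feature vectors into one joint vector before a single classification head. The feature dimensions, the untrained linear classifier, and the three-class output below are assumptions made for this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Numerically stable softmax over class scores.
    e = np.exp(z - z.max())
    return e / e.sum()

def early_fusion_predict(audio_feat, visual_feat, W, b):
    """Early fusion sketch: concatenate modality features, then apply one
    linear classifier to the joint vector (illustrative, untrained weights)."""
    fused = np.concatenate([audio_feat, visual_feat])  # joint representation
    return softmax(W @ fused + b)                      # class probabilities

# Hypothetical sizes: 40-dim audio features, 60-dim visual features,
# 3 emotion classes (matching the three emotional states in the paper).
audio = rng.standard_normal(40)
visual = rng.standard_normal(60)
W = rng.standard_normal((3, 100)) * 0.01
b = np.zeros(3)

probs = early_fusion_predict(audio, visual, W, b)
```

By contrast, hybrid fusion would combine modality-specific intermediate representations, and multi-task learning would share a backbone across per-modality prediction tasks; those variants are not sketched here.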
| Original language | English |
| --- | --- |
| Pages (from-to) | 79-87 |
| Number of pages | 9 |
| Journal | Journal of Korean Institute of Communications and Information Sciences |
| Volume | 47 |
| Issue number | 1 |
| DOIs | |
| State | Published - Jan 2022 |
Keywords
- data fusion
- deep learning
- EEG
- emotion recognition
- multimodal