TY - JOUR
T1 - Attention-based speech feature transfer between speakers
AU - Lee, Hangbok
AU - Cho, Minjae
AU - Kwon, Hyuk Yoon
N1 - Publisher Copyright:
Copyright © 2024 Lee, Cho and Kwon.
PY - 2024
Y1 - 2024
N2 - In this study, we propose a simple yet effective method for incorporating the source speaker's characteristics into the target speaker's speech. This allows our model to generate the speech of the target speaker in the style of the source speaker. To achieve this, we focus on the attention model within the speech synthesis model, which learns various speaker features such as spectrogram, pitch, intensity, formant, pulse, and voice breaks. The model is trained separately on datasets specific to the source and target speakers. Subsequently, we replace the attention weights learned from the source speaker's dataset with the attention weights from the target speaker's model. Finally, by providing new input texts to the target model, we generate the speech of the target speaker in the style of the source speaker. We validate the effectiveness of our model through a similarity analysis using five evaluation metrics and showcase real-world examples.
AB - In this study, we propose a simple yet effective method for incorporating the source speaker's characteristics into the target speaker's speech. This allows our model to generate the speech of the target speaker in the style of the source speaker. To achieve this, we focus on the attention model within the speech synthesis model, which learns various speaker features such as spectrogram, pitch, intensity, formant, pulse, and voice breaks. The model is trained separately on datasets specific to the source and target speakers. Subsequently, we replace the attention weights learned from the source speaker's dataset with the attention weights from the target speaker's model. Finally, by providing new input texts to the target model, we generate the speech of the target speaker in the style of the source speaker. We validate the effectiveness of our model through a similarity analysis using five evaluation metrics and showcase real-world examples.
KW - attention mechanism
KW - feature transfer
KW - speech features
KW - speech similarity
KW - speech synthesis
UR - https://www.scopus.com/pages/publications/85187885996
U2 - 10.3389/frai.2024.1259641
DO - 10.3389/frai.2024.1259641
M3 - Article
AN - SCOPUS:85187885996
SN - 2624-8212
VL - 7
JO - Frontiers in Artificial Intelligence
JF - Frontiers in Artificial Intelligence
M1 - 1259641
ER -