Attention-based speech feature transfer between speakers

Hangbok Lee, Minjae Cho, Hyuk Yoon Kwon

Research output: Contribution to journal › Article › peer-review

Abstract

In this study, we propose a simple yet effective method for incorporating the source speaker's characteristics into the target speaker's speech. This allows our model to generate the speech of the target speaker in the style of the source speaker. To achieve this, we focus on the attention model within the speech synthesis model, which learns various speaker features such as spectrogram, pitch, intensity, formant, pulse, and voice breaks. The model is trained separately on datasets specific to the source and target speakers. Subsequently, we replace the attention weights learned from the source speaker's dataset with the attention weights from the target speaker's model. Finally, by providing new input texts to the target model, we generate the speech of the target speaker in the style of the source speaker. We validate the effectiveness of our model through similarity analysis using five evaluation metrics and showcase real-world examples.
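The core transfer step described above, swapping the attention weights between two separately trained synthesis models, can be illustrated with a minimal sketch. The model structure, the `w_att` parameter name, and the single-matrix attention parametrization below are all illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax over the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(query, keys, values, w_att):
    """Toy attention: scores come from a learned projection w_att
    (hypothetical parametrization for illustration only)."""
    scores = query @ w_att @ keys.T
    weights = softmax(scores)           # rows sum to 1
    return weights @ values, weights

rng = np.random.default_rng(0)
d = 4

# Stand-ins for two models trained separately on each speaker's dataset;
# each holds its own learned attention weights.
source_model = {"w_att": rng.normal(size=(d, d))}
target_model = {"w_att": rng.normal(size=(d, d))}

# The transfer step: overwrite one model's attention weights with the
# other's, leaving all remaining parameters untouched.
target_model["w_att"] = source_model["w_att"].copy()

# Synthesizing from new input then uses the transplanted weights.
query = rng.normal(size=(1, d))
keys = rng.normal(size=(6, d))
values = rng.normal(size=(6, d))
out, w = attention(query, keys, values, target_model["w_att"])
```

In a real TTS model the attention module sits between the text encoder and the acoustic decoder, so swapping only its weights leaves each speaker's acoustic features intact while changing how the model aligns and styles them.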

Original language: English
Article number: 1259641
Journal: Frontiers in Artificial Intelligence
Volume: 7
DOIs
State: Published - 2024

Keywords

  • attention mechanism
  • feature transfer
  • speech features
  • speech similarity
  • speech synthesis

