Abstract
This paper presents a deep learning-based non-intrusive speech intelligibility estimation method that uses the bottleneck features of an autoencoder. The conventional standard non-intrusive estimation method, P.563, estimates intelligibility poorly in various noise environments. We propose a more accurate speech intelligibility estimation method based on a long short-term memory (LSTM) neural network whose input and output are autoencoder bottleneck features and a short-time objective intelligibility (STOI) score, respectively, where STOI is a standard intrusive measure of speech intelligibility that requires a reference speech signal. By comparison on speech signals in various noise environments, we show that the proposed method outperforms both the conventional standard P.563 and mel-frequency cepstral coefficient (MFCC) feature-based intelligibility estimation methods.
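The data flow described in the abstract (frame-level features compressed by an autoencoder bottleneck, passed through an LSTM, and mapped to one intelligibility score) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the dimensions `D`, `B`, and `H`, the single-layer `tanh` encoder, the omission of bias terms, the use of the final hidden state, the sigmoid output, and all function names. The weights are random and untrained, so the printed value only demonstrates the shape of the computation and is not a meaningful STOI estimate.

```python
import math
import random

def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng, scale=0.1):
    # Small random weights; a trained model would load learned values instead.
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def encode(frame, W_enc):
    # Hypothetical encoder half of an autoencoder: one linear layer plus tanh.
    # Its output plays the role of the bottleneck feature vector.
    return [math.tanh(v) for v in matvec(W_enc, frame)]

def lstm_step(x, h, c, W):
    # One step of a standard LSTM cell (bias terms omitted for brevity).
    z = x + h  # concatenate the input with the previous hidden state
    i = [sigmoid(v) for v in matvec(W["i"], z)]    # input gate
    f = [sigmoid(v) for v in matvec(W["f"], z)]    # forget gate
    o = [sigmoid(v) for v in matvec(W["o"], z)]    # output gate
    g = [math.tanh(v) for v in matvec(W["g"], z)]  # candidate cell state
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new

def estimate_score(frames, W_enc, W_lstm, w_out):
    # Run the bottleneck features of every frame through the LSTM, then map
    # the final hidden state to a single score in (0, 1).
    hidden = len(w_out)
    h, c = [0.0] * hidden, [0.0] * hidden
    for frame in frames:
        h, c = lstm_step(encode(frame, W_enc), h, c, W_lstm)
    return sigmoid(sum(w * v for w, v in zip(w_out, h)))

# Illustrative sizes: D input features per frame, B bottleneck units, H LSTM units.
rng = random.Random(0)
D, B, H = 40, 8, 16
W_enc = rand_matrix(B, D, rng)
W_lstm = {gate: rand_matrix(H, B + H, rng) for gate in "ifog"}
w_out = [rng.uniform(-0.1, 0.1) for _ in range(H)]

# Fifty random frames stand in for the features of one noisy utterance.
frames = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(50)]
score = estimate_score(frames, W_enc, W_lstm, w_out)
print(score)
```

In a trained system of this kind, the regression target for each utterance would be the STOI score computed intrusively from the clean reference, so that no reference signal is needed at inference time.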
| Original language | English |
|---|---|
| Pages (from-to) | 714-715 |
| Number of pages | 2 |
| Journal | IEICE Transactions on Information and Systems |
| Volume | E103D |
| Issue number | 3 |
| State | Published - 2020 |
Keywords
- Autoencoder
- Bottleneck feature
- Deep learning
- Long short-term memory (LSTM)
- STOI