Abstract—In this paper, we describe speech recognition of emotional speech. The task treated here is not emotion recognition from speech but speech recognition (speech-to-text) of speech that contains distinct emotion. First, we compare two acoustic models, one trained on neutral speech and one trained on emotional speech. We expected the acoustic model trained on emotional speech to improve recognition performance on emotional speech. However, the results showed that a larger amount of neutral speech was more effective than a small amount of emotional speech. Next, we applied a data selection method to enhance the phonetic balance of the training data. Entropy-based selection from the training data improved recognition performance when there was some domain mismatch between the training data and the evaluation data.
Index Terms—Deep neural network, data selection, emotional speech, speech recognition, training data reduction.
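The abstract does not spell out the selection criterion. A common form of entropy-based data selection greedily adds the utterance that most increases the entropy of the selected set's phoneme distribution, which pushes the subset toward phonetic balance. The sketch below illustrates that generic technique only and is not presented as the authors' exact procedure. The input format (each utterance as a list of phoneme labels), the function names `entropy` and `select_by_entropy`, and the `budget` parameter are hypothetical.

```python
import math
from collections import Counter


def entropy(counts: Counter) -> float:
    """Shannon entropy (bits) of the phoneme distribution given by `counts`."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def select_by_entropy(utterances, budget):
    """Greedy entropy-maximizing selection (hypothetical sketch).

    `utterances` is assumed to be a list of phoneme-label lists. At each step,
    the utterance whose phonemes give the selected set the highest phoneme
    entropy is added, until `budget` utterances are chosen or none remain.
    Returns the indices of the selected utterances in selection order.
    """
    selected = []
    counts = Counter()                      # phoneme counts of the selected subset
    remaining = list(range(len(utterances)))
    while remaining and len(selected) < budget:
        best_idx, best_h = None, -1.0
        for i in remaining:
            # Entropy the subset would have if utterance i were added.
            h = entropy(counts + Counter(utterances[i]))
            if h > best_h:                  # ties keep the earlier candidate
                best_idx, best_h = i, h
        selected.append(best_idx)
        counts.update(utterances[best_idx])
        remaining.remove(best_idx)
    return selected


# Usage: utterance 1 ("a", "b") is picked first, then utterance 2 adds
# two unseen phonemes and raises the entropy to 2 bits.
utts = [["a", "a", "a", "a"], ["a", "b"], ["c", "d"], ["a", "a"]]
print(select_by_entropy(utts, 2))
```

This naive loop recomputes the entropy for every candidate at every step, so its cost grows with budget × corpus size. That is acceptable for illustration, but a real corpus would call for incremental entropy updates.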
Yusuke Yamada, Yuya Chiba, and Takashi Nose are with the Graduate School of Engineering, Tohoku University, Sendai, 980-8579 (e-mail: email@example.com, firstname.lastname@example.org, email@example.com).
A. Ito is with the Graduate School of Engineering, Tohoku University, and the Tough Cyberphysical AI Research Center, Tohoku University, Sendai, 980-8579 (e-mail: firstname.lastname@example.org).
Cite: Yusuke Yamada, Yuya Chiba, Takashi Nose, and Akinori Ito, "Effect of Training Data Selection for Speech Recognition of Emotional Speech," International Journal of Machine Learning and Computing, vol. 11, no. 5, pp. 362-366, 2021.
Copyright © 2021 by the authors. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).