Effect of Training Data Selection for Speech Recognition of Emotional Speech


IJMLC 2021 Vol.11(5): 362-366 ISSN: 2010-3700
DOI: 10.18178/ijmlc.2021.11.5.1062

Yusuke Yamada, Yuya Chiba, Takashi Nose, and Akinori Ito

Abstract—This paper addresses speech recognition of emotional speech. The task is not emotion recognition from speech but speech recognition (speech-to-text) of speech that contains distinct emotion. First, we compare two acoustic models, one trained on neutral speech and one trained on emotional speech. We expected the acoustic model trained on emotional speech to improve recognition performance on emotional speech, but the results showed that a larger amount of neutral speech was more effective than a small amount of emotional speech. Next, we applied a data selection method to enhance the phonetic balance of the training data. As a result, entropy-based selection from the training data improved recognition performance when there was some domain mismatch between the training data and the evaluation data.

Index Terms—Deep neural network, data selection, emotional speech, speech recognition, training data reduction.
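
The abstract does not spell out the selection procedure, so the following is only a minimal sketch of one common way to do entropy-based selection for phonetic balance: greedily add the utterance that most increases the Shannon entropy of the pooled phoneme distribution. The function names, the greedy strategy, and the toy phoneme labels are assumptions for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def phoneme_entropy(counts):
    """Shannon entropy (in bits) of a phoneme count distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def select_by_entropy(utterances, n_select):
    """Greedily pick utterances that maximize pooled phoneme entropy.

    utterances: dict mapping an utterance id to its list of phoneme labels.
    This greedy criterion is an assumed stand-in, not the authors' method.
    """
    pool = dict(utterances)
    counts = Counter()   # phoneme counts of everything selected so far
    selected = []
    while pool and len(selected) < n_select:
        best_id, best_h = None, -1.0
        for uid in sorted(pool):  # sorted for deterministic tie-breaking
            h = phoneme_entropy(counts + Counter(pool[uid]))
            if h > best_h:
                best_id, best_h = uid, h
        counts.update(pool.pop(best_id))
        selected.append(best_id)
    return selected


# Toy example with hypothetical phoneme sequences.
toy = {
    "u1": ["a", "a", "a", "a"],
    "u2": ["a", "k", "i", "o"],
    "u3": ["a", "a", "k", "k"],
}
chosen = select_by_entropy(toy, 2)
```

In this toy run the utterance with the most varied phonemes is picked first, and later picks favor utterances that even out the pooled distribution. A real setup would operate on phoneme alignments or transcriptions of the training corpus and would likely need a faster search than this quadratic loop.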

Yusuke Yamada, Yuya Chiba, and Takashi Nose are with the Graduate School of Engineering, Tohoku University, Sendai, 980-8579 (e-mail: y.yamada@spcom.ecei.tohoku.ac.jp, yuya@spcom.ecei.tohoku.ac.jp, nose@tohoku.ac.jp).
Akinori Ito is with the Graduate School of Engineering, Tohoku University, and the Tough Cyberphysical AI Research Center, Tohoku University, Sendai, 980-8579 (e-mail: aito@spcom.ecei.tohoku.ac.jp).


Cite: Yusuke Yamada, Yuya Chiba, Takashi Nose, and Akinori Ito, "Effect of Training Data Selection for Speech Recognition of Emotional Speech," International Journal of Machine Learning and Computing, vol. 11, no. 5, pp. 362-366, 2021.

Copyright © 2021 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).


