Analysis of the Mel Scale Features Using Classification of Big Data and Speech Signals

Author(s): Volodymyr Osadchyy, Ruslan V. Skuratovskii, Aled Williams

Abstract: Emotion amplifies the role of human speech. We effectively apply a parameterization of the feature vector obtained from a sentence that is divided into an emotional-informational part and a purely informational part. Several characteristics distinguish one utterance from another, namely prosodic features such as pitch, timbre, loudness and vocal tone, which allow speech to be categorized into several emotions. We supplement these with a new classification feature, which consists of dividing a sentence into an emotionally loaded part and a part that carries only an informational load. A speech sample thus changes when it is produced under different emotional conditions. Because a speaker's emotional state can be identified on the basis of the Mel scale, Mel-frequency cepstral coefficients (MFCC) are one suitable representation for studying the emotional aspects of a speaker's utterances. In this work, we implement a model that identifies several emotional states from MFCC features for two datasets, classify the emotions in each dataset on the basis of these features, and compare the results. Overall, the classification model is based on dataset minimization, performed by taking the mean of the features, with the aim of improving the classification accuracy of different machine learning algorithms.
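
The two ideas the abstract relies on, the Mel scale and dataset minimization by taking the mean of features, can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: the Hz-to-mel conversion uses one commonly cited formula, "taking the mean" is read as averaging each MFCC coefficient over the frames of an utterance, and the names `hz_to_mel`, `mean_pool` and `toy_frames` are hypothetical. It is not the authors' implementation.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Map a frequency in Hz to mels.

    Assumption: uses the widely cited 2595 * log10(1 + f / 700) form;
    the abstract does not say which Mel-scale variant the authors use.
    """
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mean_pool(frames):
    """Reduce a list of per-frame feature vectors to one vector.

    Each output entry is the mean of one coefficient across all frames,
    so an utterance of any length becomes a single fixed-size sample.
    """
    if not frames:
        raise ValueError("need at least one frame")
    n_frames = len(frames)
    # zip(*frames) groups the values of each coefficient across frames.
    return [sum(column) / n_frames for column in zip(*frames)]


# Hypothetical toy input: 3 frames with 2 coefficients each
# (real MFCC vectors typically have more coefficients per frame).
toy_frames = [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
utterance_vector = mean_pool(toy_frames)
```

Under this reading, each utterance contributes a single row to the training set, which is what makes the dataset smaller before a classifier such as a decision tree is fitted.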

Keywords: machine learning; speech recognition; emotion recognition; MFCC; supervised learning; decision trees.

Pages: 52-63


