Comparative Analysis of Speech Emotion Recognition System Using MLP, SVM, and CNN Algorithms
Abstract
Emotion recognition from speech plays a crucial role in human–computer interaction by enabling systems to interpret and respond to users’ emotional states. This study develops and evaluates a Speech Emotion Recognition (SER) system using three machine learning techniques: Support Vector Machines (SVM), Multilayer Perceptrons (MLP), and Convolutional Neural Networks (CNNs). The system is trained and tested on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), whose speech portion contains 1,440 audio clips recorded by 24 professional actors across eight emotion categories (neutral, calm, happy, sad, angry, fearful, surprised, and disgusted). Our approach involves preprocessing the audio signals, extracting key acoustic features, and comparing the three models using standard evaluation metrics. Results show that each model has distinct strengths and limitations, with CNNs achieving the most robust feature learning and generalization. The study underscores the importance of diverse feature representations for accurate emotion classification and offers insight into how different model architectures capture emotional nuances in speech. Challenges such as limited dataset diversity, feature selection, and computational complexity are discussed, along with recommendations for future research to improve the real-world adaptability of SER systems. This work contributes to ongoing efforts to develop emotionally aware technologies that support more natural human–machine communication.
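RAVDESS encodes each clip's labels in its file name as seven two-digit fields (modality, vocal channel, emotion, intensity, statement, repetition, actor), so emotion labels can be read without a separate annotation file. The sketch below is not the authors' code: it parses those fields and performs a speaker-independent split by actor. The abstract does not state how the data were partitioned, so the actor-based split is an assumption, chosen because it tests generalization to unseen speakers. The helper names `parse_ravdess_name` and `split_by_actor` are hypothetical.

```python
from dataclasses import dataclass
from pathlib import PurePath

# Emotion codes (third field) from the RAVDESS file-naming convention.
EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}


@dataclass(frozen=True)
class RavdessLabel:
    modality: str       # "03" = audio-only
    vocal_channel: str  # "01" = speech, "02" = song
    emotion: str
    intensity: str      # "01" = normal, "02" = strong
    statement: str
    repetition: str
    actor: int          # 1-24; odd = male, even = female


def parse_ravdess_name(path: str) -> RavdessLabel:
    """Hypothetical helper: parse e.g. '03-01-06-01-02-01-12.wav' into its fields."""
    fields = PurePath(path).stem.split("-")
    if len(fields) != 7 or not all(f.isdigit() and len(f) == 2 for f in fields):
        raise ValueError(f"not a RAVDESS file name: {path!r}")
    modality, channel, emo, intensity, statement, rep, actor = fields
    if emo not in EMOTIONS:
        raise ValueError(f"unknown emotion code {emo!r} in {path!r}")
    return RavdessLabel(modality, channel, EMOTIONS[emo], intensity,
                        statement, rep, int(actor))


def split_by_actor(paths, test_actors):
    """Hypothetical helper: no actor appears in both the train and test lists."""
    train, test = [], []
    for p in paths:
        target = test if parse_ravdess_name(p).actor in test_actors else train
        target.append(p)
    return train, test
```

For example, `parse_ravdess_name("Actor_12/03-01-06-01-02-01-12.wav")` yields the emotion `fearful` and actor `12`. Holding out whole actors keeps a model from scoring well simply by recognizing a familiar voice.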
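The abstract does not name the acoustic features that were extracted. MFCCs and mel-spectrogram statistics are common choices in SER, but to stay dependency-free the sketch below computes two simpler frame-level features, RMS energy and zero-crossing rate, and pools them (mean and standard deviation) into a fixed-length utterance vector of the kind an SVM or MLP takes as input. The 25 ms window, 10 ms hop, and the function names are illustrative assumptions, not the paper's settings.

```python
import math
from statistics import mean, pstdev


def frames(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def rms(frame):
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def utterance_features(signal, sr, frame_ms=25, hop_ms=10):
    """Hypothetical extractor: [mean RMS, std RMS, mean ZCR, std ZCR] for one clip."""
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    fs = frames(signal, frame_len, hop)
    if not fs:
        raise ValueError("signal is shorter than one frame")
    energies = [rms(f) for f in fs]
    zcrs = [zero_crossing_rate(f) for f in fs]
    return [mean(energies), pstdev(energies), mean(zcrs), pstdev(zcrs)]
```

Pooling frame statistics discards temporal order, which is one reason CNNs operating on spectrogram-like inputs can learn patterns that fixed-length vectors for an SVM or MLP miss.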
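The abstract refers to "standard metrics" without listing them. On RAVDESS, accuracy alone can flatter a classifier because the neutral class has no strong-intensity recordings and therefore half as many clips as each other emotion (96 versus 192). A macro-averaged F1, which weights every emotion equally, is a useful companion. The sketch below assumes these two metrics and is an illustration, not the paper's evaluation code.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Map (true_label, predicted_label) pairs to their counts."""
    return Counter(zip(y_true, y_pred))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all labels seen."""
    labels = sorted(set(y_true) | set(y_pred))
    cm = confusion_counts(y_true, y_pred)
    f1s = []
    for c in labels:
        tp = cm[(c, c)]
        fp = sum(cm[(t, c)] for t in labels if t != c)  # predicted c, truly other
        fn = sum(cm[(c, p)] for p in labels if p != c)  # truly c, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1s.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1s) / len(f1s)
```

Computing the same two numbers for the SVM, MLP, and CNN on an identical test split keeps the comparison like-for-like.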
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Share – Copy and redistribute the material in any medium or format.
Adapt – Remix, transform, and build upon the material.
Environmental Technology & Science Journal
Volume 16 Number 2 December 2025