Comparative Analysis of Speech Emotion Recognition System Using MLP, SVM, and CNN Algorithms

Omodunbi B.A; Awoyemi T.A; Esan A.O.

Published: May 18, 2026

Keywords:

Speech, speech recognition, pre-processing, evaluation techniques, communication

Omodunbi B.A

Department of Computer Engineering, Federal University, Oye-Ekiti

Awoyemi T.A

Department of Computer Engineering, Federal University, Oye-Ekiti

Esan A.O.

Department of Computer Engineering, Federal University, Oye-Ekiti

Abstract

Emotion recognition from speech plays a crucial role in enhancing human–computer interaction by enabling systems to interpret and respond to users’ emotional states. This study develops and evaluates a Speech Emotion Recognition (SER) system using three machine learning techniques: Support Vector Machines (SVM), Multilayer Perceptron (MLP), and Convolutional Neural Networks (CNNs). The system is trained and tested on the RAVDESS dataset, which contains 1,440 professionally recorded audio samples representing a wide range of emotions. Our approach involves careful preprocessing of the audio signals, extraction of key acoustic features, and comparative performance evaluation of the three models using standard metrics. Results show that each model exhibits unique strengths and limitations, with CNNs achieving the most robust feature learning and generalization. The study underscores the importance of diverse feature representation for accurate emotion classification and provides insight into how different model architectures handle emotional nuances in speech. Identified challenges, such as dataset diversity, feature selection, and computational complexity, are discussed, along with recommendations for future research to improve the real-world adaptability of SER systems. This work contributes to ongoing efforts toward developing emotionally aware technologies that can enhance natural human–machine communication.
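
The abstract does not name the "standard metrics" used for the comparison. As an illustration only, the sketch below assumes accuracy and macro-averaged precision, recall, and F1, which are common choices for multi-class emotion classification. The function name `macro_scores`, the emotion labels, and the toy predictions are hypothetical and are not taken from the paper or its results.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Assumption: these are the kind of 'standard metrics' meant in the
    abstract; the paper's actual metric set is not stated there.
    """
    def safe_div(num, den):
        # Avoid ZeroDivisionError for classes that were never predicted/seen.
        return num / den if den else 0.0

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions = [safe_div(tp[c], tp[c] + fp[c]) for c in labels]
    recalls = [safe_div(tp[c], tp[c] + fn[c]) for c in labels]
    f1s = [safe_div(2 * p * r, p + r) for p, r in zip(precisions, recalls)]

    n = len(labels)
    return {
        "accuracy": safe_div(sum(tp.values()), len(y_true)),
        "precision": safe_div(sum(precisions), n),
        "recall": safe_div(sum(recalls), n),
        "f1": safe_div(sum(f1s), n),
    }


if __name__ == "__main__":
    # Hypothetical ground truth and predictions, for illustration only.
    y_true = ["happy", "happy", "sad", "sad"]
    toy_predictions = {
        "model_a": ["happy", "sad", "sad", "sad"],
        "model_b": ["happy", "happy", "sad", "sad"],
    }
    for name, y_pred in toy_predictions.items():
        scores = macro_scores(y_true, y_pred)
        print(name, {k: round(v, 3) for k, v in scores.items()})
```

Macro averaging weights every emotion class equally, so a model that performs well only on the most frequent emotions does not score well overall. Whether the study used macro, micro, or weighted averaging is not stated in the abstract.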

How to Cite

Comparative Analysis of Speech Emotion Recognition System Using MLP, SVM, and CNN Algorithms. (2026). Environmental Technology & Science Journal, 16(2), 107-113. https://journal.futminna.edu.ng/index.php/etsj/article/view/214

Issue

Vol. 16 No. 2 (2025): Environmental Technology Sciences Journal

Section

Articles

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Share – Copy and redistribute the materials in any medium or format.

Adapt – Remix, transform, and build up the materials.
