Speech Emotion Recognition Using MFCC Features and LSTM-Based Deep Learning
Devraj1, Ravindra Nath2, Nikita Singh3, Vibhushit Katiyar4, Amber Srivastava5
1Dr. Devraj, Division of Social Science, ICAR-Indian Institute of Pulses Research, Kanpur (Uttar Pradesh), India.
2Dr. Ravindra Nath, Associate Professor, Department of Computer Science, Babasaheb Bhimrao Ambedkar Central University, Lucknow (Uttar Pradesh), India.
3Nikita Singh, Computer Centre, ICAR-Indian Institute of Pulses Research, Kanpur (Uttar Pradesh), India.
4Vibhushit Katiyar, B.Tech. (CS) Student, Department of Computer Science, Lovely Professional University, Jalandhar (Punjab), India.
5Amber Srivastava, B.Tech. (CS) Student, Department of Computer Science, Lovely Professional University, Jalandhar (Punjab), India.
Manuscript received on 27 January 2026 | First Revised Manuscript received on 06 February 2026 | Second Revised Manuscript received on 10 February 2026 | Manuscript Accepted on 15 February 2026 | Manuscript published on 28 February 2026 | PP: 1-6 | Volume-6 Issue-1, February 2026 | Retrieval Number: 100.1/ijsp.A101706010226 | DOI: 10.54105/ijsp.A1017.06010226
© The Authors. Published by Lattice Science Publication (LSP). This is an open-access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Speech Emotion Recognition (SER) has emerged as a significant research area within Human–Computer Interaction (HCI), enabling intelligent systems to interpret human emotional states from spoken audio. Accurate emotion recognition from speech plays a crucial role in enhancing natural interaction between humans and machines. This paper presents a deep learning–based SER framework that combines Mel-Frequency Cepstral Coefficients (MFCC) for feature extraction with Long Short-Term Memory (LSTM) networks for temporal modelling and emotion classification. MFCC features effectively capture the spectral characteristics of speech signals, whereas LSTM networks are well-suited to modelling long-term temporal dependencies inherent in emotional speech patterns. The proposed model is trained and evaluated on the Toronto Emotional Speech Set (TESS) dataset, which covers multiple emotional categories, including happiness, sadness, anger, fear, and neutrality. Experimental results demonstrate that the proposed MFCC–LSTM approach achieves promising classification accuracy, indicating its effectiveness in recognising emotional states from speech signals. The findings highlight the potential applicability of the proposed system in real-world scenarios, including virtual assistants, call centre analytics, and mental health monitoring systems, thereby contributing to the development of emotion-aware intelligent interfaces.
Keywords: Speech Emotion Recognition, MFCC, LSTM, Deep Learning, TESS Dataset, Human–Computer Interaction, Audio Signal Processing.
Scope of the Article: Audio Signal Processing
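
For readers who want to reproduce the pipeline the abstract describes, the following is a minimal sketch of an MFCC–LSTM workflow using librosa and Keras. The specific hyperparameters (40 MFCC coefficients, a 200-frame padded sequence length, a single 128-unit LSTM layer) and the file-path handling are illustrative assumptions, not values reported in the paper; TESS provides seven emotion categories, which fixes the output dimension.

```python
# Minimal MFCC + LSTM sketch for speech emotion recognition (assumed
# hyperparameters; the paper does not specify this exact architecture).
import numpy as np
import librosa
from tensorflow.keras import layers, models

NUM_MFCC = 40     # assumed number of MFCC coefficients per frame
MAX_FRAMES = 200  # assumed fixed sequence length after padding/truncation
NUM_CLASSES = 7   # TESS defines seven emotion categories

def extract_mfcc(path: str) -> np.ndarray:
    """Load one utterance and return a (MAX_FRAMES, NUM_MFCC) MFCC matrix."""
    signal, sr = librosa.load(path, sr=None)  # keep the native sample rate
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=NUM_MFCC).T  # (frames, coeffs)
    # Pad or truncate so every utterance yields a fixed-length sequence,
    # which the LSTM input layer requires.
    if mfcc.shape[0] < MAX_FRAMES:
        mfcc = np.pad(mfcc, ((0, MAX_FRAMES - mfcc.shape[0]), (0, 0)))
    return mfcc[:MAX_FRAMES]

def build_model() -> models.Model:
    """LSTM classifier over MFCC frame sequences (layer sizes assumed)."""
    model = models.Sequential([
        layers.Input(shape=(MAX_FRAMES, NUM_MFCC)),
        layers.LSTM(128),                 # models temporal dependencies across frames
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.3),              # regularisation against overfitting
        layers.Dense(NUM_CLASSES, activation="softmax"),  # emotion probabilities
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Usage: stack per-file MFCC matrices into X with shape
# (num_utterances, MAX_FRAMES, NUM_MFCC), encode labels as integers 0..6,
# then call build_model().fit(X, y, validation_split=0.2, epochs=50).
```

The design choice of padding every utterance to a common frame count is one common way to batch variable-length speech; masking layers or bucketing by length are alternatives the paper may or may not have used.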
