citation_keywords: machine learning (ML); emotion classification; Audio emotion recognition; Neural networks; Speech signal features
citation_publication_date: 2024/03/20
citation_title: Implementing machine learning techniques for continuous emotion prediction from uniformly segmented voice recordings
citation_author_institution: Center for Lifespan Psychology, Max Planck Institute for Human Development, Germany
keywords: machine learning (ML),emotion classification,Audio emotion recognition,Neural networks,Speech signal features
citation_publisher: Frontiers
citation_journal_title: Frontiers in Psychology
title: Frontiers | Implementing machine learning techniques for continuous emotion prediction from uniformly segmented voice recordings
type: article
citation_online_date: 2024/02/09
citation_issn: 1664-1078
citation_language: English
citation_pdf_url: https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2024.1300996/pdf
image: https://www.frontiersin.org/files/MyHome%20Article%20Library/1300996/1300996_Thumb_400.jpg
citation_journal_abbrev: Front. Psychol.
citation_abstract: Introduction: Emotional recognition from audio recordings is a rapidly advancing field, with significant implications for artificial intelligence and human-computer interaction. This study introduces a novel method for detecting emotions from short, 1.5 s audio samples, aiming to improve accuracy and efficiency in emotion recognition technologies.
Methods: We utilized 1,510 unique audio samples from two databases in German and English to train our models. We extracted various features for emotion prediction, employing Deep Neural Networks (DNN) for general feature analysis, Convolutional Neural Networks (CNN) for spectrogram analysis, and a hybrid model combining both approaches (C-DNN). The study addressed challenges associated with dataset heterogeneity, language differences, and the complexities of audio sample trimming.
Results: Our models demonstrated accuracy significantly surpassing random guessing, aligning closely with human evaluative benchmarks. This indicates the effectiveness of our approach in recognizing emotional states from brief audio clips.
Discussion: Despite the challenges of integrating diverse datasets and managing short audio samples, our findings suggest considerable potential for this methodology in real-time emotion detection from continuous speech. This could contribute to improving the emotional intelligence of AI and its applications in various areas.
citation_author: Diemerling, Hannes
url: https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2024.1300996/full
site_name: Frontiers
citation_firstpage: 1300996
citation_doi: 10.3389/fpsyg.2024.1300996
dc.identifier: doi:10.3389/fpsyg.2024.1300996
citation_volume: 15