NAME OF RESEARCHER

Dr Johnny Downs, Senior Clinical Lecturer, Dept of Child and Adolescent Psychiatry

PROJECT TITLE

Piloting development of an automated approach to coding expressed emotion in mothers' speech to improve prediction of youth mental health problems

CLINICAL ACADEMIC GROUP: Child and Adolescent Psychiatry

START & FINISH DATES

1 April 2021 - 30 June 2022

PROJECT DESCRIPTION

This project brought together social scientists, computer scientists and mental health clinicians to examine the application of automated speech analysis techniques to code expressed emotion (EE) in maternal speech samples. We digitised a unique collection of interviews, first recorded in the late 1990s from the E-Risk Longitudinal Twin Study cohort, capturing conversations with the mothers of 2,030 children describing their children at age 5. We have now created a well-characterised digital audio archive of these valuable recordings and protected them from further degradation. We then used the digitised audio files to evaluate whether modern machine learning approaches, combining automated transcription, natural language coding and acoustic analyses, could code emotional attitudes in the mothers' speech samples. Our final published report described how well the automated coding compared with expert human-rated coding.

Beyond the data science work, we used this project to explore how to make the research more approachable and to engage members of the public who were interested in mental health research but had little knowledge of computer science. A creative writer joined the project team and, through interviews and workshops, explored the hopes and concerns of researchers on the project, young people with lived experience of mental health issues, and clinicians about sharing maternal speech data and the risks that machine-driven approaches may pose if adopted into routine mental health and social care practice. From this, two short near-future fiction stories were created. At the end of the pilot project, we ran a public engagement event as part of the ESRC Centre for Society and Mental Health's festival. At this event, researchers, clinicians, technologists and members of the public explored together the potential future implications of the technologies we were developing as applied to mental health care, with future scenarios brought to life through the short stories read out by the creative writer. This proved an effective way to engage a non-technical audience with some of the ethical, social and practical challenges of using and sharing maternal speech data, and it was very valuable in informing the approaches we wanted to take in the next phase of the project and in future grant applications.

PROGRESS IN PAST YEAR:

This has been a very challenging but rewarding project. We successfully curated and digitised the audio tapes of nearly all of the 1,015 families (~90%) who had completed the Five Minute Speech Sample (FMSS) on both of their twins. We found that, although the interviews had to follow a fixed structure with limited scope for interviewer-driven conversation, many cases contained substantial stretches of overlapping speech, with the majority extending beyond five minutes and often switching between the twins as the subject child. Furthermore, the audio quality was highly variable, with frequent inaudible passages, white noise, background chatter, interruptions by young children, and other ambient noise. After multiple attempts to mitigate these issues technically, we conceded that we could not rely on information generated solely by an automatic speech recogniser (ASR).

To obtain the quality of data we needed, we changed tack and manually transcribed the interviews. This gave us a corpus of speech and text content of sufficient quality to enable machine learning-based acoustic analyses and automated emotion recognition. However, the significant cost of professional transcription limited the amount of material we could provide for training and testing an automated pipeline. Furthermore, the quality of the audio and the variation in interviewer styles meant that even the manual transcription process was complicated by speaker alignment errors, time-stamping inaccuracies and missing passages, resulting in multiple substantial revisions to our transcription coding guidance.

We did ultimately create a gold standard training set of 52 transcribed interviews (representing caregiver speech samples for 104 twin children) coded by human raters for EE. From these data, we ran a series of experiments using combined acoustic and text features, which helped us refine and test which machine learning approaches best predicted the level of caregiver warmth expressed in the FMSS. Our findings were published and presented as conference proceedings at Interspeech, the leading international conference on speech science and technology, held in Korea in 2022.
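To illustrate the general idea of combining acoustic and text features in a single classifier, the sketch below shows a minimal "early fusion" example in Python. It is purely illustrative: the transcripts, acoustic summaries, feature extraction and model choice are invented assumptions and do not represent the pipeline reported in our published paper.

    # Illustrative sketch only: not the pipeline used in the published study.
    # Assumes each interview has a transcript, a few pre-computed acoustic
    # summary features, and a human-coded warmth label.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    transcripts = [  # hypothetical snippets standing in for FMSS transcripts
        "she is such a loving and happy little girl",
        "he is a delight and we laugh together all the time",
        "she brightens every day and is very affectionate",
        "he never does what he is told and it is exhausting",
        "she is difficult and I do not get a moment of peace",
        "he causes trouble constantly and will not listen",
    ]
    acoustic = np.array([  # invented per-interview acoustic summaries
        [0.8, 0.7], [0.7, 0.6], [0.9, 0.8],
        [0.3, 0.4], [0.2, 0.3], [0.4, 0.2],
    ])
    warmth = np.array([1, 1, 1, 0, 0, 0])  # 1 = high warmth, 0 = low warmth

    # Early fusion: concatenate text (TF-IDF) and acoustic features per sample.
    text_feats = TfidfVectorizer().fit_transform(transcripts).toarray()
    fused = np.hstack([text_feats, acoustic])

    clf = LogisticRegression(max_iter=1000).fit(fused, warmth)
    print(clf.predict(fused))  # in practice, evaluate on held-out interviews

In practice, acoustic and text information can be fused in several ways (simple concatenation as above, or by combining the outputs of separate acoustic and text models), and performance would be assessed on held-out interviews rather than on the training data.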

What this research has shown

In the best-performing models, we found that we could automatically predict the degree of warmth in the FMSS samples better than chance, with an F1-score of 61.5% (the F1-score is a common metric for assessing the classification performance of a machine learning model, calculated as the harmonic mean of positive predictive value, or precision, and sensitivity, or recall).
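For readers less familiar with this metric, the short Python example below shows how the F1-score relates to precision and recall; the labels and predictions are invented for illustration and are not project data.

    # Toy example only: invented labels, not data from this project.
    from sklearn.metrics import precision_score, recall_score, f1_score

    y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # human-rated labels (1 = high warmth)
    y_pred = [1, 0, 0, 1, 1, 1, 0, 0]   # labels predicted by a model

    precision = precision_score(y_true, y_pred)  # positive predictive value
    recall = recall_score(y_true, y_pred)        # sensitivity
    f1 = f1_score(y_true, y_pred)                # harmonic mean of the two

    # F1 = 2 * precision * recall / (precision + recall)
    print(precision, recall, f1)                 # 0.75 0.75 0.75 here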

Implications

This F1-score provided grounds for optimism that an automated approach could be used to identify features of expressed emotion, but it also highlighted that a much larger gold standard dataset is needed to give our deep-learning/neural network models the best chance of accurately predicting EE constructs, ideally with F1-scores above 0.8. Overall, our work indicates that automated approaches could be developed for use with 'real world', low-fidelity audio capture, but further work is needed before we could feel confident in advocating that these methodologies be adopted as low-cost epidemiological tools for EE coding or extended into clinical settings.

We used the findings from the PRT feasibility work in a grant submission to extend our work. In 2022, we applied to the UKRI Adolescence, Mental Health and the Developing Mind: Methodological Innovation Call. Our application was successful, and our team was awarded £303K to continue to build on the work initially funded by PRT. Please see https://www.ukri.org/wp-content/uploads/2021/08/MRC-081222-SummariesOfMethodologicalInnovationProjects.pdf

PUBLICATIONS & CONFERENCES ATTENDED:

The conference paper was accepted and presented at Interspeech 2022, 18-22 September 2022, Incheon, Korea. Published article:

Mirheidari, B., Bittar, A., Cummins, N., Downs, J., Fisher, H.L., Christensen, H. (2022) Automatic Detection of Expressed Emotion from Five-Minute Speech Samples: Challenges and Opportunities. Proc. Interspeech 2022, 2458-2462, doi: 10.21437/Interspeech.2022-10188 

Our public engagement workshop event was held in June 2022 [see https://www.kcl.ac.uk/events/ai-predicting-mental-health-possible-futures] 

An extended version of the conference paper was submitted in May 2023, and is currently under review at PLoS One.

 
