The article “Speaker-Adapted Confidence Measures for ASR using Deep Bidirectional Recurrent Neural Networks”, by Miguel Ángel del Agua Teba and other MLLP members, has been published in the July 2018 issue of the journal IEEE/ACM Transactions on Audio, Speech, and Language Processing.
The article’s authors, MLLP members Miguel Ángel del Agua Teba, Adrià Giménez, Albert Sanchis, Jorge Civera and Alfons Juan, have summarized it in the following abstract:
In the last years, Deep Bidirectional Recurrent Neural Networks (DBRNN) and DBRNN with Long Short-Term Memory cells (DBLSTM) have outperformed the most accurate classifiers for confidence estimation in automatic speech recognition. At the same time, we have recently shown that speaker adaptation of confidence measures using DBLSTM yields significant improvements over non-adapted confidence measures. In accordance with these two recent contributions to the state of the art in confidence estimation, this paper presents a comprehensive study of speaker-adapted confidence measures using DBRNN and DBLSTM models. Firstly, we present new empirical evidence of the superiority of RNN-based confidence classifiers evaluated over a large speech corpus consisting of the English LibriSpeech and the Spanish poliMedia tasks. Secondly, we show new results on speaker-adapted confidence measures considering a multi-task framework in which RNN-based confidence classifiers trained with LibriSpeech are adapted to speakers of the TED-LIUM corpus. These experiments confirm that speaker-adapted confidence measures outperform their non-adapted counterparts. Lastly, we describe an unsupervised adaptation method of the acoustic DBLSTM model based on confidence measures which results in better automatic speech recognition performance.
The journal IEEE/ACM Transactions on Audio, Speech, and Language Processing, published by the IEEE Signal Processing Society, covers audio, speech and language processing and the sciences that support them, and also welcomes machine learning and pattern analysis applied to any of these areas. With a JCR 2016 impact factor of 2.491 (5-year: 2.501), it is a Q1 journal in the category of Acoustics and a Q2 journal in the category of Electrical and Electronic Engineering.
This new article follows and expands on a line of publications by Miguel Ángel del Agua Teba on language processing and confidence measures, including Interspeech 2016’s “ASR Confidence Estimation with Speaker-Adapted Recurrent Neural Networks”, IWSLT 2015’s “The MLLP ASR Systems for IWSLT 2015”, and ICASSP 2013’s “Language model adaptation for video lectures transcription”.
In the last six years (2012–2018), MLLP members have published 15 international journal articles (IEEE/ACM Trans. Audio Speech Lang. Process., 2018; Pattern Recognition Letters, 51, 2015; Pattern Recognition, 45 (9), 2012; …) and over 30 international conference papers (Interspeech 2016; AMTA 2014; ICASSP 2013; EACL 2012; …). You can browse all of the group’s 200+ publications (since 1991) in the Publications section on our website.