Publications

del Agua Teba, Miguel Á

Contributions to Efficient Automatic Transcription of Video Lectures PhD Thesis

Universitat Politècnica de València, 2019, (Advisers: Alfons Juan Ciscar and Albert Sanchis Navarro).

Links | BibTeX | Tags: Automatic Speech Recognition, Confidence measures, Video Lectures

Del-Agua, Miguel Ángel ; Giménez, Adrià ; Sanchis, Alberto ; Civera, Jorge; Juan, Alfons

Speaker-Adapted Confidence Measures for ASR using Deep Bidirectional Recurrent Neural Networks Journal Article

IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26 (7), pp. 1194–1202, 2018.

Abstract | Links | BibTeX | Tags: Automatic Speech Recognition, Confidence estimation, Confidence measures, Deep bidirectional recurrent neural networks, Long short-term memory, Speaker adaptation

@article{Del-Agua2018,
title = {Speaker-Adapted Confidence Measures for ASR using Deep Bidirectional Recurrent Neural Networks},
author = {Del-Agua, Miguel Ángel AND Giménez, Adrià AND Sanchis, Alberto AND Civera,Jorge AND Juan, Alfons},
url = {http://www.mllp.upv.es/wp-content/uploads/2018/04/Del-Agua2018_authors_version.pdf
https://doi.org/10.1109/TASLP.2018.2819900},
year = {2018},
date = {2018-01-01},
journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
volume = {26},
number = {7},
pages = {1194--1202},
abstract = {In the last years, Deep Bidirectional Recurrent Neural Networks (DBRNN) and DBRNN with Long Short-Term Memory cells (DBLSTM) have outperformed the most accurate classifiers for confidence estimation in automatic speech recognition. At the same time, we have recently shown that speaker adaptation of confidence measures using DBLSTM yields significant improvements over non-adapted confidence measures. In accordance with these two recent contributions to the state of the art in confidence estimation, this paper presents a comprehensive study of speaker-adapted confidence measures using DBRNN and DBLSTM models. Firstly, we present new empirical evidences of the superiority of RNN-based confidence classifiers evaluated over a large speech corpus consisting of the English LibriSpeech and the Spanish poliMedia tasks. Secondly, we show new results on speaker-adapted confidence measures considering a multi-task framework in which RNN-based confidence classifiers trained with LibriSpeech are adapted to speakers of the TED-LIUM corpus. These experiments confirm that speaker-adapted confidence measures outperform their non-adapted counterparts. Lastly, we describe an unsupervised adaptation method of the acoustic DBLSTM model based on confidence measures which results in better automatic speech recognition performance.},
keywords = {Automatic Speech Recognition, Confidence estimation, Confidence measures, Deep bidirectional recurrent neural networks, Long short-term memory, Speaker adaptation},
pubstate = {published},
tppubtype = {article}
}

Close

Silvestre-Cerdà, Joan Albert ; Del Agua, Miguel ; Garcés, Gonçal; Gascó, Guillem; Giménez-Pastor, Adrià; Martínez, Adrià; Pérez González de Martos, Alejandro ; Sánchez, Isaías; Serrano Martínez-Santos, Nicolás ; Spencer, Rachel; Valor Miró, Juan Daniel ; Andrés-Ferrer, Jesús; Civera, Jorge; Sanchís, Alberto; Juan, Alfons

transLectures Inproceedings

Proceedings (Online) of IberSPEECH 2012, pp. 345–351, Madrid (Spain), 2012.

Abstract | Links | BibTeX | Tags: Accessibility, Automatic Speech Recognition, Education, Intelligent Interaction, Language Technologies, Machine Translation, Massive Adaptation, Multilingualism, Opencast Matterhorn, Video Lectures

@inproceedings{Silvestre-Cerdà2012b,
title = {transLectures},
author = {Silvestre-Cerdà, Joan Albert and Del Agua, Miguel and Gonçal Garcés and Guillem Gascó and Adrià Giménez-Pastor and Adrià Martínez and Pérez González de Martos, Alejandro and Isaías Sánchez and Serrano Martínez-Santos, Nicolás and Rachel Spencer and Valor Miró, Juan Daniel and Jesús Andrés-Ferrer and Jorge Civera and Alberto Sanchís and Alfons Juan},
url = {http://hdl.handle.net/10251/37290
http://lorien.die.upm.es/~lapiz/rtth/JORNADAS/VII/IberSPEECH2012_OnlineProceedings.pdf
https://web.archive.org/web/20130609073144/http://iberspeech2012.ii.uam.es/IberSPEECH2012_OnlineProceedings.pdf
http://www.mllp.upv.es/wp-content/uploads/2015/04/1209IberSpeech.pdf},
year = {2012},
date = {2012-11-22},
booktitle = {Proceedings (Online) of IberSPEECH 2012},
pages = {345--351},
address = {Madrid (Spain)},
abstract = {[EN] transLectures (Transcription and Translation of Video Lectures) is an EU STREP project in which advanced automatic speech recognition and machine translation techniques are being tested on large video lecture repositories. The project began in November 2011 and will run for three years. This paper will outline the project's main motivation and objectives, and give a brief description of the two main repositories being considered: VideoLectures.NET and poliMèdia. The first results obtained by the UPV group for the poliMedia repository will also be provided.

[CA] transLectures (Transcription and Translation of Video Lectures) és un projecte del 7PM de la Unió Europea en el qual s'estan posant a prova tècniques avançades de reconeixement automàtic de la parla i de traducció automàtica sobre grans repositoris digitals de vídeos docents. El projecte començà al novembre de 2011 i tindrà una duració de tres anys. En aquest article exposem la motivació i els objectius del projecte, i descrivim breument els dos repositoris principals sobre els quals es treballa: VideoLectures.NET i poliMèdia. També oferim els primers resultats obtinguts per l'equip de la UPV al repositori poliMèdia.},
keywords = {Accessibility, Automatic Speech Recognition, Education, Intelligent Interaction, Language Technologies, Machine Translation, Massive Adaptation, Multilingualism, Opencast Matterhorn, Video Lectures},
pubstate = {published},
tppubtype = {inproceedings}
}

Close