[EN] Abstract of the PhD thesis “Automatic speech recognition and machine translation with deep neural networks for open educational resources, parliamentary contents and broadcast media”, by Gonçal Garcés Díaz-Munío (advisers: Dr Alfons Juan Ciscar and Dr Jorge Civera Saiz)
[CA] Resum de la tesi doctoral “Reconeixement automàtic de la parla i traducció automàtica amb xarxes neuronals profundes per a recursos educatius oberts, continguts parlamentaris i retransmissions audiovisuals”, per Gonçal Garcés Díaz-Munío (directors: Dr. Alfons Juan Ciscar i Dr. Jorge Civera Saiz)
[ES] Resumen de la tesis doctoral “Reconocimiento automático del habla y traducción automática con redes neuronales profundas para recursos educativos abiertos, contenidos parlamentarios y retransmisiones audiovisuales”, por Gonçal Garcés Díaz-Munío (directores: Dr. Alfons Juan Ciscar y Dr. Jorge Civera Saiz)
[FR] Résumé de la thèse de doctorat “Reconnaissance automatique de la parole et traduction automatique à l’aide de réseaux neuronaux profonds pour les ressources éducatives libres, les contenus parlementaires et les émissions audiovisuelles”, par Gonçal Garcés Díaz-Munío (direction: Dr Alfons Juan Ciscar et Dr Jorge Civera Saiz)
Find here the full text of this PhD dissertation
Read here the post about this PhD dissertation’s defence
English
In the last decade, automatic speech recognition (ASR) and machine translation (MT) have improved enormously through the use of constantly evolving deep neural network (DNN) models. Whereas at the beginning of the 2010s the pre-DNN ASR and MT systems of the time were ready to tackle some real-life applications successfully, such as offline video lecture transcription and translation, now in the 2020s much more challenging applications are within reach, such as live subtitling of broadcast media.
Over the same period, media accessibility for everyone, including deaf and hard-of-hearing people, has been given more and more importance. ASR and MT, in their current state, are powerful tools for increasing the coverage of accessibility measures such as subtitles, transcriptions and translations, and also a way of providing multilingual access to all types of content.
In this PhD thesis, we present research results on automatic speech recognition and machine translation based on deep neural networks in three very active domains: open educational resources, parliamentary contents and broadcast media.
Regarding open educational resources (OER), we first present work on the evaluation and post-editing of ASR and MT with intelligent interaction approaches, as carried out in the framework of the EU project transLectures: Transcription and Translation of Video Lectures. The results obtained confirm that the intelligent interaction approach can make post-editing automatic transcriptions and translations even more cost-effective. Then, in the context of the subsequent EU project X5gon, we present research on developing DNN-based neural MT (NMT) systems and on making the most of larger MT corpora through automatic data filtering. This work resulted in a first-place ranking in an international evaluation campaign on MT, and we show how these new NMT systems improved the quality of multilingual subtitles in real OER scenarios.
In the likewise growing domain of language technologies for parliamentary contents, we describe research on speech data curation techniques for streaming ASR in the context of European Parliament debates. This research resulted in the release of Europarl-ASR, a new, large speech corpus for streaming ASR system training and evaluation, as well as for the benchmarking of speech data curation techniques.
Finally, we present work in a domain at the edge of the state of the art for ASR and MT: the live subtitling of broadcast media. This work was carried out in the context of the 2020–2023 R&D collaboration agreement between the Valencian public broadcaster À Punt and the Universitat Politècnica de València for real-time computer-assisted subtitling of media contents. This research has resulted in the deployment of high-quality, low-latency, real-time streaming ASR systems for a less-spoken language (Catalan) and a widely spoken language (Spanish) in a real broadcast use case.