Publications

2024
Garcés Díaz-Munío, Gonçal Automatic speech recognition and machine translation with deep neural networks for open educational resources, parliamentary contents and broadcast media PhD Thesis Universitat Politècnica de València, 2024, (advisers: Alfons Juan Ciscar and Jorge Civera Saiz).
Tags: Automatic Speech Recognition, Broadcast Media, Deep Neural Networks, Machine Translation, Open Educational Resources, Parliamentary Contents

@phdthesis{Garcés2024, title = {Automatic speech recognition and machine translation with deep neural networks for open educational resources, parliamentary contents and broadcast media}, author = {Garcés Díaz-Munío, Gonçal}, url = {http://hdl.handle.net/10251/212454 https://www.upv.es/pls/oalu/sic_ted.mostrar_tesis?p_num_reg=12900 https://github.com/gonsalet/ASR_and_MT_for_educational_parliamentary_and_broadcast_media}, doi = {10.4995/Thesis/10251/212454}, year = {2024}, date = {2024-11-25}, school = {Universitat Politècnica de València}, abstract = {In the last decade, automatic speech recognition (ASR) and machine translation (MT) have improved enormously through the use of constantly evolving deep neural network (DNN) models. If at the beginning of the 2010s the then pre-DNN ASR and MT systems were ready to successfully tackle some real-life applications such as offline video lecture transcription and translation, now in the 2020s much more challenging applications are within grasp, such as live broadcast media subtitling. Over the same period, media accessibility for everyone, including deaf and hard-of-hearing people, has been given more and more importance. ASR and MT, in their current state, are powerful tools to increase the coverage of accessibility measures such as subtitles, transcriptions and translations, also as a way of providing multilingual access to all types of content. In this PhD thesis, we present research results on automatic speech recognition and machine translation based on deep neural networks in three very active domains: open educational resources, parliamentary contents and broadcast media. Regarding open educational resources (OER), we first present work on the evaluation and post-editing of ASR and MT with intelligent interaction approaches, as carried out in the framework of EU project transLectures: Transcription and Translation of Video Lectures. The results obtained confirm that the intelligent interaction approach can make post-editing automatic transcriptions and translations even more cost-effective. Then, in the context of the subsequent EU project X5gon, we present research on developing DNN-based neural MT systems, and on making the most of larger MT corpora through automatic data filtering. This work resulted in a first-place ranking in an international evaluation campaign on MT, and we show how these new NMT systems improved the quality of multilingual subtitles in real OER scenarios. In the likewise growing domain of language technologies for parliamentary contents, we describe research on speech data curation techniques for streaming ASR in the context of European Parliament debates. This research resulted in the release of Europarl-ASR, a new, large speech corpus for streaming ASR system training and evaluation, as well as for the benchmarking of speech data curation techniques. Finally, we present work in a domain on the edge of the state of the art for ASR and MT: the live subtitling of broadcast media, in the context of the 2020–2023 R&D collaboration agreement between the Valencian public broadcaster À Punt and the Universitat Politècnica de València for real-time computer-assisted subtitling of media contents. This research has resulted in the deployment of high-quality, low-latency, real-time streaming ASR systems for a less-spoken language (Catalan) and a widely spoken language (Spanish) in a real broadcast use case.}, note = {advisers: Alfons Juan Ciscar and Jorge Civera Saiz}, keywords = {Automatic Speech Recognition, Broadcast Media, Deep Neural Networks, Machine Translation, Open Educational Resources, Parliamentary Contents}, pubstate = {published}, tppubtype = {phdthesis} }
2022
Pérez González de Martos, Alejandro ; Giménez Pastor, Adrià ; Jorge Cano, Javier ; Iranzo-Sánchez, Javier; Silvestre-Cerdà, Joan Albert; Garcés Díaz-Munío, Gonçal V; Baquero-Arnal, Pau; Sanchis Navarro, Alberto ; Civera Sáiz, Jorge ; Juan Ciscar, Alfons ; Turró Ribalta, Carlos Doblaje automático de vídeo-charlas educativas en UPV[Media] Inproceedings Proc. of VIII Congrés d'Innovació Educativa i Docència en Xarxa (IN-RED 2022), pp. 557–570, València (Spain), 2022.
Tags: automatic dubbing, Automatic Speech Recognition, Machine Translation, OER, text-to-speech

@inproceedings{deMartos2022, title = {Doblaje automático de vídeo-charlas educativas en UPV[Media]}, author = {Pérez González de Martos, Alejandro AND Giménez Pastor, Adrià AND Jorge Cano, Javier AND Javier Iranzo-Sánchez AND Joan Albert Silvestre-Cerdà AND Garcés Díaz-Munío, Gonçal V. AND Pau Baquero-Arnal AND Sanchis Navarro, Alberto AND Civera Sáiz, Jorge AND Juan Ciscar, Alfons AND Turró Ribalta, Carlos}, doi = {10.4995/INRED2022.2022.15844}, year = {2022}, date = {2022-01-01}, booktitle = {Proc. of VIII Congrés d'Innovació Educativa i Docència en Xarxa (IN-RED 2022)}, pages = {557--570}, address = {València (Spain)}, abstract = {More and more universities are banking on the production of digital content to support online or blended learning in higher education. Over the last years, the MLLP research group has been working closely with the UPV's ASIC media services in order to enrich educational multimedia resources through the application of natural language processing technologies including automatic speech recognition, machine translation and text-to-speech. In this work, we present the steps that are being followed for the comprehensive translation of these materials, specifically through (semi-)automatic dubbing by making use of state-of-the-art speaker-adaptive text-to-speech technologies.}, keywords = {automatic dubbing, Automatic Speech Recognition, Machine Translation, OER, text-to-speech}, pubstate = {published}, tppubtype = {inproceedings} }
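The abstract above chains ASR, MT and speaker-adaptive TTS into a (semi-)automatic dubbing pipeline. The following minimal Python sketch shows only that data flow, under stated assumptions: every stage function is a hypothetical placeholder, not the actual UPV[Media] systems; what it preserves from the paper is the idea of translating per segment while keeping the original timings.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds
    text: str     # transcribed or translated text

# Hypothetical placeholders for the real ASR, MT and speaker-adaptive TTS systems.
def transcribe(audio_path: str) -> list[Segment]:
    return [Segment(0.0, 4.2, "Bienvenidos a esta vídeo-charla.")]

def translate(segments: list[Segment], tgt_lang: str) -> list[Segment]:
    return [Segment(s.start, s.end, f"[{tgt_lang}] {s.text}") for s in segments]

def synthesise(segment: Segment, voice: str) -> bytes:
    return b""  # would return synthetic speech in the adapted voice

def dub(audio_path: str, tgt_lang: str, voice: str) -> list[tuple[Segment, bytes]]:
    """ASR -> MT -> TTS, preserving segment timings so the synthetic
    speech can be laid over the original video track."""
    translated = translate(transcribe(audio_path), tgt_lang)
    return [(seg, synthesise(seg, voice)) for seg in translated]

if __name__ == "__main__":
    for seg, _audio in dub("lecture.wav", "en", "adapted_voice"):
        print(f"{seg.start:5.1f}-{seg.end:5.1f} {seg.text}")
```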
2021
Iranzo-Sánchez, Javier; Jorge, Javier; Baquero-Arnal, Pau; Silvestre-Cerdà, Joan Albert ; Giménez, Adrià; Civera, Jorge; Sanchis, Albert; Juan, Alfons Streaming cascade-based speech translation leveraged by a direct segmentation model Journal Article Neural Networks, 142, pp. 303–315, 2021.
Tags: Automatic Speech Recognition, Cascade System, Deep Neural Networks, Hybrid System, Machine Translation, Segmentation Model, Speech Translation, streaming

@article{Iranzo-Sánchez2021, title = {Streaming cascade-based speech translation leveraged by a direct segmentation model}, author = {Javier Iranzo-Sánchez and Javier Jorge and Pau Baquero-Arnal and Silvestre-Cerdà, Joan Albert and Adrià Giménez and Jorge Civera and Albert Sanchis and Alfons Juan}, doi = {10.1016/j.neunet.2021.05.013}, year = {2021}, date = {2021-01-01}, journal = {Neural Networks}, volume = {142}, pages = {303--315}, abstract = {The cascade approach to Speech Translation (ST) is based on a pipeline that concatenates an Automatic Speech Recognition (ASR) system followed by a Machine Translation (MT) system. Nowadays, state-of-the-art ST systems are populated with deep neural networks that are conceived to work in an offline setup in which the audio input to be translated is fully available in advance. However, a streaming setup defines a completely different picture, in which an unbounded audio input gradually becomes available and at the same time the translation needs to be generated under real-time constraints. In this work, we present a state-of-the-art streaming ST system in which neural-based models integrated in the ASR and MT components are carefully adapted in terms of their training and decoding procedures in order to run under a streaming setup. In addition, a direct segmentation model that adapts the continuous ASR output to the capacity of simultaneous MT systems trained at the sentence level is introduced to guarantee low latency while preserving the translation quality of the complete ST system. The resulting ST system is thoroughly evaluated on the real-life streaming Europarl-ST benchmark to gauge the trade-off between quality and latency for each component individually as well as for the complete ST system.}, keywords = {Automatic Speech Recognition, Cascade System, Deep Neural Networks, Hybrid System, Machine Translation, Segmentation Model, Speech Translation, streaming}, pubstate = {published}, tppubtype = {article} }
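To make the cascade concrete: streaming ASR emits words continuously, a direct segmentation model decides where a sentence ends, and each closed segment is handed to sentence-level MT. Below is a minimal Python sketch of that data flow only; the three components are dummy stand-ins, not the authors' neural models.

```python
from typing import Iterable, Iterator

def streaming_asr(audio_chunks: Iterable[bytes]) -> Iterator[str]:
    """Emit recognised words as they become available (dummy word stream)."""
    for _chunk in audio_chunks:
        yield from ("this", "is", "a", "demo", "<eos>")

def is_boundary(word: str, history: list[str]) -> bool:
    """The real direct segmentation model scores each position from a sliding
    context window; here we simply cut on a dummy end-of-sentence marker."""
    return word == "<eos>"

def translate_sentence(words: list[str]) -> str:
    return " ".join(words).upper()  # placeholder for the sentence-level MT system

def cascade_st(audio_chunks: Iterable[bytes]) -> Iterator[str]:
    """ASR -> segmenter -> MT: a segment is closed and translated as soon as
    the segmenter fires, instead of waiting for acoustic pauses."""
    buffer: list[str] = []
    for word in streaming_asr(audio_chunks):
        buffer.append(word)
        if is_boundary(word, buffer):
            yield translate_sentence(buffer[:-1])  # drop the marker itself
            buffer.clear()

for translation in cascade_st([b"chunk-1", b"chunk-2"]):
    print(translation)
```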
2020
Iranzo-Sánchez, Javier; Silvestre-Cerdà, Joan Albert; Jorge, Javier; Roselló, Nahuel; Giménez, Adrià; Sanchis, Albert; Civera, Jorge; Juan, Alfons Europarl-ST: A Multilingual Corpus for Speech Translation of Parliamentary Debates Inproceedings Proc. of 45th Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2020), pp. 8229–8233, Barcelona (Spain), 2020.
Tags: Automatic Speech Recognition, Machine Translation, Multilingual Corpus, Speech Translation, Spoken Language Translation

@inproceedings{Iranzo2020, title = {Europarl-ST: A Multilingual Corpus for Speech Translation of Parliamentary Debates}, author = {Javier Iranzo-Sánchez and Joan Albert Silvestre-Cerdà and Javier Jorge and Nahuel Roselló and Adrià Giménez and Albert Sanchis and Jorge Civera and Alfons Juan}, url = {https://arxiv.org/abs/1911.03167 https://paperswithcode.com/paper/europarl-st-a-multilingual-corpus-for-speech https://www.mllp.upv.es/europarl-st/}, doi = {10.1109/ICASSP40776.2020.9054626}, year = {2020}, date = {2020-01-01}, booktitle = {Proc. of 45th Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2020)}, pages = {8229--8233}, address = {Barcelona (Spain)}, abstract = {Current research into spoken language translation (SLT), or speech-to-text translation, is often hampered by the lack of specific data resources for this task, as currently available SLT datasets are restricted to a limited set of language pairs. In this paper we present Europarl-ST, a novel multilingual SLT corpus containing paired audio-text samples for SLT from and into 6 European languages, for a total of 30 different translation directions. This corpus has been compiled using the debates held in the European Parliament in the period between 2008 and 2012. This paper describes the corpus creation process and presents a series of automatic speech recognition, machine translation and spoken language translation experiments that highlight the potential of this new resource. The corpus is released under a Creative Commons license and is freely accessible and downloadable.}, keywords = {Automatic Speech Recognition, Machine Translation, Multilingual Corpus, Speech Translation, Spoken Language Translation}, pubstate = {published}, tppubtype = {inproceedings} }
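The count of 30 translation directions is just the ordered pairs over the 6 corpus languages (6 × 5 = 30). A quick sanity check in Python; the concrete language codes are listed here as an assumption for illustration, not quoted from the paper:

```python
from itertools import permutations

# Assumed Europarl-ST v1 language set, for illustration only.
languages = ["en", "fr", "de", "it", "es", "pt"]
directions = list(permutations(languages, 2))  # ordered (source, target) pairs
assert len(directions) == len(languages) * (len(languages) - 1) == 30
print(directions[:3])  # [('en', 'fr'), ('en', 'de'), ('en', 'it')]
```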
2019
Iranzo-Sánchez, Javier ; Garcés Díaz-Munío, Gonçal V; Civera, Jorge ; Juan, Alfons The MLLP-UPV Supervised Machine Translation Systems for WMT19 News Translation Task Inproceedings Proc. of Fourth Conference on Machine Translation (WMT19), pp. 218–224, Florence (Italy), 2019.
Tags: Machine Translation, Neural Machine Translation, WMT19 News Translation

@inproceedings{Iranzo-Sánchez2019, title = {The MLLP-UPV Supervised Machine Translation Systems for WMT19 News Translation Task}, author = {Iranzo-Sánchez, Javier and Garcés Díaz-Munío, Gonçal V. and Civera, Jorge and Juan, Alfons}, url = {https://www.mllp.upv.es/wp-content/uploads/2019/09/poster-1.pdf}, doi = {10.18653/v1/W19-5320}, year = {2019}, date = {2019-01-01}, booktitle = {Proc. of Fourth Conference on Machine Translation (WMT19)}, pages = {218-224}, address = {Florence (Italy)}, abstract = {[EN] This paper describes the participation of the MLLP research group of the Universitat Politècnica de València in the WMT 2019 News Translation Shared Task. In this edition, we have submitted systems for the German ↔ English and German ↔ French language pairs, participating in both directions of each pair. Our submitted systems, based on the Transformer architecture, make ample use of data filtering, synthetic data and domain adaptation through fine-tuning. [CA] "Els sistemes de traducció automàtica supervisada de l'MLLP-UPV per a la tasca de traducció de notícies de WMT19": En aquest article descrivim la participació del grup de recerca MLLP de la Universitat Politècnica de València en la competició de traducció de notícies de WMT 2019. En aquesta edició, hem presentat sistemes per a les combinacions de traducció alemany ↔ anglés i alemany ↔ francés (en ambdós sentits). Els sistemes presentats, basats en l'arquitectura Transformer, fan un ús extens del filtratge de dades, les dades sintètiques i l'ajust fi amb adaptació al domini.}, keywords = {Machine Translation, Neural Machine Translation, WMT19 News Translation}, pubstate = {published}, tppubtype = {inproceedings} }
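Of the three ingredients named in the abstract, domain adaptation through fine-tuning amounts to continuing training of a converged general-domain model on in-domain data, typically with a small learning rate. A minimal PyTorch sketch of that loop follows; the tiny linear model, the checkpoint name and the hyperparameters are placeholders, not the submitted Transformer systems.

```python
import torch
from torch import nn

# A tiny linear model stands in for the general-domain NMT model; in practice
# one would first load the converged checkpoint, e.g.:
# model.load_state_dict(torch.load("general_domain.pt"))  # hypothetical file
model = nn.Linear(8, 8)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)  # small LR for adaptation
loss_fn = nn.MSELoss()

# Dummy in-domain samples; real fine-tuning would use in-domain parallel text.
in_domain = [(torch.randn(8), torch.randn(8)) for _ in range(32)]
for epoch in range(3):  # a handful of passes usually suffices for adaptation
    for src, tgt in in_domain:
        optimizer.zero_grad()
        loss_fn(model(src), tgt).backward()
        optimizer.step()
```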
Baquero-Arnal, Pau ; Iranzo-Sánchez, Javier ; Civera, Jorge ; Juan, Alfons The MLLP-UPV Spanish-Portuguese and Portuguese-Spanish Machine Translation Systems for WMT19 Similar Language Translation Task Inproceedings Proc. of Fourth Conference on Machine Translation (WMT19), pp. 179–184, Florence (Italy), 2019.
Tags: Machine Translation, Neural Machine Translation, WMT19

@inproceedings{Baquero-Arnal2019, title = {The MLLP-UPV Spanish-Portuguese and Portuguese-Spanish Machine Translation Systems for WMT19 Similar Language Translation Task}, author = {Baquero-Arnal, Pau and Iranzo-Sánchez, Javier and Civera, Jorge and Juan, Alfons}, url = {https://www.aclweb.org/anthology/W19-5423/ https://www.mllp.upv.es/wp-content/uploads/2019/09/poster-2.pdf}, year = {2019}, date = {2019-01-01}, booktitle = {Proc. of Fourth Conference on Machine Translation (WMT19)}, pages = {179-184}, address = {Florence (Italy)}, abstract = {This paper describes the participation of the MLLP research group of the Universitat Politècnica de València in the WMT 2019 Similar Language Translation Shared Task. We have submitted systems for the Portuguese ↔ Spanish language pair, in both directions. They are based on the Transformer architecture, as well as on a novel architecture called 2D alternating RNN. Both systems have been domain adapted through fine-tuning, which has been shown to be very effective.}, keywords = {Machine Translation, Neural Machine Translation, WMT19}, pubstate = {published}, tppubtype = {inproceedings} }
2018
Matusov, Evgeny; Wilken, Patrick; Bahar, Parnia; Schamper, Julian; Golik, Pavel; Zeyer, Albert; Silvestre-Cerdà, Joan Albert; Martínez-Villaronga, Adrià; Pesch, Hendrick; Peter, Jan-Thorsten Neural Speech Translation at AppTek Inproceedings Proc. of 15th Intl. Workshop on Spoken Language Translation (IWSLT 2018), pp. 104–111, Hong Kong, 2018.
Tags: Automatic Speech Recognition, Machine Translation

@inproceedings{Matusov18, title = {Neural Speech Translation at AppTek}, author = {Evgeny Matusov AND Patrick Wilken AND Parnia Bahar AND Julian Schamper AND Pavel Golik AND Albert Zeyer AND Joan Albert Silvestre-Cerdà AND Adrià Martínez-Villaronga AND Hendrick Pesch AND Jan-Thorsten Peter}, url = {https://www.mllp.upv.es/wp-content/uploads/2019/07/iwslt18.pdf https://workshop2018.iwslt.org/downloads/Proceedings_IWSLT_2018.pdf}, year = {2018}, date = {2018-07-01}, booktitle = {Proc. of 15th Intl. Workshop on Spoken Language Translation (IWSLT 2018)}, pages = {104--111}, address = {Hong Kong}, keywords = {Automatic Speech Recognition, Machine Translation}, pubstate = {published}, tppubtype = {inproceedings} }
Valor Miró, Juan Daniel ; Baquero-Arnal, Pau; Civera, Jorge; Turró, Carlos; Juan, Alfons Multilingual videos for MOOCs and OER Journal Article Journal of Educational Technology & Society, 21 (2), pp. 1–12, 2018.
Tags: Machine Translation, MOOCs, multilingual, Speech Recognition, video lecture repositories

@article{Miró2018, title = {Multilingual videos for MOOCs and OER}, author = {Valor Miró, Juan Daniel and Pau Baquero-Arnal and Jorge Civera and Carlos Turró and Alfons Juan}, url = {https://www.mllp.upv.es/wp-content/uploads/2019/11/JETS2018MLLP.pdf http://hdl.handle.net/10251/122577 https://www.jstor.org/stable/26388375 https://www.j-ets.net/collection/published-issues/21_2}, year = {2018}, date = {2018-01-01}, journal = {Journal of Educational Technology & Society}, volume = {21}, number = {2}, pages = {1--12}, abstract = {Massive Open Online Courses (MOOCs) and Open Educational Resources (OER) are rapidly growing, but are not usually offered in multiple languages due to the lack of cost-effective solutions to translate the different objects comprising them and particularly videos. However, current state-of-the-art automatic speech recognition (ASR) and machine translation (MT) techniques have reached a level of maturity which opens the possibility of producing multilingual video subtitles of publishable quality at low cost. This work summarizes the authors' experience in exploring this possibility in two real-life case studies: a MOOC platform and a large video lecture repository. Apart from describing the systems, tools and integration components employed for such purpose, a comprehensive evaluation of the results achieved is provided in terms of quality and efficiency. More precisely, it is shown that draft multilingual subtitles produced by domain-adapted ASR/MT systems reach a level of accuracy that makes them worth post-editing, instead of generating them ex novo, saving approximately 25%–75% of the time. Finally, the results reported on user multilingual data consumption reflect that multilingual subtitles have had a very positive impact in our case studies, boosting student enrolment, in the case of the MOOC platform, by 70% relative.}, keywords = {Machine Translation, MOOCs, multilingual, Speech Recognition, video lecture repositories}, pubstate = {published}, tppubtype = {article} }
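For readers unused to relative figures: "by 70% relative" is measured against the previous enrolment level, as the toy computation below illustrates (the baseline number is invented).

```python
baseline_enrolment = 1_000   # hypothetical pre-subtitles figure
relative_increase = 0.70     # "70% relative" from the paper
boosted = baseline_enrolment * (1 + relative_increase)
print(int(boosted))          # 1700
```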
Iranzo-Sánchez, Javier ; Baquero-Arnal, Pau ; Garcés Díaz-Munío, Gonçal V; Martínez-Villaronga, Adrià ; Civera, Jorge ; Juan, Alfons The MLLP-UPV German-English Machine Translation System for WMT18 Inproceedings Proc. of the Third Conference on Machine Translation (WMT18), Volume 2: Shared Task Papers, pp. 422–428, Brussels (Belgium), 2018.
Tags: Data Selection, Machine Translation, Neural Machine Translation, WMT18 news translation

@inproceedings{Iranzo-Sánchez2018, title = {The MLLP-UPV German-English Machine Translation System for WMT18}, author = {Iranzo-Sánchez, Javier and Baquero-Arnal, Pau and Garcés Díaz-Munío, Gonçal V. and Martínez-Villaronga, Adrià and Civera, Jorge and Juan, Alfons}, url = {http://dx.doi.org/10.18653/v1/W18-6414 https://www.mllp.upv.es/wp-content/uploads/2018/11/wmt18_mllp-upv_poster.pdf}, year = {2018}, date = {2018-01-01}, booktitle = {Proc. of the Third Conference on Machine Translation (WMT18), Volume 2: Shared Task Papers}, pages = {422--428}, address = {Brussels (Belgium)}, abstract = {[EN] This paper describes the statistical machine translation system built by the MLLP research group of Universitat Politècnica de València for the German>English news translation shared task of the EMNLP 2018 Third Conference on Machine Translation (WMT18). We used an ensemble of Transformer architecture–based neural machine translation systems. To train our system under "constrained" conditions, we filtered the provided parallel data with a scoring technique using character-based language models, and we added parallel data based on synthetic source sentences generated from the provided monolingual corpora. [CA] "El sistema de traducció automàtica alemany>anglés de l'MLLP-UPV per a WMT18": En aquest article descrivim el sistema de traducció automàtica estadística creat pel grup d'investigació MLLP de la Universitat Politècnica de València per a la competició de traducció de notícies alemany>anglés de la Third Conference on Machine Translation (WMT18, associada a la conferència EMNLP 2018). Hem utilitzat una combinació de sistemes de traducció automàtica neuronal basats en l'arquitectura Transformer. Per a entrenar el nostre sistema en la categoria "fitada" (només amb els corpus lingüístics oficials de la competició), hem filtrat les dades paral·leles disponibles amb una tècnica que assigna puntuacions utilitzant models de llenguatge de caràcters, i hem afegit dades paral·leles basades en frases d'origen sintètiques generades a partir dels corpus monolingües disponibles.}, keywords = {Data Selection, Machine Translation, Neural Machine Translation, WMT18 news translation}, pubstate = {published}, tppubtype = {inproceedings} }
2016
Silvestre-Cerdà, Joan Albert; Juan, Alfons; Civera, Jorge Different Contributions to Cost-Effective Transcription and Translation of Video Lectures Inproceedings Proc. of IX Jornadas en Tecnología del Habla and V Iberian SLTech Workshop (IberSpeech 2016), pp. 313–319, Lisbon (Portugal), 2016, ISBN: 978-3-319-49168-4.
Tags: Automatic Speech Recognition, Automatic transcription and translation, Machine Translation, Video Lectures

@inproceedings{Silvestre-Cerdà2016b, title = {Different Contributions to Cost-Effective Transcription and Translation of Video Lectures}, author = {Joan Albert Silvestre-Cerdà and Alfons Juan and Jorge Civera}, url = {http://www.mllp.upv.es/wp-content/uploads/2016/11/poster.pdf http://www.mllp.upv.es/wp-content/uploads/2016/11/paper.pdf http://hdl.handle.net/10251/62194}, isbn = {978-3-319-49168-4}, year = {2016}, date = {2016-11-24}, booktitle = {Proc. of IX Jornadas en Tecnología del Habla and V Iberian SLTech Workshop (IberSpeech 2016)}, pages = {313-319}, address = {Lisbon (Portugal)}, abstract = {In recent years, on-line multimedia repositories have experienced strong growth that has consolidated them as essential knowledge assets, especially in the area of education, where large repositories of video lectures have been built in order to complement or even replace traditional teaching methods. However, most of these video lectures are neither transcribed nor translated due to a lack of cost-effective solutions to do so in a way that gives accurate enough results. Solutions of this kind are clearly necessary in order to make these lectures accessible to speakers of different languages and to people with hearing disabilities, among many other benefits and applications. For this reason, the main aim of this thesis is to develop a cost-effective solution capable of transcribing and translating video lectures to a reasonable degree of accuracy. More specifically, we address the integration of state-of-the-art techniques in Automatic Speech Recognition and Machine Translation into large video lecture repositories to generate high-quality multilingual video subtitles without human intervention and at a reduced computational cost. Also, we explore the potential benefits of the exploitation of the information that we know a priori about these repositories, that is, lecture-specific knowledge such as speaker, topic or slides, to create specialised, in-domain transcription and translation systems by means of massive adaptation techniques. The proposed solutions have been tested in real-life scenarios by carrying out several objective and subjective evaluations, obtaining very positive results. The main outcome derived from this multidisciplinary thesis, the transLectures-UPV Platform, has been publicly released as open-source software, and, at the time of writing, it is serving automatic transcriptions and translations for several thousand video lectures in many Spanish and European universities and institutions.}, keywords = {Automatic Speech Recognition, Automatic transcription and translation, Machine Translation, Video Lectures}, pubstate = {published}, tppubtype = {inproceedings} }
Silvestre-Cerdà, Joan Albert Different Contributions to Cost-Effective Transcription and Translation of Video Lectures PhD Thesis Universitat Politècnica de València, 2016, (Advisors: Alfons Juan Ciscar and Jorge Civera Saiz).
Tags: Automatic Speech Recognition, Education, Language Technologies, Machine Translation, Massive Adaptation, Multilingualism, video lecture repositories, Video Lectures

@phdthesis{Silvestre-Cerdà2016, title = {Different Contributions to Cost-Effective Transcription and Translation of Video Lectures}, author = {Joan Albert Silvestre-Cerdà}, url = {http://hdl.handle.net/10251/62194 http://www.mllp.upv.es/wp-content/uploads/2016/01/slides.pdf http://www.mllp.upv.es/wp-content/uploads/2016/01/thesis.pdf http://www.mllp.upv.es/phd-thesis-different-contributions-to-cost-effective-transcription-and-translation-of-video-lectures-by-joan-albert-silvestre-cerda-abstract/}, year = {2016}, date = {2016-01-27}, school = {Universitat Politècnica de València}, abstract = {In recent years, online multimedia repositories have experienced a strong growth that has consolidated them as essential knowledge assets, especially in the area of education, where large repositories of video lectures have been built in order to complement or even replace traditional teaching methods. However, most of these video lectures are neither transcribed nor translated due to a lack of cost-effective solutions to do so in a way that provides accurate enough results. Solutions of this kind are clearly necessary in order to make these lectures accessible to speakers of different languages and to people with hearing disabilities. They would also facilitate lecture searchability and analysis functions, such as classification, recommendation or plagiarism detection, as well as the development of advanced educational functionalities like content summarisation to assist student note-taking. For this reason, the main aim of this thesis is to develop a cost-effective solution capable of transcribing and translating video lectures to a reasonable degree of accuracy. More specifically, we address the integration of state-of-the-art techniques in Automatic Speech Recognition and Machine Translation into large video lecture repositories to generate high-quality multilingual video subtitles without human intervention and at a reduced computational cost. Also, we explore the potential benefits of the exploitation of the information that we know a priori about these repositories, that is, lecture-specific knowledge such as speaker, topic or slides, to create specialised, in-domain transcription and translation systems by means of massive adaptation techniques. The proposed solutions have been tested in real-life scenarios by carrying out several objective and subjective evaluations, obtaining very positive results. The main technological outcome derived from this thesis, the transLectures-UPV Platform (TLP), has been publicly released as open-source software, and, at the time of writing, it is serving automatic transcriptions and translations for several thousand video lectures in Spanish and European universities and institutions.}, note = {Advisors: Alfons Juan Ciscar and Jorge Civera Saiz}, keywords = {Automatic Speech Recognition, Education, Language Technologies, Machine Translation, Massive Adaptation, Multilingualism, video lecture repositories, Video Lectures}, pubstate = {published}, tppubtype = {phdthesis} }
2015
Pérez González de Martos, Alejandro ; Silvestre-Cerdà, Joan Albert ; Valor Miró, Juan Daniel ; Civera, Jorge ; Juan, Alfons MLLP Transcription and Translation Platform Miscellaneous 2015, (Short paper for demo presentation accepted at 10th European Conf. on Technology Enhanced Learning (EC-TEL 2015), Toledo (Spain), 2015.).
Tags: Automatic Speech Recognition, Docencia en Red, Document translation, Efficient video subtitling, Machine Translation, MLLP, Post-editing, Video Lectures

@misc{mllpttp, title = {MLLP Transcription and Translation Platform}, author = {Pérez González de Martos, Alejandro and Silvestre-Cerdà, Joan Albert and Valor Miró, Juan Daniel and Civera, Jorge and Juan, Alfons}, url = {http://hdl.handle.net/10251/65747 http://www.mllp.upv.es/wp-content/uploads/2015/09/ttp_platform_demo_ectel2015.pdf http://ectel2015.httc.de/index.php?id=722}, year = {2015}, date = {2015-09-16}, booktitle = {Tenth European Conference On Technology Enhanced Learning (EC-TEL 2015)}, abstract = {This paper briefly presents the main features of MLLP's Transcription and Translation Platform, which uses state-of-the-art automatic speech recognition and machine translation systems to generate multilingual subtitles of educational audiovisual and textual content. It has proven to reduce user effort up to 1/3 of the time needed to generate transcriptions and translations from scratch.}, note = {Short paper for demo presentation accepted at 10th European Conf. on Technology Enhanced Learning (EC-TEL 2015), Toledo (Spain), 2015.}, keywords = {Automatic Speech Recognition, Docencia en Red, Document translation, Efficient video subtitling, Machine Translation, MLLP, Post-editing, Video Lectures}, pubstate = {published}, tppubtype = {misc} }
2013
Silvestre-Cerdà, Joan Albert; Pérez, Alejandro; Jiménez, Manuel; Turró, Carlos; Juan, Alfons; Civera, Jorge A System Architecture to Support Cost-Effective Transcription and Translation of Large Video Lecture Repositories Inproceedings Proc. of the IEEE Intl. Conf. on Systems, Man, and Cybernetics SMC 2013, pp. 3994–3999, Manchester (UK), 2013.
Tags: Accessibility, Automatic Speech Recognition, Education, Intelligent Interaction, Language Technologies, Machine Translation, Massive Adaptation, Multilingualism, Opencast Matterhorn, Video Lectures

@inproceedings{Silvestre-Cerdà2013, title = {A System Architecture to Support Cost-Effective Transcription and Translation of Large Video Lecture Repositories}, author = {Joan Albert Silvestre-Cerdà and Alejandro Pérez and Manuel Jiménez and Carlos Turró and Alfons Juan and Jorge Civera}, url = {http://dx.doi.org/10.1109/SMC.2013.682}, year = {2013}, date = {2013-01-01}, booktitle = {Proc. of the IEEE Intl. Conf. on Systems, Man, and Cybernetics SMC 2013}, pages = {3994-3999}, address = {Manchester (UK)}, abstract = {Online video lecture repositories are rapidly growing and becoming established as fundamental knowledge assets. However, most lectures are neither transcribed nor translated because of the lack of cost-effective solutions that can give accurate enough results. In this paper, we describe a system architecture that supports the cost-effective transcription and translation of large video lecture repositories. This architecture has been adopted in the EU project transLectures and is now being tested on a repository of more than 9000 video lectures at the Universitat Politècnica de València. Following a brief description of this repository and of the transLectures project, we describe the proposed system architecture in detail. We also report empirical results on the quality of the transcriptions and translations currently being maintained and steadily improved.}, keywords = {Accessibility, Automatic Speech Recognition, Education, Intelligent Interaction, Language Technologies, Machine Translation, Massive Adaptation, Multilingualism, Opencast Matterhorn, Video Lectures}, pubstate = {published}, tppubtype = {inproceedings} }
2012
Silvestre-Cerdà, Joan Albert ; Del Agua, Miguel ; Garcés, Gonçal; Gascó, Guillem; Giménez-Pastor, Adrià; Martínez, Adrià; Pérez González de Martos, Alejandro ; Sánchez, Isaías; Serrano Martínez-Santos, Nicolás ; Spencer, Rachel; Valor Miró, Juan Daniel ; Andrés-Ferrer, Jesús; Civera, Jorge; Sanchís, Alberto; Juan, Alfons transLectures Inproceedings Proceedings (Online) of IberSPEECH 2012, pp. 345–351, Madrid (Spain), 2012.
Tags: Accessibility, Automatic Speech Recognition, Education, Intelligent Interaction, Language Technologies, Machine Translation, Massive Adaptation, Multilingualism, Opencast Matterhorn, Video Lectures

@inproceedings{Silvestre-Cerdà2012b, title = {transLectures}, author = {Silvestre-Cerdà, Joan Albert and Del Agua, Miguel and Gonçal Garcés and Guillem Gascó and Adrià Giménez-Pastor and Adrià Martínez and Pérez González de Martos, Alejandro and Isaías Sánchez and Serrano Martínez-Santos, Nicolás and Rachel Spencer and Valor Miró, Juan Daniel and Jesús Andrés-Ferrer and Jorge Civera and Alberto Sanchís and Alfons Juan}, url = {http://hdl.handle.net/10251/37290 http://lorien.die.upm.es/~lapiz/rtth/JORNADAS/VII/IberSPEECH2012_OnlineProceedings.pdf https://web.archive.org/web/20130609073144/http://iberspeech2012.ii.uam.es/IberSPEECH2012_OnlineProceedings.pdf http://www.mllp.upv.es/wp-content/uploads/2015/04/1209IberSpeech.pdf}, year = {2012}, date = {2012-11-22}, booktitle = {Proceedings (Online) of IberSPEECH 2012}, pages = {345--351}, address = {Madrid (Spain)}, abstract = {[EN] transLectures (Transcription and Translation of Video Lectures) is an EU STREP project in which advanced automatic speech recognition and machine translation techniques are being tested on large video lecture repositories. The project began in November 2011 and will run for three years. This paper will outline the project's main motivation and objectives, and give a brief description of the two main repositories being considered: VideoLectures.NET and poliMèdia. The first results obtained by the UPV group for the poliMèdia repository will also be provided. [CA] transLectures (Transcription and Translation of Video Lectures) és un projecte del 7PM de la Unió Europea en el qual s'estan posant a prova tècniques avançades de reconeixement automàtic de la parla i de traducció automàtica sobre grans repositoris digitals de vídeos docents. El projecte començà al novembre de 2011 i tindrà una duració de tres anys. En aquest article exposem la motivació i els objectius del projecte, i descrivim breument els dos repositoris principals sobre els quals es treballa: VideoLectures.NET i poliMèdia. També oferim els primers resultats obtinguts per l'equip de la UPV al repositori poliMèdia.}, keywords = {Accessibility, Automatic Speech Recognition, Education, Intelligent Interaction, Language Technologies, Machine Translation, Massive Adaptation, Multilingualism, Opencast Matterhorn, Video Lectures}, pubstate = {published}, tppubtype = {inproceedings} }