Publications

2022
Pérez González de Martos, Alejandro; Giménez Pastor, Adrià; Jorge Cano, Javier; Iranzo-Sánchez, Javier; Silvestre-Cerdà, Joan Albert; Garcés Díaz-Munío, Gonçal V.; Baquero-Arnal, Pau; Sanchis Navarro, Alberto; Civera Sáiz, Jorge; Juan Ciscar, Alfons; Turró Ribalta, Carlos: Doblaje automático de vídeo-charlas educativas en UPV[Media] (English: "Automatic dubbing of educational video lectures in UPV[Media]"). Inproceedings. Proc. of VIII Congrés d'Innovació Educativa i Docència en Xarxa (IN-RED 2022), pp. 557–570, València (Spain), 2022. DOI: 10.4995/INRED2022.2022.15844. Tags: automatic dubbing, automatic speech recognition, machine translation, OER, text-to-speech.
Abstract: More and more universities are banking on the production of digital content to support online or blended learning in higher education. Over recent years, the MLLP research group has been working closely with the UPV's ASIC media services to enrich educational multimedia resources through the application of natural language processing technologies, including automatic speech recognition, machine translation and text-to-speech. In this work, we present the steps being followed for the comprehensive translation of these materials, specifically through (semi-)automatic dubbing making use of state-of-the-art speaker-adaptive text-to-speech technologies.
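As an illustration of the dubbing cascade this abstract describes (ASR, then MT, then speaker-adaptive TTS), here is a minimal Python sketch. All component interfaces (asr, mt, tts and their methods) are hypothetical placeholders, not the actual UPV[Media] pipeline.

```python
# Minimal sketch of a (semi-)automatic dubbing cascade: ASR -> MT -> TTS.
# All model objects and method names are hypothetical placeholders.

from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # segment start time (seconds)
    end: float     # segment end time (seconds)
    text: str      # transcribed or translated text

def dub_lecture(audio_path: str, src_lang: str, tgt_lang: str,
                asr, mt, tts, speaker_ref: str) -> bytes:
    """Produce a dubbed audio track for a video lecture."""
    # 1. Transcribe the original audio into time-aligned segments.
    segments = asr.transcribe(audio_path, lang=src_lang)
    # 2. Translate each segment; human post-editing could happen here
    #    (the "semi-" in semi-automatic dubbing).
    translated = [Segment(s.start, s.end,
                          mt.translate(s.text, src_lang, tgt_lang))
                  for s in segments]
    # 3. Synthesise target-language speech in the original speaker's voice,
    #    fitting each utterance into its original time slot (assumed to
    #    return raw audio bytes in this sketch).
    clips = [tts.synthesise(s.text, speaker=speaker_ref,
                            max_duration=s.end - s.start)
             for s in translated]
    return b"".join(clips)
```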
2021
Iranzo-Sánchez, Javier; Jorge, Javier; Baquero-Arnal, Pau; Silvestre-Cerdà, Joan Albert; Giménez, Adrià; Civera, Jorge; Sanchis, Albert; Juan, Alfons: Streaming cascade-based speech translation leveraged by a direct segmentation model. Journal Article. Neural Networks, 142, pp. 303–315, 2021. DOI: 10.1016/j.neunet.2021.05.013. Tags: automatic speech recognition, cascade system, deep neural networks, hybrid system, machine translation, segmentation model, speech translation, streaming.
Abstract: The cascade approach to Speech Translation (ST) is based on a pipeline that concatenates an Automatic Speech Recognition (ASR) system followed by a Machine Translation (MT) system. Nowadays, state-of-the-art ST systems are populated with deep neural networks that are conceived to work in an offline setup in which the audio input to be translated is fully available in advance. However, a streaming setup defines a completely different picture, in which an unbounded audio input gradually becomes available and at the same time the translation needs to be generated under real-time constraints. In this work, we present a state-of-the-art streaming ST system in which neural-based models integrated in the ASR and MT components are carefully adapted in terms of their training and decoding procedures in order to run under a streaming setup. In addition, a direct segmentation model that adapts the continuous ASR output to the capacity of simultaneous MT systems trained at the sentence level is introduced to guarantee low latency while preserving the translation quality of the complete ST system. The resulting ST system is thoroughly evaluated on the real-life streaming Europarl-ST benchmark to gauge the trade-off between quality and latency for each component individually as well as for the complete ST system.
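The streaming cascade this abstract describes can be pictured with a short Python sketch: incremental ASR output feeds a segmentation model that closes sentence-level segments for the MT component. All interfaces here (decode_incremental, is_boundary, translate) are hypothetical placeholders, not the paper's actual implementation.

```python
# Minimal sketch of a streaming cascade ST system: ASR words are fed to a
# direct segmentation model that decides sentence boundaries, and each
# closed segment is sent to a sentence-level MT system. All component
# interfaces are hypothetical placeholders.

def streaming_cascade(audio_chunks, asr, segmenter, mt):
    """Yield target-language text with low latency from unbounded audio."""
    buffer = []                      # words of the currently open segment
    for chunk in audio_chunks:       # unbounded audio arrives gradually
        for word in asr.decode_incremental(chunk):
            buffer.append(word)
            # The segmentation model adapts the continuous ASR stream to
            # the sentence-level capacity of the MT system.
            if segmenter.is_boundary(buffer):
                yield mt.translate(" ".join(buffer))
                buffer = []
    if buffer:                       # flush the final partial segment
        yield mt.translate(" ".join(buffer))
```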
2020
Iranzo-Sánchez, Javier; Silvestre-Cerdà, Joan Albert; Jorge, Javier; Roselló, Nahuel; Giménez, Adrià; Sanchis, Albert; Civera, Jorge; Juan, Alfons: Europarl-ST: A Multilingual Corpus for Speech Translation of Parliamentary Debates. Inproceedings. Proc. of 45th Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2020), pp. 8229–8233, Barcelona (Spain), 2020. DOI: 10.1109/ICASSP40776.2020.9054626. Links: https://arxiv.org/abs/1911.03167, https://paperswithcode.com/paper/europarl-st-a-multilingual-corpus-for-speech, https://www.mllp.upv.es/europarl-st/. Tags: automatic speech recognition, machine translation, multilingual corpus, speech translation, spoken language translation.
Abstract: Current research into spoken language translation (SLT), or speech-to-text translation, is often hampered by the lack of specific data resources for this task, as currently available SLT datasets are restricted to a limited set of language pairs. In this paper we present Europarl-ST, a novel multilingual SLT corpus containing paired audio-text samples for SLT from and into 6 European languages, for a total of 30 different translation directions. This corpus has been compiled using the debates held in the European Parliament in the period between 2008 and 2012. This paper describes the corpus creation process and presents a series of automatic speech recognition, machine translation and spoken language translation experiments that highlight the potential of this new resource. The corpus is released under a Creative Commons license and is freely accessible and downloadable.
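As a small aside on the corpus arithmetic: the 30 translation directions follow from ordered pairs of the 6 languages (6 × 5 = 30). A trivial check in Python, with the language set assumed rather than taken from the paper:

```python
# 30 directions = ordered pairs of 6 languages (6 * 5).
langs = ["de", "en", "es", "fr", "it", "pt"]  # assumed language set
directions = [(src, tgt) for src in langs for tgt in langs if src != tgt]
assert len(directions) == 30
```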
2018
Matusov, Evgeny; Wilken, Patrick; Bahar, Parnia; Schamper, Julian; Golik, Pavel; Zeyer, Albert; Silvestre-Cerdà, Joan Albert; Martínez-Villaronga, Adrià; Pesch, Hendrick; Peter, Jan-Thorsten: Neural Speech Translation at AppTek. Inproceedings. Proc. of 15th Intl. Workshop on Spoken Language Translation (IWSLT 2018), pp. 104–111, Hong Kong, 2018. Links: https://www.mllp.upv.es/wp-content/uploads/2019/07/iwslt18.pdf, https://workshop2018.iwslt.org/downloads/Proceedings_IWSLT_2018.pdf. Tags: automatic speech recognition, machine translation.
2016
Silvestre-Cerdà, Joan Albert; Juan, Alfons; Civera, Jorge: Different Contributions to Cost-Effective Transcription and Translation of Video Lectures. Inproceedings. Proc. of IX Jornadas en Tecnología del Habla and V Iberian SLTech Workshop (IberSpeech 2016), pp. 313–319, Lisbon (Portugal), 2016, ISBN: 978-3-319-49168-4. Links: http://www.mllp.upv.es/wp-content/uploads/2016/11/poster.pdf, http://www.mllp.upv.es/wp-content/uploads/2016/11/paper.pdf, http://hdl.handle.net/10251/62194. Tags: automatic speech recognition, automatic transcription and translation, machine translation, video lectures.
Abstract: In recent years, online multimedia repositories have experienced strong growth that has consolidated them as essential knowledge assets, especially in the area of education, where large repositories of video lectures have been built in order to complement or even replace traditional teaching methods. However, most of these video lectures are neither transcribed nor translated due to a lack of cost-effective solutions to do so in a way that gives accurate enough results. Solutions of this kind are clearly necessary in order to make these lectures accessible to speakers of different languages and to people with hearing disabilities, among many other benefits and applications. For this reason, the main aim of this thesis is to develop a cost-effective solution capable of transcribing and translating video lectures to a reasonable degree of accuracy. More specifically, we address the integration of state-of-the-art techniques in Automatic Speech Recognition and Machine Translation into large video lecture repositories to generate high-quality multilingual video subtitles without human intervention and at a reduced computational cost. Also, we explore the potential benefits of exploiting the information that we know a priori about these repositories, that is, lecture-specific knowledge such as speaker, topic or slides, to create specialised, in-domain transcription and translation systems by means of massive adaptation techniques. The proposed solutions have been tested in real-life scenarios by carrying out several objective and subjective evaluations, obtaining very positive results. The main outcome derived from this multidisciplinary thesis, the transLectures-UPV Platform, has been publicly released as open-source software and, at the time of writing, is serving automatic transcriptions and translations for several thousand video lectures at many Spanish and European universities and institutions.
Silvestre-Cerdà, Joan Albert: Different Contributions to Cost-Effective Transcription and Translation of Video Lectures. PhD Thesis. Universitat Politècnica de València, 2016. (Advisors: Alfons Juan Ciscar and Jorge Civera Saiz.) Links: http://hdl.handle.net/10251/62194, http://www.mllp.upv.es/wp-content/uploads/2016/01/slides.pdf, http://www.mllp.upv.es/wp-content/uploads/2016/01/thesis.pdf, http://www.mllp.upv.es/phd-thesis-different-contributions-to-cost-effective-transcription-and-translation-of-video-lectures-by-joan-albert-silvestre-cerda-abstract/. Tags: automatic speech recognition, education, language technologies, machine translation, massive adaptation, multilingualism, video lecture repositories, video lectures.
Abstract: In recent years, online multimedia repositories have experienced strong growth that has consolidated them as essential knowledge assets, especially in the area of education, where large repositories of video lectures have been built in order to complement or even replace traditional teaching methods. However, most of these video lectures are neither transcribed nor translated due to a lack of cost-effective solutions to do so in a way that provides accurate enough results. Solutions of this kind are clearly necessary in order to make these lectures accessible to speakers of different languages and to people with hearing disabilities. They would also facilitate lecture searchability and analysis functions, such as classification, recommendation or plagiarism detection, as well as the development of advanced educational functionalities like content summarisation to assist student note-taking. For this reason, the main aim of this thesis is to develop a cost-effective solution capable of transcribing and translating video lectures to a reasonable degree of accuracy. More specifically, we address the integration of state-of-the-art techniques in Automatic Speech Recognition and Machine Translation into large video lecture repositories to generate high-quality multilingual video subtitles without human intervention and at a reduced computational cost. Also, we explore the potential benefits of exploiting the information that we know a priori about these repositories, that is, lecture-specific knowledge such as speaker, topic or slides, to create specialised, in-domain transcription and translation systems by means of massive adaptation techniques. The proposed solutions have been tested in real-life scenarios by carrying out several objective and subjective evaluations, obtaining very positive results. The main technological outcome derived from this thesis, the transLectures-UPV Platform (TLP), has been publicly released as open-source software and, at the time of writing, is serving automatic transcriptions and translations for several thousand video lectures at Spanish and European universities and institutions.
2015
Pérez González de Martos, Alejandro; Silvestre-Cerdà, Joan Albert; Valor Miró, Juan Daniel; Civera, Jorge; Juan, Alfons: MLLP Transcription and Translation Platform. Miscellaneous, 2015. (Short paper for demo presentation accepted at the 10th European Conf. on Technology Enhanced Learning (EC-TEL 2015), Toledo (Spain), 2015.) Links: http://hdl.handle.net/10251/65747, http://www.mllp.upv.es/wp-content/uploads/2015/09/ttp_platform_demo_ectel2015.pdf, http://ectel2015.httc.de/index.php?id=722. Tags: automatic speech recognition, Docencia en Red, document translation, efficient video subtitling, machine translation, MLLP, post-editing, video lectures.
Abstract: This paper briefly presents the main features of the MLLP's Transcription and Translation Platform, which uses state-of-the-art automatic speech recognition and machine translation systems to generate multilingual subtitles of educational audiovisual and textual content. It has been shown to reduce user effort to as little as one third of the time needed to generate transcriptions and translations from scratch.
2013
Silvestre-Cerdà, Joan Albert; Pérez, Alejandro; Jiménez, Manuel; Turró, Carlos; Juan, Alfons; Civera, Jorge: A System Architecture to Support Cost-Effective Transcription and Translation of Large Video Lecture Repositories. Inproceedings. Proc. of the IEEE Intl. Conf. on Systems, Man, and Cybernetics (SMC 2013), pp. 3994–3999, Manchester (UK), 2013. DOI: 10.1109/SMC.2013.682. Tags: accessibility, automatic speech recognition, education, intelligent interaction, language technologies, machine translation, massive adaptation, multilingualism, Opencast Matterhorn, video lectures.
Abstract: Online video lecture repositories are rapidly growing and becoming established as fundamental knowledge assets. However, most lectures are neither transcribed nor translated because of the lack of cost-effective solutions that can give accurate enough results. In this paper, we describe a system architecture that supports the cost-effective transcription and translation of large video lecture repositories. This architecture has been adopted in the EU project transLectures and is now being tested on a repository of more than 9000 video lectures at the Universitat Politècnica de València. Following a brief description of this repository and of the transLectures project, we describe the proposed system architecture in detail. We also report empirical results on the quality of the transcriptions and translations currently being maintained and steadily improved.
2012
Silvestre-Cerdà, Joan Albert; Del Agua, Miguel; Garcés, Gonçal; Gascó, Guillem; Giménez-Pastor, Adrià; Martínez, Adrià; Pérez González de Martos, Alejandro; Sánchez, Isaías; Serrano Martínez-Santos, Nicolás; Spencer, Rachel; Valor Miró, Juan Daniel; Andrés-Ferrer, Jesús; Civera, Jorge; Sanchís, Alberto; Juan, Alfons: transLectures. Inproceedings. Proceedings (Online) of IberSPEECH 2012, pp. 345–351, Madrid (Spain), 2012. Links: http://hdl.handle.net/10251/37290, http://lorien.die.upm.es/~lapiz/rtth/JORNADAS/VII/IberSPEECH2012_OnlineProceedings.pdf, https://web.archive.org/web/20130609073144/http://iberspeech2012.ii.uam.es/IberSPEECH2012_OnlineProceedings.pdf, http://www.mllp.upv.es/wp-content/uploads/2015/04/1209IberSpeech.pdf. Tags: accessibility, automatic speech recognition, education, intelligent interaction, language technologies, machine translation, massive adaptation, multilingualism, Opencast Matterhorn, video lectures.
Abstract: transLectures (Transcription and Translation of Video Lectures) is an EU STREP project in which advanced automatic speech recognition and machine translation techniques are being tested on large video lecture repositories. The project began in November 2011 and will run for three years. This paper outlines the project's main motivation and objectives, and gives a brief description of the two main repositories being considered: VideoLectures.NET and poliMèdia. The first results obtained by the UPV group for the poliMèdia repository are also provided.