Publications

2022
Pérez González de Martos, Alejandro; Giménez Pastor, Adrià; Jorge Cano, Javier; Iranzo-Sánchez, Javier; Silvestre-Cerdà, Joan Albert; Garcés Díaz-Munío, Gonçal V.; Baquero-Arnal, Pau; Sanchis Navarro, Alberto; Civera Sáiz, Jorge; Juan Ciscar, Alfons; Turró Ribalta, Carlos: Doblaje automático de vídeo-charlas educativas en UPV[Media] (English: "Automatic dubbing of educational video lectures in UPV[Media]"). Inproceedings. Proc. of VIII Congrés d'Innovació Educativa i Docència en Xarxa (IN-RED 2022), pp. 557–570, València (Spain), 2022. DOI: 10.4995/INRED2022.2022.15844. Tags: automatic dubbing, automatic speech recognition, machine translation, OER, text-to-speech.
Abstract: More and more universities are banking on the production of digital content to support online or blended learning in higher education. Over recent years, the MLLP research group has been working closely with the UPV's ASIC media services to enrich educational multimedia resources through the application of natural language processing technologies, including automatic speech recognition, machine translation and text-to-speech. In this work, we present the steps being followed for the comprehensive translation of these materials, specifically through (semi-)automatic dubbing making use of state-of-the-art speaker-adaptive text-to-speech technologies.
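As an illustration of the dubbing cascade this abstract describes (ASR, then MT, then speaker-adaptive TTS), here is a minimal Python sketch. All component interfaces (asr, mt, tts and their methods) are hypothetical placeholders, not the actual UPV[Media] pipeline.

```python
# Minimal sketch of a (semi-)automatic dubbing cascade: ASR -> MT -> TTS.
# All model objects and method names are hypothetical placeholders.

from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # segment start time (seconds)
    end: float     # segment end time (seconds)
    text: str      # transcribed or translated text

def dub_lecture(audio_path: str, src_lang: str, tgt_lang: str,
                asr, mt, tts, speaker_ref: str) -> bytes:
    """Produce a dubbed audio track for a video lecture."""
    # 1. Transcribe the original audio into time-aligned segments.
    segments = asr.transcribe(audio_path, lang=src_lang)
    # 2. Translate each segment; human post-editing could happen here
    #    (the "semi-" in semi-automatic dubbing).
    translated = [Segment(s.start, s.end,
                          mt.translate(s.text, src_lang, tgt_lang))
                  for s in segments]
    # 3. Synthesise target-language speech in the original speaker's voice,
    #    fitting each utterance into its original time slot (assumed to
    #    return raw audio bytes in this sketch).
    clips = [tts.synthesise(s.text, speaker=speaker_ref,
                            max_duration=s.end - s.start)
             for s in translated]
    return b"".join(clips)
```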
2021
Iranzo-Sánchez, Javier; Jorge, Javier; Baquero-Arnal, Pau; Silvestre-Cerdà, Joan Albert; Giménez, Adrià; Civera, Jorge; Sanchis, Albert; Juan, Alfons: Streaming cascade-based speech translation leveraged by a direct segmentation model. Journal Article. Neural Networks, 142, pp. 303–315, 2021. DOI: 10.1016/j.neunet.2021.05.013. Tags: automatic speech recognition, cascade system, deep neural networks, hybrid system, machine translation, segmentation model, speech translation, streaming.
Abstract: The cascade approach to Speech Translation (ST) is based on a pipeline that concatenates an Automatic Speech Recognition (ASR) system followed by a Machine Translation (MT) system. Nowadays, state-of-the-art ST systems are populated with deep neural networks that are conceived to work in an offline setup in which the audio input to be translated is fully available in advance. However, a streaming setup defines a completely different picture, in which an unbounded audio input gradually becomes available and at the same time the translation needs to be generated under real-time constraints. In this work, we present a state-of-the-art streaming ST system in which neural-based models integrated in the ASR and MT components are carefully adapted in terms of their training and decoding procedures in order to run under a streaming setup. In addition, a direct segmentation model that adapts the continuous ASR output to the capacity of simultaneous MT systems trained at the sentence level is introduced to guarantee low latency while preserving the translation quality of the complete ST system. The resulting ST system is thoroughly evaluated on the real-life streaming Europarl-ST benchmark to gauge the trade-off between quality and latency for each component individually as well as for the complete ST system.
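The streaming cascade this abstract describes can be pictured with a short Python sketch: incremental ASR output feeds a segmentation model that closes sentence-level segments for the MT component. All interfaces here (decode_incremental, is_boundary, translate) are hypothetical placeholders, not the paper's actual implementation.

```python
# Minimal sketch of a streaming cascade ST system: ASR words are fed to a
# direct segmentation model that decides sentence boundaries, and each
# closed segment is sent to a sentence-level MT system. All component
# interfaces are hypothetical placeholders.

def streaming_cascade(audio_chunks, asr, segmenter, mt):
    """Yield target-language text with low latency from unbounded audio."""
    buffer = []                      # words of the currently open segment
    for chunk in audio_chunks:       # unbounded audio arrives gradually
        for word in asr.decode_incremental(chunk):
            buffer.append(word)
            # The segmentation model adapts the continuous ASR stream to
            # the sentence-level capacity of the MT system.
            if segmenter.is_boundary(buffer):
                yield mt.translate(" ".join(buffer))
                buffer = []
    if buffer:                       # flush the final partial segment
        yield mt.translate(" ".join(buffer))
```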
2020
Iranzo-Sánchez, Javier; Silvestre-Cerdà, Joan Albert; Jorge, Javier; Roselló, Nahuel; Giménez, Adrià; Sanchis, Albert; Civera, Jorge; Juan, Alfons: Europarl-ST: A Multilingual Corpus for Speech Translation of Parliamentary Debates. Inproceedings. Proc. of 45th Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2020), pp. 8229–8233, Barcelona (Spain), 2020. DOI: 10.1109/ICASSP40776.2020.9054626. Links: https://arxiv.org/abs/1911.03167, https://paperswithcode.com/paper/europarl-st-a-multilingual-corpus-for-speech, https://www.mllp.upv.es/europarl-st/. Tags: automatic speech recognition, machine translation, multilingual corpus, speech translation, spoken language translation.
Abstract: Current research into spoken language translation (SLT), or speech-to-text translation, is often hampered by the lack of specific data resources for this task, as currently available SLT datasets are restricted to a limited set of language pairs. In this paper we present Europarl-ST, a novel multilingual SLT corpus containing paired audio-text samples for SLT from and into 6 European languages, for a total of 30 different translation directions. This corpus has been compiled using the debates held in the European Parliament in the period between 2008 and 2012. This paper describes the corpus creation process and presents a series of automatic speech recognition, machine translation and spoken language translation experiments that highlight the potential of this new resource. The corpus is released under a Creative Commons license and is freely accessible and downloadable.
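As a small aside on the corpus arithmetic: the 30 translation directions follow from ordered pairs of the 6 languages (6 × 5 = 30). A trivial check in Python, with the language set assumed rather than taken from the paper:

```python
# 30 directions = ordered pairs of 6 languages (6 * 5).
langs = ["de", "en", "es", "fr", "it", "pt"]  # assumed language set
directions = [(src, tgt) for src in langs for tgt in langs if src != tgt]
assert len(directions) == 30
```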
2018
Matusov, Evgeny; Wilken, Patrick; Bahar, Parnia; Schamper, Julian; Golik, Pavel; Zeyer, Albert; Silvestre-Cerdà, Joan Albert; Martínez-Villaronga, Adrià; Pesch, Hendrick; Peter, Jan-Thorsten: Neural Speech Translation at AppTek. Inproceedings. Proc. of 15th Intl. Workshop on Spoken Language Translation (IWSLT 2018), pp. 104–111, Hong Kong, 2018. Links: https://www.mllp.upv.es/wp-content/uploads/2019/07/iwslt18.pdf, https://workshop2018.iwslt.org/downloads/Proceedings_IWSLT_2018.pdf. Tags: automatic speech recognition, machine translation.
2016
Silvestre-Cerdà, Joan Albert; Juan, Alfons; Civera, Jorge: Different Contributions to Cost-Effective Transcription and Translation of Video Lectures. Inproceedings. Proc. of IX Jornadas en Tecnología del Habla and V Iberian SLTech Workshop (IberSpeech 2016), pp. 313–319, Lisbon (Portugal), 2016, ISBN: 978-3-319-49168-4. Links: http://www.mllp.upv.es/wp-content/uploads/2016/11/poster.pdf, http://www.mllp.upv.es/wp-content/uploads/2016/11/paper.pdf, http://hdl.handle.net/10251/62194. Tags: automatic speech recognition, automatic transcription and translation, machine translation, video lectures.
Abstract: In recent years, online multimedia repositories have experienced strong growth that has consolidated them as essential knowledge assets, especially in the area of education, where large repositories of video lectures have been built in order to complement or even replace traditional teaching methods. However, most of these video lectures are neither transcribed nor translated due to a lack of cost-effective solutions to do so in a way that gives accurate enough results. Solutions of this kind are clearly necessary in order to make these lectures accessible to speakers of different languages and to people with hearing disabilities, among many other benefits and applications. For this reason, the main aim of this thesis is to develop a cost-effective solution capable of transcribing and translating video lectures to a reasonable degree of accuracy. More specifically, we address the integration of state-of-the-art techniques in Automatic Speech Recognition and Machine Translation into large video lecture repositories to generate high-quality multilingual video subtitles without human intervention and at a reduced computational cost. Also, we explore the potential benefits of exploiting the information that we know a priori about these repositories, that is, lecture-specific knowledge such as speaker, topic or slides, to create specialised, in-domain transcription and translation systems by means of massive adaptation techniques. The proposed solutions have been tested in real-life scenarios by carrying out several objective and subjective evaluations, obtaining very positive results. The main outcome derived from this multidisciplinary thesis, the transLectures-UPV Platform, has been publicly released as open-source software and, at the time of writing, is serving automatic transcriptions and translations for several thousand video lectures at many Spanish and European universities and institutions.
Silvestre-Cerdà, Joan Albert: Different Contributions to Cost-Effective Transcription and Translation of Video Lectures. PhD Thesis. Universitat Politècnica de València, 2016. (Advisors: Alfons Juan Ciscar and Jorge Civera Saiz.) Links: http://hdl.handle.net/10251/62194, http://www.mllp.upv.es/wp-content/uploads/2016/01/slides.pdf, http://www.mllp.upv.es/wp-content/uploads/2016/01/thesis.pdf, http://www.mllp.upv.es/phd-thesis-different-contributions-to-cost-effective-transcription-and-translation-of-video-lectures-by-joan-albert-silvestre-cerda-abstract/. Tags: automatic speech recognition, education, language technologies, machine translation, massive adaptation, multilingualism, video lecture repositories, video lectures.
Abstract: In recent years, online multimedia repositories have experienced strong growth that has consolidated them as essential knowledge assets, especially in the area of education, where large repositories of video lectures have been built in order to complement or even replace traditional teaching methods. However, most of these video lectures are neither transcribed nor translated due to a lack of cost-effective solutions to do so in a way that provides accurate enough results. Solutions of this kind are clearly necessary in order to make these lectures accessible to speakers of different languages and to people with hearing disabilities. They would also facilitate lecture searchability and analysis functions, such as classification, recommendation or plagiarism detection, as well as the development of advanced educational functionalities like content summarisation to assist student note-taking. For this reason, the main aim of this thesis is to develop a cost-effective solution capable of transcribing and translating video lectures to a reasonable degree of accuracy. More specifically, we address the integration of state-of-the-art techniques in Automatic Speech Recognition and Machine Translation into large video lecture repositories to generate high-quality multilingual video subtitles without human intervention and at a reduced computational cost. Also, we explore the potential benefits of exploiting the information that we know a priori about these repositories, that is, lecture-specific knowledge such as speaker, topic or slides, to create specialised, in-domain transcription and translation systems by means of massive adaptation techniques. The proposed solutions have been tested in real-life scenarios by carrying out several objective and subjective evaluations, obtaining very positive results. The main technological outcome derived from this thesis, the transLectures-UPV Platform (TLP), has been publicly released as open-source software and, at the time of writing, is serving automatic transcriptions and translations for several thousand video lectures at Spanish and European universities and institutions.
2015
Pérez González de Martos, Alejandro; Silvestre-Cerdà, Joan Albert; Valor Miró, Juan Daniel; Civera, Jorge; Juan, Alfons: MLLP Transcription and Translation Platform. Miscellaneous, 2015. (Short paper for demo presentation accepted at the 10th European Conf. on Technology Enhanced Learning (EC-TEL 2015), Toledo (Spain), 2015.) Links: http://hdl.handle.net/10251/65747, http://www.mllp.upv.es/wp-content/uploads/2015/09/ttp_platform_demo_ectel2015.pdf, http://ectel2015.httc.de/index.php?id=722. Tags: automatic speech recognition, Docencia en Red, document translation, efficient video subtitling, machine translation, MLLP, post-editing, video lectures.
Abstract: This paper briefly presents the main features of the MLLP's Transcription and Translation Platform, which uses state-of-the-art automatic speech recognition and machine translation systems to generate multilingual subtitles of educational audiovisual and textual content. It has been shown to reduce user effort to as little as one third of the time needed to generate transcriptions and translations from scratch.
2013
Silvestre-Cerdà, Joan Albert; Pérez, Alejandro; Jiménez, Manuel; Turró, Carlos; Juan, Alfons; Civera, Jorge: A System Architecture to Support Cost-Effective Transcription and Translation of Large Video Lecture Repositories. Inproceedings. Proc. of the IEEE Intl. Conf. on Systems, Man, and Cybernetics (SMC 2013), pp. 3994–3999, Manchester (UK), 2013. DOI: 10.1109/SMC.2013.682. Tags: accessibility, automatic speech recognition, education, intelligent interaction, language technologies, machine translation, massive adaptation, multilingualism, Opencast Matterhorn, video lectures.
Abstract: Online video lecture repositories are rapidly growing and becoming established as fundamental knowledge assets. However, most lectures are neither transcribed nor translated because of the lack of cost-effective solutions that can give accurate enough results. In this paper, we describe a system architecture that supports the cost-effective transcription and translation of large video lecture repositories. This architecture has been adopted in the EU project transLectures and is now being tested on a repository of more than 9000 video lectures at the Universitat Politècnica de València. Following a brief description of this repository and of the transLectures project, we describe the proposed system architecture in detail. We also report empirical results on the quality of the transcriptions and translations currently being maintained and steadily improved.
2012
Silvestre-Cerdà, Joan Albert; Del Agua, Miguel; Garcés, Gonçal; Gascó, Guillem; Giménez-Pastor, Adrià; Martínez, Adrià; Pérez González de Martos, Alejandro; Sánchez, Isaías; Serrano Martínez-Santos, Nicolás; Spencer, Rachel; Valor Miró, Juan Daniel; Andrés-Ferrer, Jesús; Civera, Jorge; Sanchís, Alberto; Juan, Alfons: transLectures. Inproceedings. Proceedings (Online) of IberSPEECH 2012, pp. 345–351, Madrid (Spain), 2012. Links: http://hdl.handle.net/10251/37290, http://lorien.die.upm.es/~lapiz/rtth/JORNADAS/VII/IberSPEECH2012_OnlineProceedings.pdf, https://web.archive.org/web/20130609073144/http://iberspeech2012.ii.uam.es/IberSPEECH2012_OnlineProceedings.pdf, http://www.mllp.upv.es/wp-content/uploads/2015/04/1209IberSpeech.pdf. Tags: accessibility, automatic speech recognition, education, intelligent interaction, language technologies, machine translation, massive adaptation, multilingualism, Opencast Matterhorn, video lectures.
Abstract: transLectures (Transcription and Translation of Video Lectures) is an EU STREP project in which advanced automatic speech recognition and machine translation techniques are being tested on large video lecture repositories. The project began in November 2011 and will run for three years. This paper outlines the project's main motivation and objectives, and gives a brief description of the two main repositories being considered: VideoLectures.NET and poliMèdia. The first results obtained by the UPV group for the poliMèdia repository are also provided.