2016
Silvestre-Cerdà, Joan Albert. Different Contributions to Cost-Effective Transcription and Translation of Video Lectures. PhD Thesis, Universitat Politècnica de València, 2016 (Advisors: Alfons Juan Ciscar and Jorge Civera Saiz).
Tags: Automatic Speech Recognition, Education, Language Technologies, Machine Translation, Massive Adaptation, Multilingualism, video lecture repositories, Video Lectures

Abstract: In recent years, online multimedia repositories have experienced strong growth that has consolidated them as essential knowledge assets, especially in education, where large repositories of video lectures have been built to complement or even replace traditional teaching methods. However, most of these video lectures are neither transcribed nor translated, owing to a lack of cost-effective solutions that provide sufficiently accurate results. Such solutions are clearly needed to make these lectures accessible to speakers of other languages and to people with hearing disabilities. They would also facilitate lecture search and analysis functions, such as classification, recommendation or plagiarism detection, as well as the development of advanced educational features like content summarisation to assist student note-taking.

For this reason, the main aim of this thesis is to develop a cost-effective solution capable of transcribing and translating video lectures with a reasonable degree of accuracy. More specifically, we address the integration of state-of-the-art Automatic Speech Recognition and Machine Translation techniques into large video lecture repositories to generate high-quality multilingual video subtitles without human intervention and at a reduced computational cost. We also explore the potential benefits of exploiting the information known a priori about these repositories, that is, lecture-specific knowledge such as speaker, topic or slides, to create specialised, in-domain transcription and translation systems by means of massive adaptation techniques. The proposed solutions have been tested in real-life scenarios through several objective and subjective evaluations, with very positive results. The main technological outcome of this thesis, the transLectures-UPV Platform (TLP), has been publicly released as open-source software and, at the time of writing, is serving automatic transcriptions and translations for several thousand video lectures at Spanish and European universities and institutions.

BibTeX:
@phdthesis{Silvestre-Cerdà2016,
  title = {Different Contributions to Cost-Effective Transcription and Translation of Video Lectures},
  author = {Silvestre-Cerdà, Joan Albert},
  url = {http://hdl.handle.net/10251/62194 http://www.mllp.upv.es/wp-content/uploads/2016/01/slides.pdf http://www.mllp.upv.es/wp-content/uploads/2016/01/thesis.pdf http://www.mllp.upv.es/phd-thesis-different-contributions-to-cost-effective-transcription-and-translation-of-video-lectures-by-joan-albert-silvestre-cerda-abstract/},
  year = {2016},
  date = {2016-01-27},
  school = {Universitat Politècnica de València},
  note = {Advisors: Alfons Juan Ciscar and Jorge Civera Saiz},
  keywords = {Automatic Speech Recognition, Education, Language Technologies, Machine Translation, Massive Adaptation, Multilingualism, video lecture repositories, Video Lectures},
  pubstate = {published},
  tppubtype = {phdthesis}
}
2015
Valor Miró, Juan Daniel; Silvestre-Cerdà, Joan Albert; Civera, Jorge; Turró, Carlos; Juan, Alfons. Efficient Generation of High-Quality Multilingual Subtitles for Video Lecture Repositories. Inproceedings, Proc. of 10th European Conf. on Technology Enhanced Learning (EC-TEL 2015), pp. 485–490, Toledo (Spain), 2015. ISBN: 978-3-319-24258-3.
Tags: Automatic Speech Recognition, Docencia en Red, Efficient video subtitling, Polimedia, Statistical machine translation, video lecture repositories

Abstract: Video lectures are a valuable educational tool in higher education, supporting or replacing face-to-face lectures in active learning strategies. In 2007 the Universitat Politècnica de València (UPV) implemented its video lecture capture system, resulting in a high-quality educational video repository, called poliMedia, with more than 10,000 mini lectures created by 1,373 lecturers. In addition, within the framework of the European project transLectures, UPV has automatically generated transcriptions and translations in Spanish, Catalan and English for all videos in the poliMedia repository. The objective of transLectures responds to the widely recognised need for subtitles to be provided with video lectures, as an essential service for non-native speakers and hearing-impaired people, and to enable advanced repository functionalities.

Although high-quality automatic transcriptions and translations were generated in transLectures, they were not error-free. For this reason, lecturers need to review video subtitles manually to guarantee the absence of errors. The aim of this study is to evaluate the efficiency of manually reviewing automatic subtitles compared with the conventional generation of video subtitles from scratch. The reported results clearly indicate the convenience of providing automatic subtitles as a first step in the generation of video subtitles, with significant time savings of up to almost 75% when reviewing subtitles.

BibTeX:
@inproceedings{valor2015efficient,
  title = {Efficient Generation of High-Quality Multilingual Subtitles for Video Lecture Repositories},
  author = {Valor Miró, Juan Daniel and Silvestre-Cerdà, Joan Albert and Civera, Jorge and Turró, Carlos and Juan, Alfons},
  url = {http://link.springer.com/chapter/10.1007/978-3-319-24258-3_44 http://www.mllp.upv.es/wp-content/uploads/2016/03/paper.pdf},
  isbn = {978-3-319-24258-3},
  year = {2015},
  date = {2015-09-17},
  booktitle = {Proc. of 10th European Conf. on Technology Enhanced Learning (EC-TEL 2015)},
  pages = {485--490},
  address = {Toledo (Spain)},
  keywords = {Automatic Speech Recognition, Docencia en Red, Efficient video subtitling, Polimedia, Statistical machine translation, video lecture repositories},
  pubstate = {published},
  tppubtype = {inproceedings}
}
Valor Miró, Juan Daniel; Silvestre-Cerdà, Joan Albert; Civera, Jorge; Turró, Carlos; Juan, Alfons. Efficiency and usability study of innovative computer-aided transcription strategies for video lecture repositories. Journal Article, Speech Communication, 74, pp. 65–75, 2015. ISSN: 0167-6393.
Tags: Automatic Speech Recognition, Computer-assisted transcription, Interface design strategies, Usability study, video lecture repositories

Abstract: Video lectures are widely used in education to support and complement face-to-face lectures. However, the utility of these audiovisual assets could be further improved by adding subtitles that can be exploited to incorporate added-value functionalities such as searchability, accessibility, translatability, note-taking and discovery of content-related videos, among others. Today, automatic subtitles are prone to error and need to be reviewed and post-edited to ensure that what students see on screen is of acceptable quality. This work investigates different user interface design strategies for this post-editing task, in order to discover the best way to incorporate automatic transcription technologies into large educational video repositories. Our three-phase study involved lecturers from the Universitat Politècnica de València (UPV) with videos available on the poliMedia video lecture repository, which currently holds over 10,000 video objects.

Simply by post-editing automatic transcriptions in the conventional way, users almost halved the time that would be required to generate the transcription from scratch. As expected, this study revealed that the time spent by lecturers reviewing automatic transcriptions correlated directly with the accuracy of those transcriptions. However, it also shows that the average time required to perform each individual editing operation could be precisely derived and applied in the definition of a user model. In addition, the second phase of this study presents a transcription review strategy based on confidence measures (CM) and compares it to the conventional post-editing strategy. Finally, a third strategy, which combines the CM-based strategy with massive adaptation techniques for automatic speech recognition (ASR), improved transcription review efficiency compared with the two aforementioned strategies.

BibTeX:
@article{Valor201565,
  title = {Efficiency and usability study of innovative computer-aided transcription strategies for video lecture repositories},
  author = {Valor Miró, Juan Daniel and Silvestre-Cerdà, Joan Albert and Civera, Jorge and Turró, Carlos and Juan, Alfons},
  url = {http://www.sciencedirect.com/science/article/pii/S0167639315001016 http://www.mllp.upv.es/wp-content/uploads/2016/03/paper1.pdf},
  issn = {0167-6393},
  year = {2015},
  date = {2015-01-01},
  journal = {Speech Communication},
  volume = {74},
  pages = {65--75},
  keywords = {Automatic Speech Recognition, Computer-assisted transcription, Interface design strategies, Usability study, video lecture repositories},
  pubstate = {published},
  tppubtype = {article}
}