- transLectures: Transcription and translation of video lectures
- Period: 1/11/2011 – 31/10/2014
- Project website: http://www.translectures.eu
- Project supported by the European Union’s Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 287755.
transLectures was an EU-funded project to develop innovative, cost-effective tools for the automatic transcription and translation of online educational videos.
Project partners: Universitat Politècnica de València – MLLP (Spain; coordinator), Institut Jožef Stefan (Slovenia), RWTH Aachen University (Germany), Xerox S.A.S. (France), European Media Laboratory GmbH (Germany), Deluxe Media Europe (UK), Knowledge for All Foundation (UK; third party).
Online collections of video material are fast becoming a staple feature of the Internet and a key educational resource. In transLectures, we have developed technologies and tools that allow organizations to add multilingual subtitles to these videos, making their contents available to a much wider audience in a way that is cost-effective and sustainable across the vast collections of online video lectures being generated. Automatic transcription tools have been developed to provide verbatim subtitles of the talks recorded on video, thereby allowing the hard-of-hearing to access this content; language learners and other non-native speakers also benefit from these monolingual subtitles. At the same time, machine translation tools have been developed to make these subtitles available in languages other than the one in which the video was recorded.
- Massive adaptation for the improvement of transcription and translation quality: Massive adaptation of acoustic models has been particularly effective in improving ASR systems, with strong gains obtained in all tasks by using multilingual Deep Neural Networks. Massive adaptation of language models has also had a strong positive effect on the results achieved: taking advantage of video lecture slides proved remarkably effective, as did the use of relevant documents mined from the web. In the case of translation models, massive adaptation has likewise proven very effective, with most of the work concentrated on the development and assessment of data selection techniques such as TM cross-entropy or infrequent n-gram selection (see the sketch below).
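To illustrate the data selection idea mentioned above, here is a minimal sketch of cross-entropy difference selection: candidate sentences are scored by the difference between their per-word cross-entropy under an in-domain and a general-domain language model, and the most in-domain ones are kept. The toy unigram models, corpora and threshold below are purely illustrative stand-ins for the full-scale language models and data used in the project.

```python
# Minimal sketch of cross-entropy difference data selection.
# All models and data are toy stand-ins; real systems use full n-gram or
# neural language models trained on in-domain and general-domain corpora.
import math
from collections import Counter

def unigram_model(corpus):
    """Build an add-one-smoothed unigram model from a list of token lists."""
    counts = Counter(tok for sent in corpus for tok in sent)
    total = sum(counts.values())
    vocab = len(counts) + 1  # reserve one slot for unseen tokens
    return lambda tok: (counts.get(tok, 0) + 1) / (total + vocab)

def cross_entropy(sentence, model):
    """Per-word cross-entropy (in bits) of a sentence under a unigram model."""
    return -sum(math.log2(model(tok)) for tok in sentence) / max(len(sentence), 1)

def select_in_domain(candidates, in_domain_corpus, general_corpus, threshold=0.0):
    """Keep candidates whose score H_in(s) - H_gen(s) is below the threshold
    (the lower the score, the more in-domain the sentence looks)."""
    p_in = unigram_model(in_domain_corpus)
    p_gen = unigram_model(general_corpus)
    scored = [(cross_entropy(s, p_in) - cross_entropy(s, p_gen), s) for s in candidates]
    return [s for score, s in sorted(scored) if score < threshold]

# Toy usage: the lecture-like candidate is selected, the generic one is not.
in_domain = [["the", "derivative", "of", "the", "function"], ["matrix", "eigenvalues"]]
general = [["the", "weather", "is", "nice"], ["stock", "prices", "fell", "today"]]
pool = [["the", "derivative", "of", "a", "matrix"], ["the", "weather", "today"]]
print(select_in_domain(pool, in_domain, general))  # -> [['the', 'derivative', 'of', 'a', 'matrix']]
```

In the project this scoring-and-filtering step is applied at a much larger scale; the sketch only shows the underlying logic.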
The following charts show the scientific results obtained by the MLLP across the languages and language combinations we covered in transLectures. An important part of the progress shown is due to the development and application of massive adaptation techniques.
Transcription (Automatic Speech Recognition) results are reported in terms of the Word Error Rate (WER). When reading this chart, keep in mind that the lower the WER, the higher the transcription quality.
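For reference, WER is the standard edit-distance-based measure:

\[ \mathrm{WER} = \frac{S + D + I}{N} \]

where S, D and I are the numbers of substituted, deleted and inserted words with respect to the reference transcription, and N is the number of words in the reference.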
Translation (Machine Translation) results are reported in terms of the Bilingual Evaluation Understudy (BLEU) quality metric. When reading this chart, keep in mind that the higher the BLEU score, the higher the translation quality.
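For reference, BLEU combines modified n-gram precisions \(p_n\) (typically up to n = 4, with uniform weights \(w_n = 1/4\)) with a brevity penalty that compares the candidate length c to the reference length r:

\[ \mathrm{BLEU} = \mathrm{BP} \cdot \exp\!\left( \sum_{n=1}^{4} w_n \log p_n \right), \qquad \mathrm{BP} = \min\!\left( 1,\, e^{\,1 - r/c} \right) \]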
- Intelligent interaction for the improvement of transcription and translation quality: Intelligent interaction for transcription has proven very effective in producing accurate transcriptions with a small fraction of the user effort. In particular, significant relative WER reductions of about 20%–30% can be achieved by supervising only 5%–10% of the recognised words. The most important reason behind this performance is the use of word-level confidence measures to locate misrecognised words (see the sketch below). Intelligent interaction for translation and fast constrained search techniques have also been studied, with empirical evaluations showing that their use can provide modest improvements.
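The following minimal sketch shows the selection logic behind this kind of interaction, assuming the ASR decoder attaches a confidence score to every recognised word; the function names, scores and the ask_user_to_correct stub are illustrative, not part of the actual transLectures tools.

```python
# Minimal sketch of confidence-driven supervision for interactive transcription.
# Confidence scores are assumed to come from the ASR decoder; the values and
# the ask_user_to_correct stub below are illustrative only.
def words_to_supervise(words, confidences, effort=0.10):
    """Return the indices of the lowest-confidence words, limited to the given
    fraction of the transcript (e.g. 10% of the recognised words)."""
    budget = max(1, int(len(words) * effort))
    ranked = sorted(range(len(words)), key=lambda i: confidences[i])
    return sorted(ranked[:budget])

def ask_user_to_correct(word):
    # Stand-in for the interactive step: a real player would highlight the
    # word and let the editor confirm or retype it.
    return word

def supervise(words, confidences, effort=0.10):
    """Apply user corrections only to the selected low-confidence words."""
    corrected = list(words)
    for i in words_to_supervise(words, confidences, effort):
        corrected[i] = ask_user_to_correct(words[i])
    return corrected

# Toy usage: with a 10% effort budget, only the least confident word ("off",
# likely a misrecognition of "of") is offered for supervision.
words = ["the", "integral", "off", "a", "continuous", "function", "is", "well", "defined", "here"]
confs = [0.98, 0.95, 0.31, 0.90, 0.97, 0.96, 0.99, 0.93, 0.94, 0.92]
print(words_to_supervise(words, confs))  # -> [2]
```

Concentrating the editor's attention on the lowest-confidence words is what makes this effort–quality trade-off possible, since that is where misrecognitions tend to cluster.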
- Integration: The results of the project have been integrated and tested in VideoLectures.NET and the UPV’s poliMedia repository, and made compatible with Opencast, a free, open-source video capture platform. The MLLP has developed and released two open source software toolkits: the transLectures-UPV Toolkit (TLK) for automatic speech recognition, and the transLectures-UPV Platform (TLP), which allows for the integration of transLectures technologies into a media repository. Indeed, TLP and TLK are the tools used for transLectures processing of the live poliMedia repository, and they are now also being used by other Spanish and European universities to develop multilingual online courses.
- Evaluation: Scientific evaluation has shown that large improvements were achieved by means of massive adaptation techniques in all ASR tasks and most MT tasks; see the scientific result charts above for specific figures on MLLP results. Additionally, in the quality control processes carried out by project experts, significant improvements were reported in both case studies throughout the project; in the final round of quality control, professional editors considered the results genuinely useful in terms of productivity gains. Internal and external evaluations were also carried out. In the internal evaluations, on the UPV’s poliMedia repository, UPV lecturers revised automatic transcriptions and translations using the TLP player; for transcriptions and some translation combinations, significant reductions in user effort were measured. In the external evaluations, the automatic subtitles in poliMedia were opened up for revision by viewers. This functionality is still active to this day; revisions by external viewers are submitted for the lecturer’s approval before being made public. Furthermore, the MLLP opened a public trial platform to showcase a complete video transcription and translation workflow with transLectures technologies. Its successor is the currently available MLLP transcription and translation platform, which dozens of public and private organizations from all over the world have tried so far; for some of them, this has been a first step towards a more extensive use of transLectures technologies.
This is a brief summary of some of the main results of the transLectures project. For more details, you can visit this brief guide to transLectures’ reports and publications (in particular, for a more extensive summary, see transLectures’ Publishable Summary from the Final Periodic Report).
See for yourself how transLectures technology is evolving by trying our MLLP transcription and translation platform.
Or jump back to the top-right menu to find the project’s video demos and open source tools.