2015
Valor Miró, Juan Daniel; Silvestre-Cerdà, Joan Albert; Civera, Jorge; Turró, Carlos; Juan, Alfons. Efficient Generation of High-Quality Multilingual Subtitles for Video Lecture Repositories. In: Proc. of 10th European Conf. on Technology Enhanced Learning (EC-TEL 2015), pp. 485–490, Toledo (Spain), 2015, ISBN: 978-3-319-24258-3.
@inproceedings{valor2015efficient,
title = {Efficient Generation of High-Quality Multilingual Subtitles for Video Lecture Repositories},
author = {Valor Miró, Juan Daniel and Silvestre-Cerdà, Joan Albert and Civera, Jorge and Turró, Carlos and Juan, Alfons},
url = {http://link.springer.com/chapter/10.1007/978-3-319-24258-3_44
http://www.mllp.upv.es/wp-content/uploads/2016/03/paper.pdf
},
isbn = {978-3-319-24258-3},
year = {2015},
date = {2015-09-17},
booktitle = {Proc. of 10th European Conf. on Technology Enhanced Learning (EC-TEL 2015)},
pages = {485--490},
address = {Toledo (Spain)},
abstract = {Video lectures are a valuable educational tool in higher education to support or replace face-to-face lectures in active learning strategies. In 2007 the Universitat Politècnica de València (UPV) implemented its video lecture capture system, resulting in a high quality educational video repository, called poliMedia, with more than 10,000 mini lectures created by 1,373 lecturers. Also, in the framework of the European project transLectures, UPV has automatically generated transcriptions and translations in Spanish, Catalan and English for all videos included in the poliMedia video repository. transLectures’s objective responds to the widely-recognised need for subtitles to be provided with video lectures, as an essential service for non-native speakers and hearing impaired persons, and to allow advanced repository functionalities. Although high-quality automatic transcriptions and translations were generated in transLectures, they were not error-free. For this reason, lecturers need to manually review video subtitles to guarantee the absence of errors. The aim of this study is to evaluate the efficiency of the manual review process from automatic subtitles in comparison with the conventional generation of video subtitles from scratch. The reported results clearly indicate the convenience of providing automatic subtitles as a first step in the generation of video subtitles and the significant savings in time of up to almost 75% involved in reviewing subtitles.},
keywords = {Automatic Speech Recognition, Docencia en Red, Efficient video subtitling, Polimedia, Statistical machine translation, video lecture repositories},
pubstate = {published},
tppubtype = {inproceedings}
}
2012
Silvestre-Cerdà, Joan Albert; Andrés-Ferrer, Jesús; Civera, Jorge. Explicit length modelling for statistical machine translation. Pattern Recognition, 45 (9), pp. 3183–3192, 2012, ISSN: 0031-3203.
@article{Silvestre-Cerdà2012a,
title = {Explicit length modelling for statistical machine translation},
author = {Joan Albert Silvestre-Cerdà and Jesús Andrés-Ferrer and Jorge Civera},
url = {http://hdl.handle.net/10251/34996},
issn = {0031-3203},
year = {2012},
date = {2012-01-01},
journal = {Pattern Recognition},
volume = {45},
number = {9},
pages = {3183--3192},
abstract = {Explicit length modelling has been previously explored in statistical pattern recognition with successful results. In this paper, two length models along with two parameter estimation methods and two alternative parametrisations for statistical machine translation (SMT) are presented. More precisely, we incorporate explicit bilingual length modelling in a state-of-the-art log-linear SMT system as an additional feature function in order to prove the contribution of length information. Finally, a systematic evaluation on reference SMT tasks considering different language pairs proves the benefits of explicit length modelling.},
keywords = {Length modelling, Log-linear models, Phrase-based models, Statistical machine translation},
pubstate = {published},
tppubtype = {article}
}
2011
Silvestre-Cerdà, Joan Albert; Andrés-Ferrer, Jesús; Civera, Jorge. Explicit Length Modelling for Statistical Machine Translation. In: Vitrià, Jordi; Sanches, João Miguel; Hernández, Mario (Eds.): Pattern Recognition and Image Analysis (IbPRIA 2011), vol. 6669, pp. 273–280, Springer Berlin Heidelberg, 2011, ISBN: 978-3-642-21256-7.
@incollection{Silvestre-Cerdà2011,
title = {Explicit Length Modelling for Statistical Machine Translation},
author = {Joan Albert Silvestre-Cerdà and Jesús Andrés-Ferrer and Jorge Civera},
editor = {Vitrià, Jordi and Sanches, João Miguel and Hernández, Mario},
url = {http://hdl.handle.net/10251/35749
http://dx.doi.org/10.1007/978-3-642-21257-4_34},
isbn = {978-3-642-21256-7},
year = {2011},
date = {2011-01-01},
booktitle = {Pattern Recognition and Image Analysis (IbPRIA 2011)},
volume = {6669},
pages = {273--280},
publisher = {Springer Berlin Heidelberg},
series = {Lecture Notes in Computer Science},
abstract = {Explicit length modelling has been previously explored in statistical pattern recognition with successful results. In this paper, two length models along with two parameter estimation methods for statistical machine translation (SMT) are presented. More precisely, we incorporate explicit length modelling in a state-of-the-art log-linear SMT system as an additional feature function in order to prove the contribution of length information. Finally, promising experimental results are reported on a reference SMT task.},
keywords = {Length modelling, Log-linear models, Phrase-based models, Statistical machine translation},
pubstate = {published},
tppubtype = {incollection}
}
Silvestre-Cerdà, Joan Albert; García-Martínez, Mercedes; Barrón-Cedeño, Alberto; Civera, Jorge; Rosso, Paolo. Extracción de corpus paralelos de la Wikipedia basada en la obtención de alineamientos bilingües a nivel de frase. In: Proceedings of the Workshop on Iberian Cross-Language Natural Language Processing Tasks (ICL 2011), pp. 14–21, CEUR-WS, 2011, ISSN: 1613-0073.
@inproceedings{Silvestre-Cerdà2011b,
title = {Extracción de corpus paralelos de la Wikipedia basada en la obtención de alineamientos bilingües a nivel de frase},
author = {Joan Albert Silvestre-Cerdà and Mercedes García-Martínez and Alberto Barrón-Cedeño and Jorge Civera and Paolo Rosso},
url = {http://hdl.handle.net/10251/27930},
issn = {1613-0073},
year = {2011},
date = {2011-01-01},
booktitle = {Proceedings of the Workshop on Iberian Cross-Language Natural Language Processing Tasks (ICL 2011)},
volume = {824},
pages = {14--21},
publisher = {CEUR-WS},
abstract = {This paper presents a proposal for extracting parallel corpora from Wikipedia on the basis of statistical machine translation techniques. We have used word-level alignment models from IBM in order to obtain phrase-level bilingual alignments between document pairs. We have manually annotated a set of test English-Spanish comparable documents in order to evaluate the model. The obtained results are encouraging.},
keywords = {Comparable Corpora, Parallel Sentences Extraction, Statistical machine translation},
pubstate = {published},
tppubtype = {inproceedings}
}