Iranzo-Sánchez, Javier; Jorge, Javier; Pérez-González-de-Martos, Alejandro; Giménez, Adrià; Garcés Díaz-Munío, Gonçal V.; Baquero-Arnal, Pau; Silvestre-Cerdà, Joan Albert; Civera, Jorge; Sanchis, Albert; Juan, Alfons: MLLP-VRAIN UPV systems for the IWSLT 2022 Simultaneous Speech Translation and Speech-to-Speech Translation tasks. Proc. of 19th Intl. Conf. on Spoken Language Translation (IWSLT 2022), pp. 255–264, Dublin (Ireland), 2022.
@inproceedings{Iranzo-Sánchez2022b,
title = {MLLP-VRAIN UPV systems for the IWSLT 2022 Simultaneous Speech Translation and Speech-to-Speech Translation tasks},
author = {Javier Iranzo-Sánchez and Javier Jorge and Alejandro Pérez-González-de-Martos and Adrià Giménez and Garcés Díaz-Munío, Gonçal V. and Pau Baquero-Arnal and Joan Albert Silvestre-Cerdà and Jorge Civera and Albert Sanchis and Alfons Juan},
doi = {10.18653/v1/2022.iwslt-1.22},
year = {2022},
date = {2022-01-01},
booktitle = {Proc. of 19th Intl. Conf. on Spoken Language Translation (IWSLT 2022)},
pages = {255--264},
address = {Dublin (Ireland)},
abstract = {This work describes the participation of the MLLP-VRAIN research group in the two shared tasks of the IWSLT 2022 conference: Simultaneous Speech Translation and Speech-to-Speech Translation. We present our streaming-ready ASR, MT and TTS systems for Speech Translation and Synthesis from English into German. Our submission combines these systems by means of a cascade approach paying special attention to data preparation and decoding for streaming inference.},
keywords = {Simultaneous Speech Translation, speech-to-speech translation},
pubstate = {published},
tppubtype = {inproceedings}
}
Pérez-González-de-Martos, Alejandro; Iranzo-Sánchez, Javier; Giménez Pastor, Adrià; Jorge, Javier; Silvestre-Cerdà, Joan-Albert; Civera, Jorge; Sanchis, Albert; Juan, Alfons: Towards simultaneous machine interpretation. Proc. Interspeech 2021, pp. 2277–2281, Brno (Czech Republic), 2021.
@inproceedings{Pérez-González-de-Martos2021,
title = {Towards simultaneous machine interpretation},
author = {Alejandro Pérez-González-de-Martos and Javier Iranzo-Sánchez and Giménez Pastor, Adrià and Javier Jorge and Joan-Albert Silvestre-Cerdà and Jorge Civera and Albert Sanchis and Alfons Juan},
doi = {10.21437/Interspeech.2021-201},
year = {2021},
date = {2021-01-01},
booktitle = {Proc. Interspeech 2021},
pages = {2277--2281},
address = {Brno (Czech Republic)},
abstract = {Automatic speech-to-speech translation (S2S) is one of the most challenging speech and language processing tasks, especially when considering its application to real-time settings. Recent advances in streaming Automatic Speech Recognition (ASR), simultaneous Machine Translation (MT) and incremental neural Text-To-Speech (TTS) make it possible to develop real-time cascade S2S systems with greatly improved accuracy. On the way to simultaneous machine interpretation, a state-of-the-art cascade streaming S2S system is described and empirically assessed in the simultaneous interpretation of European Parliament debates. We pay particular attention to the TTS component, particularly in terms of speech naturalness under a variety of response-time settings, as well as in terms of speaker similarity for its cross-lingual voice cloning capabilities.},
keywords = {cross-lingual voice cloning, incremental text-to-speech, simultaneous machine interpretation, speech-to-speech translation},
pubstate = {published},
tppubtype = {inproceedings}
}