The articles “Europarl-ST: A Multilingual Corpus for Speech Translation of Parliamentary Debates” and “LSTM-Based One-Pass Decoder for Low-Latency Streaming”, by Javier Iranzo, Javier Jorge and other MLLP researchers, have been accepted for publication at the IEEE’s ICASSP 2020 conference.
The IEEE’s 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020), to be held this year in Barcelona (Spain) between 4 and 8 May, is the world’s largest technical conference on signal processing and its applications (including automatic speech recognition and other fields of natural language processing). ICASSP is a high-impact international conference, with a CORE B ranking.
The accepted MLLP articles have been considered valuable contributions to spoken language translation and to online speech recognition. Here are the details of each article:
Europarl-ST: A Multilingual Corpus for Speech Translation of Parliamentary Debates
Javier Iranzo-Sánchez, Joan Albert Silvestre-Cerdà, Javier Jorge, Nahuel Roselló, Adrià Giménez, Albert Sanchis, Jorge Civera, Alfons Juan
Current research into spoken language translation (SLT), or speech-to-text translation, is often hampered by the lack of specific data resources for this task, as currently available SLT datasets are restricted to a limited set of language pairs. In this paper we present Europarl-ST, a novel multilingual SLT corpus containing paired audio-text samples for SLT from and into 6 European languages, for a total of 30 different translation directions. This corpus has been compiled using the debates held in the European Parliament in the period between 2008 and 2012. This paper describes the corpus creation process and presents a series of automatic speech recognition, machine translation and spoken language translation experiments that highlight the potential of this new resource. The corpus is released under a Creative Commons license and is freely accessible and downloadable.
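As a quick check of the figure quoted in the abstract, 6 languages give 6 × 5 = 30 translation directions, because each ordered (source, target) pair counts separately. The minimal sketch below enumerates them. The labels L1 to L6 are placeholders, not the corpus's actual language codes, and the function name is our own choice for illustration.

```python
from itertools import permutations


def translation_directions(languages):
    """Return every ordered (source, target) pair of distinct languages."""
    return list(permutations(languages, 2))


# Placeholder labels (assumption): the abstract only states that there are 6 languages.
languages = ["L1", "L2", "L3", "L4", "L5", "L6"]
directions = translation_directions(languages)

print(len(directions))  # 30
print(directions[:3])   # [('L1', 'L2'), ('L1', 'L3'), ('L1', 'L4')]
```

Note that ('L1', 'L2') and ('L2', 'L1') are counted as two different directions, which is why the total is 30 rather than 15.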
LSTM-Based One-Pass Decoder for Low-Latency Streaming
Javier Jorge, Adrià Giménez, Javier Iranzo-Sánchez, Joan Albert Silvestre-Cerdà, Jorge Civera, Albert Sanchis, Alfons Juan
Current state-of-the-art models based on Long Short-Term Memory (LSTM) networks have been extensively used in ASR to improve performance. However, using LSTMs under a streaming setup is not straightforward due to real-time constraints. In this paper we present a novel streaming decoder that includes a bidirectional LSTM acoustic model as well as a unidirectional LSTM language model to perform the decoding efficiently while keeping the performance comparable to that of an off-line setup. We perform a one-pass decoding using a sliding window scheme for a bidirectional LSTM acoustic model and an LSTM language model. This has been implemented and assessed under a pure streaming setup, and deployed into our production systems. We report WER and latency figures for the well-known LibriSpeech and TED-LIUM tasks, obtaining competitive WER results with low-latency responses.
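To give a rough idea of what a sliding window over an audio stream can look like, here is a minimal sketch. It is not the decoder described in the paper: the function name, the past and future parameters and the toy integer "frames" are all assumptions made for illustration. The point it shows is that a model needing future context for frame t can only produce its output once frame t + future has arrived, so the lookahead size bounds the added latency.

```python
from typing import Iterator, List, Sequence, Tuple


def sliding_windows(
    frames: Sequence[int], past: int, future: int
) -> Iterator[Tuple[int, List[int]]]:
    """Yield (t, window) for each frame index t.

    The window spans frames[t - past : t + future + 1], clipped at the
    start and end of the sequence. All parameters are hypothetical.
    """
    for t in range(len(frames)):
        start = max(0, t - past)
        end = min(len(frames), t + future + 1)
        yield t, list(frames[start:end])


# Toy stream of 10 "frames", numbered 0..9 so the windows are easy to read.
windows = list(sliding_windows(list(range(10)), past=2, future=3))

print(windows[0])  # (0, [0, 1, 2, 3])
print(windows[5])  # (5, [3, 4, 5, 6, 7, 8])
print(windows[9])  # (9, [7, 8, 9])
```

In this toy setup each window holds at most past + 1 + future = 6 frames, so the amount of context processed per frame stays fixed regardless of how long the stream is.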
Since the foundation of the MLLP research group (2014), MLLP members have published over 10 international journal articles (IEEE-ACM Trans. Audio Speech Lang., 2018; Pattern Recognition Letters, 51, 2015; …) and over 20 international conference papers (Interspeech 2019; AMTA 2014; …). You can browse all of the 200+ publications by MLLP researchers in the Publications section of our website.
We at the MLLP are glad to participate in ICASSP 2020. We look forward to seeing you there!