A 1,300-hour English speech and text corpus of parliamentary debates for (streaming) ASR training and benchmarking, speech data filtering, and speech data verbatimization.

Updated 3 months ago

This repository contains the code for the ACL 2022 paper "From Simultaneous to Streaming Machine Translation by Leveraging Streaming History".

Updated 10 months ago

This repository contains the code for the EMNLP 2021 paper "Stream-level Latency Evaluation for Simultaneous Machine Translation".

Updated 10 months ago

This repository contains the code for the segmentation system proposed in the EMNLP 2020 paper "Direct Segmentation Models for Streaming Speech Translation".

Updated 10 months ago

Updated 1 year ago

Early software by MLLP researchers (2010-2015): AK, GIDOC, jaf_Tools, Bilingual Text Classification.

Updated 1 year ago

Europarl-ST is a multilingual speech translation corpus of paired audio-text samples, built from debates held in the European Parliament between 2008 and 2012.

Updated 1 year ago