A 1300-hour English speech and text corpus of parliamentary debates for (streaming) ASR training and benchmarking, speech data filtering and speech data verbatimization.
Updated 10 months ago
This repository contains the code for the ACL 2022 paper "From Simultaneous to Streaming Machine Translation by Leveraging Streaming History".
Updated 1 year ago
This repository contains the code for the EMNLP 2021 paper "Stream-level Latency Evaluation for Simultaneous Machine Translation".
Updated 1 year ago
This repository contains the code for the segmentation system proposed in the EMNLP 2020 paper "Direct Segmentation Models for Streaming Speech Translation".
Updated 1 year ago
Early software by MLLP researchers (2010-2015): AK, GIDOC, jaf_Tools, Bilingual Text Classification.
Updated 1 year ago
Europarl-ST is a multilingual speech translation corpus of paired audio-text samples, built from debates held in the European Parliament between 2008 and 2012.
Updated 1 year ago