A 1300-hour English speech and text corpus of parliamentary debates for (streaming) ASR training and benchmarking, speech data filtering, and speech data verbatimization.

Europarl-ST is a multilingual speech translation corpus containing paired audio-text samples for speech translation, built from debates held in the European Parliament between 2008 and 2012.

Early software by MLLP researchers (2010-2015): AK, GIDOC, jaf_Tools, Bilingual Text Classification.

This repository contains the code for the paper "Stream-level Latency Evaluation for Simultaneous Machine Translation".

This repository contains the code for the segmentation system proposed in "Direct Segmentation Models for Streaming Speech Translation".
