# Europarl-ASR

Europarl-ASR: A Large Speech+Text Corpus of Parliamentary Debates for Streaming ASR Benchmarking and Speech Data Filtering/Verbatimization.

- 1300 hours of transcribed EN speech data.
- 18 hours of EN speech data with revised verbatim and official non-verbatim transcriptions, split into 2 dev/test partitions for 2 realistic ASR tasks.
- 3 full sets of timed transcriptions for the training data: official non-verbatim, automatically noise-filtered, and automatically verbatimized.
- 70 million tokens of EN text data.