4 years ago · 44ff1634a2
--- a/README.md
+++ b/README.md
@@ -3,8 +3,8 @@ Europarl-ASR v1.0
 2 April 2021  
 [www.mllp.upv.es/europarl-asr](https://www.mllp.upv.es/europarl-asr)
 
-A large English-language speech and text corpus of parliamentary debates for
-streaming ASR benchmarking, speech data filtering and speech data verbatimization.
+A 1300-hour English-language speech and text corpus of parliamentary debates for
+(streaming) ASR training and benchmarking, speech data filtering and speech data verbatimization.
 
 Keywords: automatic speech recognition; speech corpus; speech data filtering;
 speech data verbatimization.
@@ -280,19 +280,19 @@ Europarl-ASR (EN) includes:
 
 #### Speech data
 
-* 1300 hours of English-language annotated speech data (33K speeches, 1K
+* 1263 hours of English-language annotated speech data (33,002 speeches, 1046
   speakers).
 * 3 full sets of timed transcriptions: official non-verbatim transcriptions,
   automatically noise-filtered transcriptions and automatically verbatimized
   transcriptions.
-* 18 hours of speech data with both manually revised verbatim transcriptions
+* 17.5 hours of speech data with both manually revised verbatim transcriptions
   and official non-verbatim transcriptions, split in 2 independent validation-
   evaluation partitions for 2 realistic ASR tasks (with vs. without previous
   knowledge of the speaker).
 
 #### Text data
 
-* 70 million tokens of English-language text data.
+* 69.4 million tokens of English-language text data.
 
 #### Pretrained language models