3 years ago · 91c5004706
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # Europarl-ASR
-v1.0<br />
-2 April 2021<br />
+Europarl-ASR v1.0  
+2 April 2021  
 [www.mllp.upv.es/europarl-asr](https://www.mllp.upv.es/europarl-asr)
 
 A large English-language speech and text corpus of parliamentary debates for
@@ -171,10 +171,10 @@ In the cases of "dev" and "test", they are subdivided in directories "spk-dep"
 and "spk-indep". Thus, for speech data, we have 2 train-dev-test partitions
 for 2 different ASR tasks, as follows:
 
-1. ASR with known speakers (MEP):<br />
+1. ASR with known speakers (MEP):  
    train ; dev/original_audio/spk-dep ; test/original_audio/spk-dep
    
-1. ASR with unknown speakers (Guest):<br />
+1. ASR with unknown speakers (Guest):  
    train ; dev/original_audio/spk-indep ; test/original_audio/spk-indep
 
 Each of these partition directories contains 3 to 4 subdirectories (depending
@@ -188,7 +188,7 @@ speeches per speaker.
 corresponding set (as csv and json files). For each speech we will find these
 metadata (as reflected in speeches.headers.csv):
 
-&nbsp;&nbsp;&nbsp;&nbsp;term;session_date;speech_id;speaker_type;speaker_id;raw_dur;<br />
+&nbsp;&nbsp;&nbsp;&nbsp;term;session_date;speech_id;speaker_type;speaker_id;raw_dur;  
 &nbsp;&nbsp;&nbsp;&nbsp;aligned-speech_dur;filtered-speech_dur;cer;ar;path;agenda_item_title
 
 And for each speaker (as reflected in speakers.headers.csv):
@@ -203,24 +203,24 @@ according to this subdirectory structure:
 For each speech, we will find some of the following files (depending on
 whether it is in the train set or in the dev/test sets):
 
-&nbsp;&nbsp;&nbsp;&nbsp;`ep-asr.en.orig.<term>.<session_date>.<speech_id>.m4a`<br />
+&nbsp;&nbsp;&nbsp;&nbsp;`ep-asr.en.orig.<term>.<session_date>.<speech_id>.m4a`  
 &nbsp;&nbsp;&nbsp;&nbsp;[In all sets] Audio of the speech.
 
-&nbsp;&nbsp;&nbsp;&nbsp;`ep-asr.en.orig.<term>.<session_date>.<speech_id>.tr.orig.{dfxp,json,srt,txt}`<br />
+&nbsp;&nbsp;&nbsp;&nbsp;`ep-asr.en.orig.<term>.<session_date>.<speech_id>.tr.orig.{dfxp,json,srt,txt}`  
 &nbsp;&nbsp;&nbsp;&nbsp;[In all sets] Official non-verbatim transcription of the speech, as a txt
   raw transcription file, as dfxp or srt force-aligned timed subtitle files,
   and its json metadata.
 
-&nbsp;&nbsp;&nbsp;&nbsp;`ep-asr.en.orig.<term>.<session_date>.<speech_id>.tr.filt.{dfxp,json,srt}`<br />
+&nbsp;&nbsp;&nbsp;&nbsp;`ep-asr.en.orig.<term>.<session_date>.<speech_id>.tr.filt.{dfxp,json,srt}`  
 &nbsp;&nbsp;&nbsp;&nbsp;[In train set] Automatically filtered transcription of the speech, as dfxp
   or srt force-aligned timed subtitle files, and its json metadata.
 
-&nbsp;&nbsp;&nbsp;&nbsp;`ep-asr.en.orig.<term>.<session_date>.<speech_id>.tr.verb.{dfxp,json,srt,txt}`<br />
+&nbsp;&nbsp;&nbsp;&nbsp;`ep-asr.en.orig.<term>.<session_date>.<speech_id>.tr.verb.{dfxp,json,srt,txt}`  
 &nbsp;&nbsp;&nbsp;&nbsp;[In train set] Automatically verbatimized transcription of the speech, as
   a txt transcription file, as dfxp or srt timed subtitle files,
   and its json metadata.
 
-&nbsp;&nbsp;&nbsp;&nbsp;`ep-asr.en.orig.<term>.<session_date>.<speech_id>.tr.rev.{dfxp,json,srt,txt}`<br />
+&nbsp;&nbsp;&nbsp;&nbsp;`ep-asr.en.orig.<term>.<session_date>.<speech_id>.tr.rev.{dfxp,json,srt,txt}`  
 &nbsp;&nbsp;&nbsp;&nbsp;[In dev/test sets] Manually revised verbatim transcription of the speech,
   as a txt transcription file, as dfxp or srt timed subtitle
   files, and its json metadata.