Gonçal V. Garcés Díaz-Munío committed 2 years ago
faa5582961
1 file changed, 1 insertion, 1 deletion

+ 1 - 1
README.md

@@ -27,7 +27,7 @@ Please refer to the publication:
 
 In the cascade approach to Speech Translation, an ASR system first transcribes the audio, and then the transcriptions are translated by a downstream MT system. 
 
-Standard MT systems are usually trained with sequences of around 100-150 tokens. Thus, if the audio is short, we can directly translate the transcription. However, when we have a long audio stream, the resulting transcription is many times longer than the maximum length seen by the MT model during training. This is why it is necessary to have a segmenter model that takes as input the stream of transcriped words, and outputs a stream of (hopefully) semantically self-contained segments, which are then translated independently by the MT model. The model presented here has been prepared to carry out segmentation in a streaming fashion.
+Standard MT systems are usually trained with sequences of around 100-150 tokens. Thus, if the audio is short, we can directly translate the transcription. However, when we have a long audio stream, the resulting transcription is many times longer than the maximum length seen by the MT model during training. This is why it is necessary to have a segmenter model that takes as input the stream of transcribed words, and outputs a stream of (hopefully) semantically self-contained segments, which are then translated independently by the MT model. The model presented here has been prepared to carry out segmentation in a streaming fashion.
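+
+Below is a minimal Python sketch of this cascade, assuming hypothetical `segment_stream` and boundary-prediction helpers (not the actual API of this repository): transcribed words arrive one at a time, a segment is emitted whenever the segmenter predicts a boundary, and each segment is then translated independently.
+
+```python
+# Minimal sketch of the cascade pipeline; the helpers below are hypothetical
+# placeholders, not the actual interface of this repository.
+
+def segment_stream(words, predict_boundary):
+    """Consume transcribed words one at a time and yield a segment
+    whenever the segmenter predicts an end-of-segment boundary."""
+    buffer = []
+    for word in words:
+        buffer.append(word)
+        if predict_boundary(buffer):   # e.g. boundary probability above a threshold
+            yield " ".join(buffer)
+            buffer = []
+    if buffer:                         # flush any remaining words at end of stream
+        yield " ".join(buffer)
+
+# Each (hopefully) self-contained segment is translated independently:
+# for segment in segment_stream(asr_words, segmenter.predict_boundary):
+#     print(mt_model.translate(segment))
+```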
 
 
 ## Get the code