Europarl-ST is a multilingual speech translation corpus that contains paired audio-text samples for speech translation, constructed from the debates held in the European Parliament between 2008 and 2012.
The full details of the corpus are available in the paper:
https://ieeexplore.ieee.org/document/9054626
(Preprint also available: https://arxiv.org/abs/1911.03167)
For more information about the activities of our research group, visit:
https://www.mllp.upv.es/
For any questions or comments regarding the corpus, don't hesitate to contact Javier Iranzo-Sánchez (jairsan@upv.es).
If you use the corpus in your research, please cite the following reference:
@INPROCEEDINGS{jairsan2020a,
  author={J. {Iranzo-Sánchez} and J. A. {Silvestre-Cerdà} and J. {Jorge} and N. {Roselló} and A. {Giménez} and A. {Sanchis} and J. {Civera} and A. {Juan}},
  booktitle={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title={Europarl-ST: A Multilingual Corpus for Speech Translation of Parliamentary Debates},
  year={2020},
  pages={8229-8233},
}
src/tgt | en | fr | de | it | es | pt | pl | ro | nl |
---|---|---|---|---|---|---|---|---|---|
en | - | 81 | 83 | 80 | 81 | 81 | 79 | 72 | 80 |
fr | 32 | - | 21 | 20 | 21 | 22 | 20 | 18 | 22 |
de | 30 | 18 | - | 17 | 18 | 18 | 17 | 17 | 18 |
it | 37 | 21 | 21 | - | 21 | 21 | 21 | 19 | 20 |
es | 22 | 14 | 14 | 14 | - | 14 | 13 | 12 | 13 |
pt | 15 | 10 | 10 | 10 | 10 | - | 9 | 9 | 9 |
pl | 28 | 18 | 18 | 17 | 18 | 18 | - | 16 | 18 |
ro | 24 | 12 | 12 | 12 | 12 | 12 | 12 | - | 12 |
nl | 7 | 5 | 5 | 4 | 5 | 4 | 4 | 4 | - |

src/tgt | en | fr | de | it | es | pt | pl | ro | nl |
---|---|---|---|---|---|---|---|---|---|
en | - | 89 | 90 | 84 | 88 | 89 | 87 | 89 | 88 |
fr | 39 | - | 39 | 38 | 39 | 41 | 40 | 38 | 43 |
de | 54 | 54 | 51 | 53 | 53 | 53 | 53 | 53 | 53 |
it | 15 | 15 | 15 | - | 15 | 15 | 15 | 15 | 15 |
es | 10 | 10 | 10 | 10 | - | 10 | 10 | 10 | 10 |
pt | 5 | 5 | 5 | 5 | 5 | - | 5 | 5 | 5 |
pl | 16 | 15 | 16 | 15 | 16 | 15 | - | 16 | 15 |
ro | 4 | 4 | 3 | 3 | 4 | 4 | 3 | - | 4 |
nl | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | - |