Jan Michelfeit


2016

European Union Language Resources in Sketch Engine
Vít Baisa | Jan Michelfeit | Marek Medveď | Miloš Jakubíček
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Several parallel corpora built from European Union language resources are presented here. They were processed with state-of-the-art tools and made available to researchers in the corpus manager Sketch Engine. A completely new resource is also introduced: the EUR-Lex Corpus, one of the largest parallel corpora currently available. It contains 840 million English tokens, and its largest language pair, English-French, has more than 25 million aligned segments (paragraphs).