SwissAdmin: A multilingual tagged parallel corpus of press releases

Yves Scherrer, Luka Nerima, Lorenza Russo, Maria Ivanova, Eric Wehrli


Abstract
SwissAdmin is a new multilingual corpus of press releases from the Swiss Federal Administration, available in German, French, Italian and English. We provide SwissAdmin in three versions: (i) plain texts of approximately 6 to 8 million words per language; (ii) sentence-aligned bilingual texts for each language pair; (iii) a part-of-speech-tagged version consisting of annotations in both the Universal tagset and the richer Fips tagset, along with grammatical functions, verb valencies and collocations. The SwissAdmin corpus is freely available at www.latl.unige.ch/swissadmin.
Anthology ID:
L14-1602
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1832–1836
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/772_Paper.pdf
Cite (ACL):
Yves Scherrer, Luka Nerima, Lorenza Russo, Maria Ivanova, and Eric Wehrli. 2014. SwissAdmin: A multilingual tagged parallel corpus of press releases. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 1832–1836, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
SwissAdmin: A multilingual tagged parallel corpus of press releases (Scherrer et al., LREC 2014)