Abstract
The OPUS corpus is a growing collection of translated documents collected from the internet. The current version contains about 30 million words in 60 languages. The entire corpus is sentence aligned and it also contains linguistic markup for certain languages.
- Anthology ID:
- L04-1174
- Volume:
- Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
- Month:
- May
- Year:
- 2004
- Address:
- Lisbon, Portugal
- Editors:
- Maria Teresa Lino, Maria Francisca Xavier, Fátima Ferreira, Rute Costa, Raquel Silva
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- URL:
- http://www.lrec-conf.org/proceedings/lrec2004/pdf/320.pdf
- Cite (ACL):
- Jörg Tiedemann and Lars Nygaard. 2004. The OPUS Corpus - Parallel and Free: http://logos.uio.no/opus. In Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04), Lisbon, Portugal. European Language Resources Association (ELRA).
- Cite (Informal):
- The OPUS Corpus - Parallel and Free: http://logos.uio.no/opus (Tiedemann & Nygaard, LREC 2004)