Abstract
This paper discusses the multiple approaches to collaboration that the Kamusi Project is employing in the creation of a massively multilingual lexical resource. The project's data structure enables the inclusion of large amounts of rich data within each sense-specific entry, with transitive concept-based links across languages. Data collection involves mining existing data sets, language experts using an online editing system, crowdsourcing, and games with a purpose. The paper discusses the benefits and drawbacks of each of these elements, and the steps the project is taking to account for those. Special attention is paid to guiding crowd members with targeted questions that produce results in a specific format. Collaboration is seen as an essential method for generating large amounts of linguistic data, as well as for validating the data so it can be considered trustworthy.
- Anthology ID:
- L14-1282
- Volume:
- Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
- Month:
- May
- Year:
- 2014
- Address:
- Reykjavik, Iceland
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- Pages:
- 211–215
- URL:
- http://www.lrec-conf.org/proceedings/lrec2014/pdf/319_Paper.pdf
- Cite (ACL):
- Martin Benjamin. 2014. Collaboration in the Production of a Massively Multilingual Lexicon. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 211–215, Reykjavik, Iceland. European Language Resources Association (ELRA).
- Cite (Informal):
- Collaboration in the Production of a Massively Multilingual Lexicon (Benjamin, LREC 2014)