Collaboration in the Production of a Massively Multilingual Lexicon

Martin Benjamin


Abstract
This paper discusses the multiple approaches to collaboration that the Kamusi Project is employing in the creation of a massively multilingual lexical resource. The project’s data structure enables the inclusion of large amounts of rich data within each sense-specific entry, with transitive concept-based links across languages. Data collection involves mining existing data sets, language experts using an online editing system, crowdsourcing, and games with a purpose. The paper discusses the benefits and drawbacks of each of these methods and the steps the project is taking to address the drawbacks. Special attention is paid to guiding crowd members with targeted questions that produce results in a specific format. Collaboration is seen as an essential method for generating large amounts of linguistic data, as well as for validating the data so it can be considered trustworthy.
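The idea of sense-specific entries with transitive concept-based links can be illustrated with a minimal sketch. This is not the Kamusi Project's actual implementation: the class name `ConceptLinks`, the `(language, lemma, sense_number)` entry key, and the sample words are all assumptions made for illustration. The sketch uses a union-find structure so that linking A to B and B to C makes A and C equivalent without an explicit A to C link.

```python
class ConceptLinks:
    """Hypothetical union-find over sense-specific entries.

    Each entry is a (language, lemma, sense_number) tuple; this keying
    scheme is an assumption for illustration, not the project's schema.
    """

    def __init__(self):
        self._parent = {}

    def _find(self, entry):
        # Register unseen entries as their own concept root.
        self._parent.setdefault(entry, entry)
        root = entry
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited entry straight at the root.
        while self._parent[entry] != root:
            self._parent[entry], entry = root, self._parent[entry]
        return root

    def link(self, a, b):
        """Record that two sense-specific entries express the same concept."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_a] = root_b

    def equivalents(self, entry, language=None):
        """Return all other entries sharing a concept with `entry`."""
        root = self._find(entry)
        found = [
            other for other in list(self._parent)
            if other != entry and self._find(other) == root
        ]
        if language is not None:
            found = [other for other in found if other[0] == language]
        return sorted(found)


# Illustrative data only: two senses of English "light".
links = ConceptLinks()
links.link(("eng", "light", 1), ("swa", "nuru", 1))      # illumination
links.link(("swa", "nuru", 1), ("fra", "lumière", 1))    # no direct eng-fra link
links.link(("eng", "light", 2), ("fra", "léger", 1))     # not heavy

french_for_sense_1 = links.equivalents(("eng", "light", 1), language="fra")
french_for_sense_2 = links.equivalents(("eng", "light", 2), language="fra")
```

Because links attach to senses rather than to spelled forms, the two senses of "light" stay in separate concept groups, and the English to French pairing for the first sense is reached only through the Swahili entry.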
Anthology ID:
L14-1282
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
211–215
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/319_Paper.pdf
Cite (ACL):
Martin Benjamin. 2014. Collaboration in the Production of a Massively Multilingual Lexicon. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 211–215, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Collaboration in the Production of a Massively Multilingual Lexicon (Benjamin, LREC 2014)