Abstract
Research on the history of words has led to remarkable insights about language and also about the history of human civilization more generally. This paper presents the Etymological Wordnet, the first database that aims at making word origin information available as a large, machine-readable network of words in many languages. The information in this resource is obtained from Wiktionary. Extracting a network of etymological information from Wiktionary requires significant effort, as much of the etymological information is only given in prose. We rely on custom pattern matching techniques and mine a large network with over 500,000 word origin links as well as over 2 million derivational/compositional links.
- Anthology ID:
- L14-1063
- Volume:
- Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
- Month:
- May
- Year:
- 2014
- Address:
- Reykjavik, Iceland
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- Pages:
- 1148–1154
- URL:
- http://www.lrec-conf.org/proceedings/lrec2014/pdf/1083_Paper.pdf
- Cite (ACL):
- Gerard de Melo. 2014. Etymological Wordnet: Tracing The History of Words. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 1148–1154, Reykjavik, Iceland. European Language Resources Association (ELRA).
- Cite (Informal):
- Etymological Wordnet: Tracing The History of Words (de Melo, LREC 2014)