PanLex: Building a Resource for Panlingual Lexical Translation

David Kamholz, Jonathan Pool, Susan Colowick


Abstract
PanLex, a project of The Long Now Foundation, aims to enable the translation of lexemes among all human languages in the world. By focusing on lexemic translations, rather than grammatical or corpus data, it achieves broader lexical and language coverage than related projects. The PanLex database currently documents 20 million lexemes in about 9,000 language varieties, with 1.1 billion pairwise translations. The project primarily engages in content procurement, while encouraging outside use of its data for research and development. Its data acquisition strategy emphasizes broad, high-quality lexical and language coverage. The project plans to add data derived from 4,000 new sources to the database by the end of 2016. The dataset is publicly accessible via an HTTP API and monthly snapshots in CSV, JSON, and XML formats. Several online applications have been developed that query PanLex data. More broadly, the project aims to make a contribution to the preservation of global linguistic diversity.
Anthology ID:
L14-1023
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3145–3150
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/1029_Paper.pdf
Cite (ACL):
David Kamholz, Jonathan Pool, and Susan Colowick. 2014. PanLex: Building a Resource for Panlingual Lexical Translation. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 3145–3150, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
PanLex: Building a Resource for Panlingual Lexical Translation (Kamholz et al., LREC 2014)