2016
The Open Linguistics Working Group: Developing the Linguistic Linked Open Data Cloud
John Philip McCrae, Christian Chiarcos, Francis Bond, Philipp Cimiano, Thierry Declerck, Gerard de Melo, Jorge Gracia, Sebastian Hellmann, Bettina Klimek, Steven Moran, Petya Osenova, Antonio Pareja-Lora, Jonathan Pool
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
The Open Linguistics Working Group (OWLG) brings together researchers from various fields of linguistics, natural language processing, and information technology to present and discuss principles, case studies, and best practices for representing, publishing and linking linguistic data collections. A major outcome of our work is the Linguistic Linked Open Data (LLOD) cloud, an LOD (sub-)cloud of linguistic resources, which covers various linguistic databases, lexicons, corpora, terminologies, and metadata repositories. We present and summarize five years of progress on the development of the cloud and of advancements in open data in linguistics, and we describe recent community activities. The paper aims to serve as a guideline to orient and involve researchers with the community and/or Linguistic Linked Open Data.
2014
PanLex: Building a Resource for Panlingual Lexical Translation
David Kamholz, Jonathan Pool, Susan Colowick
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
PanLex, a project of The Long Now Foundation, aims to enable the translation of lexemes among all human languages in the world. By focusing on lexemic translations, rather than grammatical or corpus data, it achieves broader lexical and language coverage than related projects. The PanLex database currently documents 20 million lexemes in about 9,000 language varieties, with 1.1 billion pairwise translations. The project primarily engages in content procurement, while encouraging outside use of its data for research and development. Its data acquisition strategy emphasizes broad, high-quality lexical and language coverage. The project plans to add data derived from 4,000 new sources to the database by the end of 2016. The dataset is publicly accessible via an HTTP API and monthly snapshots in CSV, JSON, and XML formats. Several online applications have been developed that query PanLex data. More broadly, the project aims to make a contribution to the preservation of global linguistic diversity.
2010
PanLex and LEXTRACT: Translating all Words of all Languages of the World
Timothy Baldwin, Jonathan Pool, Susan Colowick
Coling 2010: Demonstrations
2009
Lemmatic Machine Translation
Stephen Soderland, Christopher Lim, Mausam, Bo Qin, Oren Etzioni, Jonathan Pool
Proceedings of Machine Translation Summit XII: Papers
2006
Can Controlled Languages Scale to the Web?
Jonathan Pool
Proceedings of the 5th International Workshop on Controlled Language Applications