Towards Flexible Cross-Resource Exploitation of Heterogeneous Language Documentation Data

Daniel Jettka, Timm Lehmberg


Abstract
This paper reports on challenges and solution approaches in developing methods for data analysis across language resources in the field of language documentation. It builds on the successful outcomes of the initial phase of an 18-year long-term project on lesser-resourced and mostly endangered indigenous languages of the Northern Eurasian area, which included the finalization and publication of multiple language corpora and additional language resources. While aiming at comprehensive cross-resource data analysis, the project is at the same time confronted with a dynamic and complex resource landscape, resulting especially from a vast amount of multi-layered information stored as analogue primary data in various widely dispersed archives on the territory of the Russian Federation. The methods described aim to resolve the tension between the unification of data sets and vocabularies on the one hand and maximum openness to the integration of future resources and the adaptation of external information on the other.
Anthology ID:
2020.lrec-1.354
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
2901–2905
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.354
Cite (ACL):
Daniel Jettka and Timm Lehmberg. 2020. Towards Flexible Cross-Resource Exploitation of Heterogeneous Language Documentation Data. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 2901–2905, Marseille, France. European Language Resources Association.
Cite (Informal):
Towards Flexible Cross-Resource Exploitation of Heterogeneous Language Documentation Data (Jettka & Lehmberg, LREC 2020)
PDF:
https://preview.aclanthology.org/ingest-acl-2023-videos/2020.lrec-1.354.pdf