Neerav Mathur


2023

pdf
An Open-source Web-based Application for Development of Resources and Technologies in Underresourced Languages
Siddharth Singh | Shyam Ratan | Neerav Mathur | Ritesh Kumar
Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)

The paper discusses the Linguistic Field Data Management and Analysis System (LiFE), a new open-source, web-based software that systematises storage, management, annotation, analysis and sharing of linguistic data gathered from the field as well as that crawled from various sources on the web such as YouTube, Twitter, Facebook, Instagram, Blog, Newspaper, Wikipedia, etc. The app supports two broad workflows - (a) the field linguists’ workflow in which data is collected directly from the speakers in the field and analysed further to produce grammatical descriptions, lexicons, educational materials and possibly language technologies; (b) the computational linguists’ workflow in which data collected from the web using automated crawlers or digitised using manual or semi-automatic means, annotated for various tasks and then used for developing different kinds of language technologies. In addition to supporting these workflows, the app provides some additional features as well - (a) it allows multiple users to collaboratively work on the same project via its granular access control and sharing option; (b) it allows the data to be exported to various formats including CSV, TSV, JSON, XLSX, , PDF, Textgrid, RDF (different serialisation formats) etc as appropriate; (c) it allows data import from various formats viz. LIFT XML, XLSX, JSON, CSV, TSV, Textgrid, etc; (d) it allows users to start working in the app at any stage of their work by giving the option to either create a new project from scratch or derive a new project from an existing project in the app.The app is currently available for use and testing on our server (http://life.unreal-tece.co.in/) and its source code has been released under AGPL license on our GitHub repository (https://github.com/unrealtecellp/life). It is licensed under separate, specific conditions for commercial usage.