Karthick Narayanan R
2025
Field to Model: Pairing Community Data Collection with Scalable NLP through the LiFE Suite
Karthick Narayanan R | Siddharth Singh | Saurabh Singh | Aryan Mathur | Ritesh Kumar | Shyam Ratan | Bornini Lahiri | Benu Pareek | Neerav Mathur | Amalesh Gope | Meiraba Takhellambam | Yogesh Dawer
Proceedings of the Fourth Workshop on NLP Applications to Field Linguistics
We present LiFE Suite as a “Field-to-Model” pipeline, designed to bridge community-centred data collection with scalable language model development. This paper describes the various tools integrated into the LiFE Suite that make this unified pipeline possible. Atekho, a mobile-first data collection platform, is designed to empower communities to assert their rights over their data. MATra-Lab, a web-based data processing and annotation tool, supports the management of field data and the creation of NLP-ready datasets with support from existing state-of-the-art NLP models. LiFE Model Studio, built on top of Hugging Face AutoTrain, offers a no-code solution for building scalable language models using the field data. This end-to-end integration ensures that every dataset collected in the field retains its linguistic, cultural, and metadata context, all the way through to deployable AI models and archive-ready datasets.