Piyapath T. Spencer

Also published as: Piyapath T Spencer


2025

Can LLMs Help Create Grammar?: Automating Grammar Creation for Endangered Languages with In-Context Learning
Piyapath T. Spencer | Nanthipat Kongborrirak
Proceedings of the 31st International Conference on Computational Linguistics

In present-day efforts to document and preserve endangered languages, the application of Large Language Models (LLMs) presents a promising approach. This paper explores how LLMs, particularly through in-context learning, can assist in generating grammatical information for low-resource languages with limited amounts of data. We take Moklen as a case study to evaluate the efficacy of LLMs in producing coherent grammatical rules and lexical entries using only bilingual dictionaries and parallel sentences of the unknown language, without building the model from scratch. Our methodology involves organising the existing linguistic data and prompting the model so that it can efficiently generate formal XLE grammar. Our results demonstrate that LLMs can successfully capture key grammatical structures and lexical information, although challenges such as the potential for English grammatical biases remain. This study highlights the potential of LLMs to enhance language documentation efforts, providing a cost-effective solution for generating linguistic data and contributing to the preservation of endangered languages.

2024

Documenting Endangered Languages with LangDoc: A Wordlist-Based System and A Case Study on Moklen
Piyapath T Spencer
Proceedings of the 3rd Workshop on NLP Applications to Field Linguistics (Field Matters 2024)

Language documentation, especially for languages lacking standardised writing systems, is a laborious and time-consuming process. This paper introduces LangDoc, a comprehensive system designed to address these challenges and improve the efficiency and accuracy of language documentation projects. LangDoc offers several features, including tools for managing, recording, and reviewing the collected data. It operates both online and offline, which is crucial for fieldwork in remote locations. The paper also presents a comparative analysis demonstrating LangDoc’s efficiency compared to other methods. A case study of the Moklen language documentation project demonstrates how the features address the specific challenges of working with endangered languages and remote communities. Future development areas include integration with NLP tools for advanced linguistic analysis, and the paper emphasises LangDoc’s potential to support the preservation of language diversity.

Human-Centric NLP or AI-Centric Illusion?: A Critical Investigation
Piyapath T Spencer
Proceedings of the 38th Pacific Asia Conference on Language, Information and Computation