Sebastien Christian

2026

Creating Grammar Teaching Material for Endangered Languages with Hybrid Grammar Induction
Sebastien Christian
Findings of the Association for Computational Linguistics: ACL 2026

Explicit grammar teaching is central to endangered-language revitalization, but creating grammar lessons is labor-intensive and often falls to already overburdened teachers. We present HYGRAM, a hybrid grammar-induction method that combines typological priors, Bayesian inference, constrained LLM reasoning, and retrieval from sparse corpora and descriptive documents to generate topic-specific grammar lessons for classroom use. HYGRAM targets extremely low-resource settings and can operate from a small elicited corpus collected in roughly 10 hours of fieldwork together with any available reference materials. We evaluate the system on six typologically diverse endangered languages using expert linguist judgments of output content quality, pedagogical adequacy, and consistency across generated lessons. Results indicate that HYGRAM can produce coherent and practically useful lessons, with better quality when modest explanatory evidence is available. Feedback from Pacific language communities further suggests relevance for ongoing revitalization efforts. Overall, the work shows that evidence-constrained hybrid grammar induction can support grammar teaching and documentation where standard NLP pipelines are infeasible.


An Interactive System for Generating Revisable Grammar Lessons for Extremely Low-Resource Languages Without Expert Annotation
Sebastien Christian
Proceedings of the Ninth Workshop on the Use of Computational Methods in the Study of Endangered Languages (ComputEL-9)

Endangered-language teaching often faces two practical bottlenecks: the scarcity of experts able to produce pedagogical grammars, and the dependence of most approaches on expert linguistic annotation. We present a human-in-the-loop system for extremely low-resource languages that addresses both constraints by combining lightweight concept-based annotation, typological inference, structured sentence-pair augmentation, document retrieval, and constrained language model generation. Rather than aiming to produce definitive grammatical descriptions, the system generates revisable grammar lesson drafts grounded in heterogeneous evidence, including elicited sentence pairs, free translation pairs, and descriptive documents. The interface is designed so that speakers, teachers, and other language practitioners without formal linguistic training can contribute usable data, inspect intermediate inferences, control source selection, and generate draft grammar lessons. We describe the architecture, user workflows, and initial deployment experience in real-world revitalization settings. The contribution of the paper is an implemented workflow for early pedagogical draft generation under extreme data scarcity, not a controlled evaluation of pedagogical effectiveness.

Despite decades of progress in human language technology (HLT) and growing research interest in endangered languages, practical uptake of HLT in documentary linguistics workflows remains rare. In this opinion piece, we report on a structured dialogue among approximately twenty academics convened to diagnose why this gap persists. Across all topics, we identify a recurring structural problem, which we call the missing middle: despite the existence of many potentially useful HLTs, the connective infrastructure necessary to make them genuinely accessible to linguists and language communities does not exist. We report the details of our discussion and make four specific recommendations for how those active in language documentation and HLT research might orient their future work.
