Sebastien Christian


2026

Explicit grammar teaching is central to endangered-language revitalization, but creating grammar lessons is labor-intensive and often falls to already overburdened teachers. We present HYGRAM, a hybrid grammar-induction method that combines typological priors, Bayesian inference, constrained LLM reasoning, and retrieval from sparse corpora and descriptive documents to generate topic-specific grammar lessons for classroom use. HYGRAM targets extremely low-resource settings and can operate from a small elicited corpus collected in roughly 10 hours of fieldwork together with any available reference materials. We evaluate the system on six typologically diverse endangered languages using expert linguists' judgments of content quality, pedagogical adequacy, and consistency across generated lessons. Results indicate that HYGRAM can produce coherent and practically useful lessons, with higher quality when modest explanatory evidence is available. Feedback from Pacific language communities further suggests relevance for ongoing revitalization efforts. Overall, the work shows that evidence-constrained hybrid grammar induction can support grammar teaching and documentation where standard NLP pipelines are infeasible.
Endangered-language teaching often faces two practical bottlenecks: the scarcity of experts able to produce pedagogical grammars, and the dependence of most approaches on expert linguistic annotation. We present a human-in-the-loop system for extremely low-resource languages that addresses both constraints by combining lightweight concept-based annotation, typological inference, structured sentence-pair augmentation, document retrieval, and constrained language model generation. Rather than aiming to produce definitive grammatical descriptions, the system generates revisable grammar lesson drafts grounded in heterogeneous evidence, including elicited sentence pairs, free translation pairs, and descriptive documents. The interface is designed so that speakers, teachers, and other language practitioners without formal linguistic training can contribute usable data, inspect intermediate inferences, control source selection, and generate draft grammar lessons. We describe the architecture, user workflows, and initial deployment experience in real-world revitalization settings. The contribution of the paper is an implemented workflow for early pedagogical draft generation under extreme data scarcity, not a controlled evaluation of pedagogical effectiveness.
Despite decades of progress in human language technology (HLT) and growing research interest in endangered languages, practical uptake of HLT in documentary linguistics workflows remains rare. In this opinion piece, we report on a structured dialogue among approximately twenty academics convened to diagnose why this gap persists. Across all topics, we identify a recurring structural problem, which we call the missing middle: despite the existence of many potentially useful HLTs, the connective infrastructure necessary to make them genuinely accessible to linguists and language communities does not exist. We report the details of our discussion and make four specific recommendations for how those active in language documentation and HLT research might orient their future work.