Brian Reese


2022

pdf
Shallow Parsing for Nepal Bhasa Complement Clauses
Borui Zhang | Abe Kazemzadeh | Brian Reese
Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages

Accelerating the process of data collection, annotation, and analysis is an urgent need for linguistic fieldwork and documentation of endangered languages (Bird, 2009). Our experiments describe how we maximize the quality for the Nepal Bhasa syntactic complement structure chunking model. Native speaker language consultants were trained to annotate a minimally selected raw data set (Suárez et al.,2019). The embedded clauses, matrix verbs, and embedded verbs are annotated. We apply both statistical training algorithms and transfer learning in our training, including Naive Bayes, MaxEnt, and fine-tuning the pre-trained mBERT model (Devlin et al., 2018). We show that with limited annotated data, the model is already sufficient for the task. The modeling resources we used are largely available for many other endangered languages. The practice is easy to duplicate for training a shallow parser for other endangered languages in general.

2007

pdf
Discourse Annotation Working Group Report
Manfred Stede | Janyce Wiebe | Eva Hajičová | Brian Reese | Simone Teufel | Bonnie Webber | Theresa Wilson
Proceedings of the Linguistic Annotation Workshop