Kristian Heal


Fixing paper assignments

  1. Please select all papers that belong to the same person.
  2. Indicate below which author they should be assigned to.
Provide a valid ORCID iD here. This will be used to match future papers to this author.
Provide the name of the school or the university where the author has received or will receive their highest degree (e.g., Ph.D. institution for researchers, or current affiliation for students). This will be used to form the new author page ID, if needed.

TODO: "submit" and "cancel" buttons here


2014

pdf bib
Evaluating Lemmatization Models for Machine-Assisted Corpus-Dictionary Linkage
Kevin Black | Eric Ringger | Paul Felt | Kevin Seppi | Kristian Heal | Deryle Lonsdale
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The task of corpus-dictionary linkage (CDL) is to annotate each word in a corpus with a link to an appropriate dictionary entry that documents the sense and usage of the word. Corpus-dictionary linked resources include concordances, dictionaries with word usage examples, and corpora annotated with lemmas or word-senses. Such CDL resources are essential in learning a language and in linguistic research, translation, and philology. Lemmatization is a common approximation to automating corpus-dictionary linkage, where lemmas are treated as dictionary entry headwords. We intend to use data-driven lemmatization models to provide machine assistance to human annotators in the form of pre-annotations, and thereby reduce the costs of CDL annotation. In this work we adapt the discriminative string transducer DirecTL+ to perform lemmatization for classical Syriac, a low-resource language. We compare the accuracy of DirecTL+ with the Morfette discriminative lemmatizer. DirecTL+ achieves 96.92% overall accuracy but only by a margin of 0.86% over Morfette at the cost of a longer time to train the model. Error analysis on the models provides guidance on how to apply these models in a machine assistance setting for corpus-dictionary linkage.

pdf bib
Using Transfer Learning to Assist Exploratory Corpus Annotation
Paul Felt | Eric Ringger | Kevin Seppi | Kristian Heal
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We describe an under-studied problem in language resource management: that of providing automatic assistance to annotators working in exploratory settings. When no satisfactory tagset already exists, such as in under-resourced or undocumented languages, it must be developed iteratively while annotating data. This process naturally gives rise to a sequence of datasets, each annotated differently. We argue that this problem is best regarded as a transfer learning problem with multiple source tasks. Using part-of-speech tagging data with simulated exploratory tagsets, we demonstrate that even simple transfer learning techniques can significantly improve the quality of pre-annotations in an exploratory annotation.

2012

pdf bib
First Results in a Study Evaluating Pre-annotation and Correction Propagation for Machine-Assisted Syriac Morphological Analysis
Paul Felt | Eric Ringger | Kevin Seppi | Kristian Heal | Robbie Haertel | Deryle Lonsdale
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Manual annotation of large textual corpora can be cost-prohibitive, especially for rare and under-resourced languages. One potential solution is pre-annotation: asking human annotators to correct sentences that have already been annotated, usually by a machine. Another potential solution is correction propagation: using annotator corrections to bad pre-annotations to dynamically improve to the remaining pre-annotations within the current sentence. The research presented in this paper employs a controlled user study to discover under what conditions these two machine-assisted annotation techniques are effective in increasing annotator speed and accuracy and thereby reducing the cost for the task of morphologically annotating texts written in classical Syriac. A preliminary analysis of the data indicates that pre-annotations improve annotator accuracy when they are at least 60% accurate, and annotator speed when they are at least 80% accurate. This research constitutes the first systematic evaluation of pre-annotation and correction propagation together in a controlled user study.

2010

pdf bib
A Probabilistic Morphological Analyzer for Syriac
Peter McClanahan | George Busby | Robbie Haertel | Kristian Heal | Deryle Lonsdale | Kevin Seppi | Eric Ringger
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing