Deryle Lonsdale

2020

Improving NMT Quality Using Terminology Injection
Duane K. Dougal | Deryle Lonsdale
Proceedings of the Twelfth Language Resources and Evaluation Conference

Many organizations use domain- or organization-specific words and phrases. This paper explores the use of vetted terminology as an input to neural machine translation (NMT) for improved results: ensuring that the translation of individual terms is consistent with an approved multilingual terminology collection. We discuss, implement, and evaluate a method for injecting terminology and for evaluating terminology injection. Our use of the long short-term memory (LSTM) attention mechanism prevalent in state-of-the-art NMT systems involves attention vectors for correctly identifying semantic entities and aligning the tokens that represent them, both in the source and the target languages. Appropriate terminology is then injected into matching alignments during decoding. We also introduce a new translation metric more sensitive to approved terminological content in MT output.

2014

Combining elicited imitation and fluency features for oral proficiency measurement
Deryle Lonsdale | Carl Christensen
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The automatic grading of oral language tests has been the subject of much research in recent years. Several obstacles lie in the way of achieving this goal. Recent work suggests a testing technique called elicited imitation (EI) that can serve to accurately approximate global oral proficiency. This testing methodology, however, does not incorporate some fundamental aspects of language, such as fluency. Other work has suggested another testing technique, simulated speech (SS), as a supplement or an alternative to EI that can provide automated fluency metrics. In this work, we investigate a combination of fluency features extracted from SS tests and EI test scores as a means to more accurately predict oral language proficiency. Using machine learning and statistical modeling, we identify which features automatically extracted from SS tests best predicted hand-scored SS test results, and demonstrate the benefit of adding EI scores to these models. Results indicate that the combination of EI and fluency features does indeed more effectively predict hand-scored SS test scores. We finally discuss implications of this work for future automated oral testing scenarios.

Evaluating Lemmatization Models for Machine-Assisted Corpus-Dictionary Linkage
Kevin Black | Eric Ringger | Paul Felt | Kevin Seppi | Kristian Heal | Deryle Lonsdale
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The task of corpus-dictionary linkage (CDL) is to annotate each word in a corpus with a link to an appropriate dictionary entry that documents the sense and usage of the word. Corpus-dictionary linked resources include concordances, dictionaries with word usage examples, and corpora annotated with lemmas or word-senses. Such CDL resources are essential in learning a language and in linguistic research, translation, and philology. Lemmatization is a common approximation to automating corpus-dictionary linkage, where lemmas are treated as dictionary entry headwords. We intend to use data-driven lemmatization models to provide machine assistance to human annotators in the form of pre-annotations, and thereby reduce the costs of CDL annotation. In this work we adapt the discriminative string transducer DirecTL+ to perform lemmatization for classical Syriac, a low-resource language. We compare the accuracy of DirecTL+ with the Morfette discriminative lemmatizer. DirecTL+ achieves 96.92% overall accuracy, but its margin over Morfette is only 0.86%, and it comes at the cost of a longer time to train the model. Error analysis on the models provides guidance on how to apply these models in a machine assistance setting for corpus-dictionary linkage.

Student achievement and French sentence repetition test scores
Deryle Lonsdale | Benjamin Millard
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Sentence repetition (SR) tests are one way of probing a language learner’s oral proficiency. Test-takers listen to a set of carefully engineered sentences of varying complexity one-by-one, and then try to repeat them back as exactly as possible. In this paper we explore how well an SR test that we have developed for French corresponds with the test-taker’s achievement levels, represented by proficiency interview scores and by college class enrollment. We describe how we developed our SR test items using various language resources, and present pertinent facts about the test administration. The responses were scored by humans and also by a specially designed automatic speech recognition (ASR) engine; we sketch both scoring approaches. Results are evaluated in several ways: correlations between human and ASR scores, item response analysis to quantify the relative difficulty of the items, and criterion-referenced analysis setting thresholds of consistency across proficiency levels. We discuss several observations and conclusions prompted by the analyses, and suggestions for future work.

2012

Item Development and Scoring for Japanese Oral Proficiency Testing
Hitokazu Matsushita | Deryle Lonsdale
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This study introduces and evaluates a computerized approach to measuring Japanese L2 oral proficiency. We present a testing and scoring method that uses a type of structured speech called elicited imitation (EI) to evaluate accuracy of speech productions. Several types of language resources and toolkits are required to develop, administer, and score responses to this test. First, we present a corpus-based test item creation method to produce EI items with targeted linguistic features in a principled and efficient manner. Second, we sketch how we are able to bootstrap a small learner speech corpus to generate a significantly large corpus of training data for language model construction. Lastly, we show how newly created test items effectively classify learners according to their L2 speaking capability and illustrate how our scoring method computes a metric for language proficiency that correlates well with more traditional human scoring methods.

First Results in a Study Evaluating Pre-annotation and Correction Propagation for Machine-Assisted Syriac Morphological Analysis
Paul Felt | Eric Ringger | Kevin Seppi | Kristian Heal | Robbie Haertel | Deryle Lonsdale
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Manual annotation of large textual corpora can be cost-prohibitive, especially for rare and under-resourced languages. One potential solution is pre-annotation: asking human annotators to correct sentences that have already been annotated, usually by a machine. Another potential solution is correction propagation: using annotator corrections to bad pre-annotations to dynamically improve the remaining pre-annotations within the current sentence. The research presented in this paper employs a controlled user study to discover under what conditions these two machine-assisted annotation techniques are effective in increasing annotator speed and accuracy and thereby reducing the cost for the task of morphologically annotating texts written in classical Syriac. A preliminary analysis of the data indicates that pre-annotations improve annotator accuracy when they are at least 60% accurate, and annotator speed when they are at least 80% accurate. This research constitutes the first systematic evaluation of pre-annotation and correction propagation together in a controlled user study.

2011

Elicited Imitation for Prediction of OPI Test Scores
Kevin Cook | Jeremiah McGhee | Deryle Lonsdale
Proceedings of the Sixth Workshop on Innovative Use of NLP for Building Educational Applications

2010

Expert human input can contribute in various ways to facilitate automatic annotation of natural language text. For example, a part-of-speech tagger can be trained on labeled input provided offline by experts. In addition, expert input can be solicited by way of active learning to make the most of annotator expertise. However, hiring individuals to perform manual annotation is costly both in terms of money and time. This paper reports on a user study that was performed to determine the degree of effect that a part-of-speech dictionary has on a group of subjects performing the annotation task. The user study was conducted using a modular, web-based interface created specifically for text annotation tasks. The user study found that for both native and non-native English speakers a dictionary with greater than 60% coverage was effective at reducing annotation time and increasing annotator accuracy. On the basis of this study, we predict that using a part-of-speech tag dictionary with coverage greater than 60% can reduce the cost of annotation in terms of both time and money.

Principled Construction of Elicited Imitation Tests
Carl Christensen | Ross Hendrickson | Deryle Lonsdale
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper we discuss the methodology behind the construction of elicited imitation (EI) test items. First we examine varying uses for EI tests in research and in testing overall oral proficiency. We also mention criticisms of previous test items. Then we identify the factors that contribute to the difficulty of an EI item as shown in previous studies. Based on this discussion, we describe a way of automating the creation of test items in order to better evaluate language learners' oral proficiency while improving item naturalness. We present a new item construction tool and the process that it implements in order to create test items from a corpus, identifying relevant features needed to compile a database of EI test items. We examine results from administration of a new EI test engineered in this manner, illustrating the effect that standard language resources can have on creating an effective EI test item repository. We also sketch ongoing work on test item generation for other languages and an adaptive test that will use this collection of test items.

2008

Fixed, limited budgets often constrain the amount of expert annotation that can go into the construction of annotated corpora. Estimating the cost of annotation is the first step toward using annotation resources wisely. We present here a study of the cost of annotation. This study includes the participation of annotators at various skill levels and with varying backgrounds. Conducted over the web, the study consists of tests that simulate machine-assisted pre-annotation, requiring correction by the annotator rather than annotation from scratch. The study also includes tests representative of an annotation scenario involving Active Learning as it progresses from a naïve model to a knowledgeable model; in particular, annotators encounter pre-annotation of varying degrees of accuracy. The annotation interface lists tags considered likely by the annotation model in preference to other tags. We present the experimental parameters of the study and report both descriptive and inferential statistics on the results of the study. We conclude with a model for estimating the hourly cost of annotation for annotators of various skill levels. We also present models for two granularities of annotation: sentence at a time and word at a time.

Elicited Imitation as an Oral Proficiency Measure with ASR Scoring
C. Ray Graham | Deryle Lonsdale | Casey Kennington | Aaron Johnson | Jeremiah McGhee
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper discusses development and evaluation of a practical, valid and reliable instrument for evaluating the spoken language abilities of second-language (L2) learners of English. First we sketch the theory and history behind elicited imitation (EI) tests and the renewed interest in them. Then we present how we developed a new test based on various language resources, and administered it to a few hundred students of varying levels. The students were also scored using standard evaluation techniques, and the EI results were compared to more traditionally derived scores. We also sketch how we developed a new integrated tool that allows the session recordings of the EI data to be analyzed with a widely-used automatic speech recognition (ASR) engine. We discuss the promising results of the ASR engine's processing of these files and how they correlated with human scoring of the same items. We indicate how the integrated tool will be used in the future. Further development plans and prospects for follow-on work round out the discussion.

2007
