2018
Using Wikipedia Edits in Low Resource Grammatical Error Correction
Adriane Boyd
Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text
We develop a grammatical error correction (GEC) system for German using a small gold GEC corpus augmented with edits extracted from Wikipedia revision history. We extend the automatic error annotation tool ERRANT (Bryant et al., 2017) for German and use it to analyze both gold GEC corrections and Wikipedia edits (Grundkiewicz and Junczys-Dowmunt, 2014) in order to select as additional training data Wikipedia edits containing grammatical corrections similar to those in the gold corpus. Using a multilayer convolutional encoder-decoder neural network GEC approach (Chollampatt and Ng, 2018), we evaluate the contribution of Wikipedia edits and find that carefully selected Wikipedia edits increase performance by over 5%.
Normalization in Context: Inter-Annotator Agreement for Meaning-Based Target Hypothesis Annotation
Adriane Boyd
Proceedings of the 7th workshop on NLP for Computer Assisted Language Learning
2014
The MERLIN corpus: Learner language and the CEFR
Adriane Boyd | Jirka Hana | Lionel Nicolas | Detmar Meurers | Katrin Wisniewski | Andrea Abel | Karin Schöne | Barbora Štindlová | Chiara Vettori
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
The MERLIN corpus is a written learner corpus for Czech, German, and Italian that has been designed to illustrate the Common European Framework of Reference for Languages (CEFR) with authentic learner data. The corpus contains 2,290 learner texts produced in standardized language certifications covering CEFR levels A1-C1. The MERLIN annotation scheme includes a wide range of language characteristics that enable research into the empirical foundations of the CEFR scales and provide language teachers, test developers, and Second Language Acquisition researchers with concrete examples of learner performance and progress across multiple proficiency levels. For computational linguistics, it provides a range of authentic learner data for three target languages, supporting a broadening of the scope of research in areas such as automatic proficiency classification or native language identification. The annotated corpus and related information will be freely available as a corpus resource and through a freely accessible, didactically oriented online platform.
2012
Informing Determiner and Preposition Error Correction with Hierarchical Word Clustering
Adriane Boyd | Marion Zepf | Detmar Meurers
Proceedings of the Seventh Workshop on Building Educational Applications Using NLP
2011
Data-Driven Correction of Function Words in Non-Native English
Adriane Boyd | Detmar Meurers
Proceedings of the 13th European Workshop on Natural Language Generation
2010
EAGLE: an Error-Annotated Corpus of Beginning Learner German
Adriane Boyd
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
This paper describes the Error-Annotated German Learner Corpus (EAGLE), a corpus of beginning learner German with grammatical error annotation. The corpus contains online workbook and handwritten essay data from learners in introductory German courses at The Ohio State University. We introduce an error typology developed for beginning learners of German that focuses on linguistic properties of lexical items present in the learner data and present the detailed error typologies for selection, agreement, and word order errors. The corpus uses an error annotation format that extends the multi-layer standoff format proposed by Luedeling et al. (2005) to include incremental target hypotheses for each error. In this format, each annotated error includes information about the location of tokens affected by the error, the error type, and the proposed target correction. The multi-layer standoff format allows us to annotate ambiguous errors with more than one possible target correction and to annotate the multiple, overlapping errors common in beginning learner productions.
Proceedings of the NAACL HLT 2010 Student Research Workshop
Julia Hockenmaier | Diane Litman | Adriane Boyd | Mahesh Joshi | Frank Rudzicz
Proceedings of the NAACL HLT 2010 Student Research Workshop
Enhancing Authentic Web Pages for Language Learners
Detmar Meurers | Ramon Ziai | Luiz Amaral | Adriane Boyd | Aleksandar Dimitrov | Vanessa Metcalf | Niels Ott
Proceedings of the NAACL HLT 2010 Fifth Workshop on Innovative Use of NLP for Building Educational Applications
2009
Pronunciation Modeling in Spelling Correction for Writers of English as a Foreign Language
Adriane Boyd
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Student Research Workshop and Doctoral Consortium
2008
Revisiting the Impact of Different Annotation Schemes on PCFG Parsing: A Grammatical Dependency Evaluation
Adriane Boyd | Detmar Meurers
Proceedings of the Workshop on Parsing German
2007
Discontinuity Revisited: An Improved Conversion to Context-Free Representations
Adriane Boyd
Proceedings of the Linguistic Annotation Workshop
2005
Identifying Non-Referential it: A Machine Learning Approach Incorporating Linguistically Motivated Patterns
Adriane Boyd | Whitney Gegg-Harrison | Donna Byron
Proceedings of the ACL Workshop on Feature Engineering for Machine Learning in Natural Language Processing