David King

Also published as: David L. King

2020

pdf
Interpreting Sequence-to-Sequence Models for Russian Inflectional Morphology
David King | Andrea Sims | Micha Elsner
Proceedings of the Society for Computation in Linguistics 2020

2019

pdf abs
The OSU/Facebook Realizer for SRST 2019: Seq2Seq Inflection and Serialized Tree2Tree Linearization
Kartikeya Upasani | David King | Jinfeng Rao | Anusha Balakrishnan | Michael White
Proceedings of the 2nd Workshop on Multilingual Surface Realisation (MSR 2019)

We describe our exploratory system for the shallow surface realization task, which combines morphological inflection using character sequence-to-sequence models with a baseline linearizer that implements a tree-to-tree model using sequence-to-sequence models on serialized trees. Results for morphological inflection were competitive across languages. Due to time constraints, we could only submit complete results (including linearization) for English. Preliminary linearization results were decent, with a small benefit from reranking to prefer valid output trees, but inadequate control over the words in the output led to poor quality on longer sentences.

2018

pdf bib abs
Using Paraphrasing and Memory-Augmented Models to Combat Data Sparsity in Question Interpretation with a Virtual Patient Dialogue System
Lifeng Jin | David King | Amad Hussein | Michael White | Douglas Danforth
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications

When interpreting questions in a virtual patient dialogue system one must inevitably tackle the challenge of a long tail of relatively infrequently asked questions. To make progress on this challenge, we investigate the use of paraphrasing for data augmentation and neural memory-based classification, finding that the two methods work best in combination. In particular, we find that the neural memory-based approach not only outperforms a straight CNN classifier on low frequency questions, but also takes better advantage of the augmented data created by paraphrasing, together yielding a nearly 10% absolute improvement in accuracy on the least frequently asked questions.

pdf abs
The OSU Realizer for SRST ‘18: Neural Sequence-to-Sequence Inflection and Incremental Locality-Based Linearization
David King | Michael White
Proceedings of the First Workshop on Multilingual Surface Realisation

Surface realization is a nontrivial task as it involves taking structured data and producing grammatically and semantically correct utterances. Many competing grammar-based and statistical models for realization still struggle with relatively simple sentences. For our submission to the 2018 Surface Realization Shared Task, we tackle the shallow task by first generating inflected wordforms with a neural sequence-to-sequence model before incrementally linearizing them. For linearization, we use a global linear model trained using early update that makes use of features that take into account the dependency structure and dependency locality. Using this pipeline sufficed to produce surprisingly strong results in the shared task. In future work, we intend to pursue joint approaches to linearization and morphological inflection and incorporating a neural language model into the linearization choices.

2017

pdf bib
A Simple Method for Clarifying Sentences with Coordination Ambiguities
Michael White | Manjuan Duan | David L. King
Proceedings of the 1st Workshop on Explainable Computational Intelligence (XCI 2017)

This paper describes our “breaker” submission to the 2017 EMNLP “Build It Break It” shared task on sentiment analysis. In order to cause the “builder” systems to make incorrect predictions, we edited items in the blind test data according to linguistically interpretable strategies that allow us to assess the ease with which the builder systems learn various components of linguistic structure. On the whole, our submitted pairs break all systems at a high rate (72.6%), indicating that sentiment analysis as an NLP task may still have a lot of ground to cover. Of the breaker strategies that we consider, we find our semantic and pragmatic manipulations to pose the most substantial difficulties for the builder systems.

We present the ABLE document collection, which consists of a set of annotated volumes of the Bulletin of the British Museum (Natural History). These were developed during our ongoing work on automating the markup of scanned copies of the biodiversity literature. Such automation is required if historic literature is to be used to inform contemporary issues in biodiversity research. We consider an enhanced TEI XML markup language, which is used as an intermediate stage in translating from the initial XML obtained from Optical Character Recognition to taXMLit, the target annotation schema. The intermediate representation allows additional information from external sources such as a taxonomic thesaurus to be incorporated before the final translation into taXMLit. We give an overview of the project workflow in automating the markup process, and consider what extensions to existing markup schema will be required to best support working taxonomists. Finally, we discuss some of the particular issues which were encountered in converting between different XML formats.