2022
Automatic Correction of Syntactic Dependency Annotation Differences
Andrew Zupon | Andrew Carnie | Michael Hammond | Mihai Surdeanu
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Annotation inconsistencies between data sets can cause problems for low-resource NLP, where noisy or inconsistent data cannot be easily replaced. We propose a method for automatically detecting annotation mismatches between dependency parsing corpora, along with three related methods for automatically converting the mismatches. All three methods rely on comparing unseen examples in a new corpus with similar examples in an existing corpus. These three methods include a simple lexical replacement using the most frequent tag of the example in the existing corpus, a GloVe embedding-based replacement that considers related examples, and a BERT-based replacement that uses contextualized embeddings to provide examples fine-tuned to our data. We evaluate these conversions by retraining two dependency parsers—Stanza and Parsing as Tagging (PaT)—on the converted and unconverted data. We find that applying our conversions yields significantly better performance in many cases. We also observe some differences between the two parsers. Stanza has a more complex architecture with a quadratic algorithm, so it takes longer to train, but it can generalize from less data. The PaT parser has a simpler architecture with a linear algorithm, which speeds up training but requires more training data to reach comparable or better performance.
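The embedding-based conversion can be illustrated with a short sketch. The Python below is a hypothetical, minimal rendering of the GloVe-based variant described in the abstract, not the authors' released implementation: for a word from the new corpus, it finds the most similar words in the existing corpus by cosine similarity over GloVe vectors and adopts their majority dependency label. The function names and the glove and corpus_labels lookup tables are assumptions introduced for illustration.

    from collections import Counter

    import numpy as np

    def cosine(u, v):
        # Cosine similarity with a small epsilon to avoid division by zero.
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

    def convert_label(word, glove, corpus_labels, k=5):
        """Majority dependency label among the k nearest known words.

        glove:         word -> embedding vector (e.g., loaded GloVe vectors)
        corpus_labels: word -> list of dependency labels seen in the
                       existing corpus
        Returns None when the word has no embedding or no neighbor votes.
        """
        if word not in glove:
            return None
        query = glove[word]
        # Rank words attested in the existing corpus by embedding similarity.
        neighbors = sorted(
            (w for w in corpus_labels if w in glove),
            key=lambda w: cosine(query, glove[w]),
            reverse=True,
        )[:k]
        votes = Counter(label for w in neighbors for label in corpus_labels[w])
        return votes.most_common(1)[0][0] if votes else None

The simple lexical variant in the abstract corresponds to the k=1 exact-match case, while the BERT-based variant would swap the static GloVe lookup for contextualized token vectors.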
2021
Data augmentation for low-resource grapheme-to-phoneme mapping
Michael Hammond
Proceedings of the 18th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology
In this paper we explore a very simple neural approach to mapping orthography to phonetic transcription in a low-resource context. The basic idea is to start from a baseline system and focus all efforts on data augmentation. We will see that some techniques work, but others do not.
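As a concrete illustration of one generic augmentation strategy for this setting (offered only as an example; the abstract does not name which techniques were tested, so this is not necessarily among them), the sketch below grows a small grapheme-to-phoneme training set by concatenating attested word/pronunciation pairs, keeping the orthographic and phonetic sides aligned. All names here are hypothetical.

    import random

    def augment_by_concatenation(pairs, n_new, seed=0):
        """Build n_new synthetic (orthography, transcription) pairs by
        concatenating two attested pairs, so graphemes and phones stay
        aligned. `pairs` is a list of (word, space-separated phones)."""
        rng = random.Random(seed)
        synthetic = []
        for _ in range(n_new):
            (g1, p1), (g2, p2) = rng.sample(pairs, 2)
            synthetic.append((g1 + g2, p1 + " " + p2))
        return synthetic

    # Example: two tiny entries yield one synthetic compound-like item.
    data = [("cat", "k ae t"), ("dog", "d ao g")]
    print(augment_by_concatenation(data, 1))  # e.g. [('dogcat', 'd ao g k ae t')]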
2017
Tell Me Why: Using Question Answering as Distant Supervision for Answer Justification
Rebecca Sharp | Mihai Surdeanu | Peter Jansen | Marco A. Valenzuela-Escárcega | Peter Clark | Michael Hammond
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)
For many applications of question answering (QA), being able to explain why a given model chose an answer is critical. However, the lack of labeled data for answer justifications makes learning this difficult and expensive. Here we propose an approach that uses answer ranking as distant supervision for learning how to select informative justifications, where justifications serve as inferential connections between the question and the correct answer while often containing little lexical overlap with either. We propose a neural network architecture for QA that reranks answer justifications as an intermediate (and human-interpretable) step in answer selection. Our approach is informed by a set of features designed to combine both learned representations and explicit features to capture the connection between questions, answers, and answer justifications. We show that with this end-to-end approach we are able to significantly improve upon a strong IR baseline in both justification ranking (+9% rated highly relevant) and answer selection (+6% P@1).
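A minimal sketch of the reranking step may help fix ideas. The code below is hypothetical, not the paper's network: each candidate justification is scored by combining a learned representation similarity between question and justification with explicit features, and candidates are sorted by that score. The weight vector w and the feature layout are assumptions for illustration; in the paper the scorer is a neural architecture trained end-to-end with distant supervision from answer correctness.

    import numpy as np

    def justification_score(q_vec, j_vec, explicit_feats, w):
        """Score one justification: a learned similarity (dot product of
        question and justification representations) is concatenated with
        explicit features and combined by a learned weight vector w."""
        learned = np.dot(q_vec, j_vec)
        features = np.concatenate(([learned], explicit_feats))
        return float(np.dot(w, features))

    def rerank_justifications(candidates, q_vec, w):
        """Sort (justification_vector, explicit_features, text) triples by
        descending score, so the top item is the best inferential link
        between the question and a candidate answer."""
        return sorted(
            candidates,
            key=lambda c: justification_score(q_vec, c[0], c[1], w),
            reverse=True,
        )

The top-ranked justification then serves double duty: it supports answer selection and remains human-readable as an explanation.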
2016
Creating Causal Embeddings for Question Answering with Minimal Supervision
Rebecca Sharp | Mihai Surdeanu | Peter Jansen | Peter Clark | Michael Hammond
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing