Tom Lippincott


Graph convolutional networks for exploring authorship hypotheses
Tom Lippincott
Proceedings of the 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

This work considers a task from traditional literary criticism: annotating a structured, composite document with information about its sources. We take the Documentary Hypothesis, a prominent theory regarding the composition of the first five books of the Hebrew bible, extract stylistic features designed to avoid bias or overfitting, and train several classification models. Our main result is that the recently-introduced graph convolutional network architecture outperforms structurally-uninformed models. We also find that including information about the granularity of text spans is a crucial ingredient when employing hidden layers, in contrast to simple logistic regression. We perform error analysis at several levels, noting how some characteristic limitations of the models and simple features lead to misclassifications, and conclude with an overview of future work.

JHU System Description for the MADAR Arabic Dialect Identification Shared Task
Tom Lippincott | Pamela Shapiro | Kevin Duh | Paul McNamee
Proceedings of the Fourth Arabic Natural Language Processing Workshop

Our submission to the MADAR shared task on Arabic dialect identification employed a language modeling technique called Prediction by Partial Matching, an ensemble of neural architectures, and sources of additional data for training word embeddings and auxiliary language models. We found several of these techniques provided small boosts in performance, though a simple character-level language model was a strong baseline, and a lower-order LM achieved best performance on Subtask 2. Interestingly, word embeddings provided no consistent benefit, and ensembling struggled to outperform the best component submodel. This suggests the variety of architectures are learning redundant information, and future work may focus on encouraging decorrelated learning.


Observational Comparison of Geo-tagged and Randomly-drawn Tweets
Tom Lippincott | Annabelle Carrell
Proceedings of the Second Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media

Twitter is a ubiquitous source of micro-blog social media data, providing the academic, industrial, and public sectors real-time access to actionable information. A particularly attractive property of some tweets is *geo-tagging*, where a user account has opted-in to attaching their current location to each message. Unfortunately (from a researcher’s perspective) only a fraction of Twitter accounts agree to this, and these accounts are likely to have systematic diffences with the general population. This work is an exploratory study of these differences across the full range of Twitter content, and complements previous studies that focus on the English-language subset. Additionally, we compare methods for querying users by self-identified properties, finding that the constrained semantics of the “description” field provides cleaner, higher-volume results than more complex regular expressions.

Portable, layer-wise task performance monitoring for NLP models
Tom Lippincott
Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

There is a long-standing interest in understanding the internal behavior of neural networks. Deep neural architectures for natural language processing (NLP) are often accompanied by explanations for their effectiveness, from general observations (e.g. RNNs can represent unbounded dependencies in a sequence) to specific arguments about linguistic phenomena (early layers encode lexical information, deeper layers syntactic). The recent ascendancy of DNNs is fueling efforts in the NLP community to explore these claims. Previous work has tended to focus on easily-accessible representations like word or sentence embeddings, with deeper structure requiring more ad hoc methods to extract and examine. In this work, we introduce Vivisect, a toolkit that aims at a general solution for broad and fine-grained monitoring in the major DNN frameworks, with minimal change to research patterns.


pdf bib
CADET: Computer Assisted Discovery Extraction and Translation
Benjamin Van Durme | Tom Lippincott | Kevin Duh | Deana Burchfield | Adam Poliak | Cash Costello | Tim Finin | Scott Miller | James Mayfield | Philipp Koehn | Craig Harman | Dawn Lawrie | Chandler May | Max Thomas | Annabelle Carrell | Julianne Chaloux | Tongfei Chen | Alex Comerford | Mark Dredze | Benjamin Glass | Shudong Hao | Patrick Martin | Pushpendre Rastogi | Rashmi Sankepally | Travis Wolfe | Ying-Ying Tran | Ted Zhang
Proceedings of the IJCNLP 2017, System Demonstrations

Computer Assisted Discovery Extraction and Translation (CADET) is a workbench for helping knowledge workers find, label, and translate documents of interest. It combines a multitude of analytics together with a flexible environment for customizing the workflow for different users. This open-source framework allows for easy development of new research prototypes using a micro-service architecture based atop Docker and Apache Thrift.


Fluency detection on communication networks
Tom Lippincott | Benjamin Van Durme
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing


Exploring variation across biomedical subdomains
Tom Lippincott | Diarmuid Ó Séaghdha | Lin Sun | Anna Korhonen
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)


Relation between Agreement Measures on Human Labeling and Machine Learning Performance: Results from an Art History Domain
Rebecca Passonneau | Tom Lippincott | Tae Yano | Judith Klavans
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We discuss factors that affect human agreement on a semantic labeling task in the art history domain, based on the results of four experiments where we varied the number of labels annotators could assign, the number of annotators, the type and amount of training they received, and the size of the text span being labeled. Using the labelings from one experiment involving seven annotators, we investigate the relation between interannotator agreement and machine learning performance. We construct binary classifiers and vary the training and test data by swapping the labelings from the seven annotators. First, we find performance is often quite good despite lower than recommended interannotator agreement. Second, we find that on average, learning performance for a given functional semantic category correlates with the overall agreement among the seven annotators for that category. Third, we find that learning performance on the data from a given annotator does not correlate with the quality of that annotator’s labeling. We offer recommendations for the use of labeled data in machine learning, and argue that learners should attempt to accommodate human variation. We also note implications for large scale corpus annotation projects that deal with similarly subjective phenomena.