Kristen Howell

2022

We demonstrate that knowledge distillation can be used not only to reduce model size, but to simultaneously adapt a contextual language model to a specific domain. We use Multilingual BERT (mBERT; Devlin et al., 2019) as a starting point and follow the knowledge distillation approach of (Sahn et al., 2019) to train a smaller multilingual BERT model that is adapted to the domain at hand. We show that for in-domain tasks, the domain-specific model shows on average 2.3% improvement in F1 score, relative to a model distilled on domain-general data. Whereas much previous work with BERT has fine-tuned the encoder weights during task training, we show that the model improvements from distillation on in-domain data persist even when the encoder weights are frozen during task training, allowing a single encoder to support classifiers for multiple tasks and languages.

pdf abs
Behind the Mask: Demographic bias in name detection for PII masking
Courtney Mansfield | Amandalynne Paullada | Kristen Howell
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion

Many datasets contain personally identifiable information, or PII, which poses privacy risks to individuals. PII masking is commonly used to redact personal information such as names, addresses, and phone numbers from text data. Most modern PII masking pipelines involve machine learning algorithms. However, these systems may vary in performance, such that individuals from particular demographic groups bear a higher risk for having their personal information exposed. In this paper, we evaluate the performance of three off-the-shelf PII masking systems on name detection and redaction. We generate data using names and templates from the customer service domain. We find that an open-source RoBERTa-based system shows fewer disparities than the commercial models we test. However, all systems demonstrate significant differences in error rate based on demographics. In particular, the highest error rates occurred for names associated with Black and Asian/Pacific Islander individuals.

2021

pdf abs
Should Semantic Vector Composition be Explicit? Can it be Linear?
Dominic Widdows | Kristen Howell | Trevor Cohen
Proceedings of the 2021 Workshop on Semantic Spaces at the Intersection of NLP, Physics, and Cognitive Science (SemSpace)

Vector representations have become a central element in semantic language modelling, leading to mathematical overlaps with many fields including quantum theory. Compositionality is a core goal for such representations: given representations for ‘wet’ and ‘fish’, how should the concept ‘wet fish’ be represented? This position paper surveys this question from two points of view. The first considers the question of whether an explicit mathematical representation can be successful using only tools from within linear algebra, or whether other mathematical tools are needed. The second considers whether semantic vector composition should be explicitly described mathematically, or whether it can be a model-internal side-effect of training a neural network. A third and newer question is whether a compositional model can be implemented on a quantum computer. Given the fundamentally linear nature of quantum mechanics, we propose that these questions are related, and that this survey may help to highlight candidate operations for future quantum implementation.

2019

pdf
Modeling Clausal Complementation for a Grammar Engineering Resource
Olga Zamaraeva | Kristen Howell | Emily M. Bender
Proceedings of the Society for Computation in Linguistics (SCiL) 2019

pdf
Handling cross-cutting properties in automatic inference of lexical classes: A case study of Chintang
Olga Zamaraeva | Kristen Howell | Emily M. Bender
Proceedings of the 3rd Workshop on the Use of Computational Methods in the Study of Endangered Languages Volume 1 (Papers)

2018

pdf abs
Clausal Modifiers in the Grammar Matrix
Kristen Howell | Olga Zamaraeva
Proceedings of the 27th International Conference on Computational Linguistics

We extend the coverage of an existing grammar customization system to clausal modifiers, also referred to as adverbial clauses. We present an analysis, taking a typologically-driven approach to account for this phenomenon across the world’s languages, which we implement in the Grammar Matrix customization system (Bender et al., 2002, 2010). Testing our analysis on testsuites from five genetically and geographically diverse languages that were not considered in development, we achieve 88.4% coverage and 1.5% overgeneration.

pdf abs
Improving Feature Extraction for Pathology Reports with Precise Negation Scope Detection
Olga Zamaraeva | Kristen Howell | Adam Rhine
Proceedings of the 27th International Conference on Computational Linguistics

We use a broad coverage, linguistically precise English Resource Grammar (ERG) to detect negation scope in sentences taken from pathology reports. We show that incorporating this information in feature extraction has a positive effect on classification of the reports with respect to cancer laterality compared with NegEx, a commonly used tool for negation detection. We analyze the differences between NegEx and ERG results on our dataset and how these differences indicate some directions for future work.

2017

pdf
Inferring Case Systems from IGT: Enriching the Enrichment
Kristen Howell | Emily M. Bender | Michel Lockwood | Fei Xia | Olga Zamaraeva
Proceedings of the 2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages

pdf
Computational Support for Finding Word Classes: A Case Study of Abui
Olga Zamaraeva | František Kratochvíl | Emily M. Bender | Fei Xia | Kristen Howell
Proceedings of the 2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages

Co-authors

Venues

ltedi1