Bridget McInnes

Also published as: Bridget T. McInnes, Bridget Thomson McInnes


NLP@VCU: Identifying Adverse Effects in English Tweets for Unbalanced Data
Darshini Mahendran | Cora Lewis | Bridget McInnes
Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task

This paper describes our participation in the Social Media Mining for Health Applications (SMM4H 2020) Challenge Track 2 for identifying tweets containing Adverse Effects (AEs). Our system uses Convolutional Neural Networks. We explore downsampling, oversampling, and adjusting the class weights to account for the imbalanced nature of the dataset. Our results showed that downsampling outperformed oversampling and class-weight adjustment on the test set; however, all three obtained similar results on the development set.


NLP Whack-A-Mole: Challenges in Cross-Domain Temporal Expression Extraction
Amy Olex | Luke Maffey | Bridget McInnes
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Incorporating domain knowledge is vital in building successful natural language processing (NLP) applications. Often, cross-domain application of a tool results in poor performance because the tool does not account for domain-specific attributes. The clinical domain is challenging in this respect due to specialized medical terms and nomenclature, shorthand notation, fragmented text, and the variety of writing styles used by different medical units. Temporal resolution is an NLP task that is, in general, domain-agnostic because temporal information is represented using a limited lexicon. However, domain-specific aspects of temporal resolution are present in clinical texts. Here we explore parsing issues that arose when running our system, a tool built on Newswire text, on clinical notes in the THYME corpus. Many parsing issues were straightforward to correct; however, a few code changes resulted in a cascading series of parsing errors that had to be resolved before an improvement in performance was observed, revealing the complexity of temporal resolution and rule-based parsing. Our system now outperforms current state-of-the-art systems on the THYME corpus with little change in its performance on Newswire texts.


Chrono at SemEval-2018 Task 6: A System for Normalizing Temporal Expressions
Amy Olex | Luke Maffey | Nicholas Morgan | Bridget McInnes
Proceedings of the 12th International Workshop on Semantic Evaluation

Temporal information extraction is a challenging task. Here we describe Chrono, a hybrid rule-based and machine learning system that identifies temporal expressions in text and normalizes them into the SCATE schema. After minor parsing logic adjustments, Chrono has emerged as the top-performing system for SemEval 2018 Task 6: Parsing Time Normalizations.

SciREL at SemEval-2018 Task 7: A System for Semantic Relation Extraction and Classification
Darshini Mahendran | Chathurika Brahmana | Bridget McInnes
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper describes our system, SciREL (Scientific abstract RELation extraction system), developed for SemEval 2018 Task 7: Semantic Relation Extraction and Classification in Scientific Papers. We present a feature-vector-based system to extract explicit semantic relations and classify them. Our system is trained on the ACL corpus (Bird et al., 2008), which contains annotated abstracts provided by the task organizers. When an abstract with annotated entities is given as input, our system extracts the semantic relations through a set of defined features and classifies each into one of the six given relation categories through feature engineering and a learned model. With the best combination of features, SciREL obtained an F-measure of 20.03 on the official test corpus of 150 abstracts for relation classification Subtask 1.1. In this paper, we provide an in-depth error analysis of our results to prevent duplication of research efforts in the development of future systems.


Improving Correlation with Human Judgments by Integrating Semantic Similarity with Second–Order Vectors
Bridget McInnes | Ted Pedersen
BioNLP 2017

Vector space methods that measure semantic similarity and relatedness often rely on distributional information such as co-occurrence frequencies or statistical measures of association to weight the importance of particular co-occurrences. In this paper, we extend these methods by incorporating a measure of semantic similarity based on a human-curated taxonomy into a second-order vector representation. This results in a measure of semantic relatedness that combines the contextual information available in a corpus-based vector space representation with the semantic knowledge found in a biomedical ontology. Our results show that incorporating semantic similarity into second-order co-occurrence matrices improves correlation with human judgments for both similarity and relatedness, and that our method compares favorably to various word embedding methods that have recently been evaluated on the same reference standards we have used.

Evaluating Feature Extraction Methods for Knowledge-based Biomedical Word Sense Disambiguation
Sam Henry | Clint Cuffy | Bridget McInnes
BioNLP 2017

In this paper, we present an analysis of feature extraction methods via dimensionality reduction for the task of biomedical Word Sense Disambiguation (WSD). We modify the vector representations in the 2-MRD WSD algorithm and evaluate four dimensionality reduction methods: Word Embeddings using Continuous Bag of Words and Skip Gram, Singular Value Decomposition (SVD), and Principal Component Analysis (PCA). We also evaluate the effect of vector size on the performance of each of these methods. Results are evaluated on five standard evaluation datasets (Abbrev.100, Abbrev.200, Abbrev.300, NLM-WSD, and MSH-WSD). We find that a vector size of 100 is sufficient for all techniques except SVD, for which a vector size of 1500 is preferred. We also show that SVD performs on par with Word Embeddings for all but one dataset.


VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter
Gerard Briones | Kasun Amarasinghe | Bridget McInnes
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

VCU at Semeval-2016 Task 14: Evaluating definitional-based similarity measure for semantic taxonomy enrichment
Bridget McInnes
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)


UMLS::Similarity: Measuring the Relatedness and Similarity of Biomedical Concepts
Bridget McInnes | Ted Pedersen | Serguei Pakhomov | Ying Liu | Genevieve Melton-Meaux
Proceedings of the 2013 NAACL HLT Demonstration Session


Using Second-order Vectors in a Knowledge-based Method for Acronym Disambiguation
Bridget T. McInnes | Ted Pedersen | Ying Liu | Serguei V. Pakhomov | Genevieve B. Melton
Proceedings of the Fifteenth Conference on Computational Natural Language Learning

The Ngram Statistics Package (Text::NSP) : A Flexible Tool for Identifying Ngrams, Collocations, and Word Associations
Ted Pedersen | Satanjeev Banerjee | Bridget McInnes | Saiyam Kohli | Mahesh Joshi | Ying Liu
Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World


Automated Identification of Synonyms in Biomedical Acronym Sense Inventories
Genevieve B. Melton | SungRim Moon | Bridget McInnes | Serguei Pakhomov
Proceedings of the NAACL HLT 2010 Second Louhi Workshop on Text and Data Mining of Health Documents


An Unsupervised Vector Approach to Biomedical Term Disambiguation: Integrating UMLS and Medline
Bridget McInnes
Proceedings of the ACL-08: HLT Student Research Workshop


Determining the Syntactic Structure of Medical Terms in Clinical Notes
Bridget McInnes | Ted Pedersen | Serguei Pakhomov
Biological, translational, and clinical language processing


The Duluth Word Alignment System
Bridget Thomson McInnes | Ted Pedersen
Proceedings of the HLT-NAACL 2003 Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond