Sam Henry


2021

MNLP at MEDIQA 2021: Fine-Tuning PEGASUS for Consumer Health Question Summarization
Jooyeon Lee | Huong Dang | Ozlem Uzuner | Sam Henry
Proceedings of the 20th Workshop on Biomedical Language Processing

This paper details a Consumer Health Question (CHQ) summarization model submitted to MEDIQA 2021 for shared task 1: Question Summarization. Many CHQs are composed of multiple sentences with typos or unnecessary information, which can interfere with automated question answering systems. Question summarization mitigates this issue by removing this unnecessary information, aiding automated systems in generating a more accurate summary. Our summarization approach focuses on applying multiple pre-processing techniques, including question focus identification on the input, and on developing an ensemble method to combine question focus with an abstractive summarization method. We use the state-of-the-art abstractive summarization model, PEGASUS (Pre-training with Extracted Gap-sentences for Abstractive Summarization), to generate abstractive summaries. Our experiments show that using our ensemble method, which combines abstractive summarization with question focus identification, improves performance over using summarization alone. Our model shows a ROUGE-2 F-measure of 11.14% against the official test dataset.

2020

Ensemble BERT for Classifying Medication-mentioning Tweets
Huong Dang | Kahyun Lee | Sam Henry | Özlem Uzuner
Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task

Twitter is a valuable source of patient-generated data that has been used in various population health studies. The first step in many of these studies is to identify and capture Twitter messages (tweets) containing medication mentions. In this article, we describe our submission to Task 1 of the Social Media Mining for Health Applications (SMM4H) Shared Task 2020. This task challenged participants to detect tweets that mention medications or dietary supplements in a natural, highly imbalanced dataset. Our system combined a handcrafted preprocessing step with an ensemble of 20 BERT-based classifiers generated by dividing the training dataset into subsets using 10-fold cross-validation and exploiting two BERT embedding models. Our system ranked first in this task, and improved the average F1 score across all participating teams by 19.07%, with a precision, recall, and F1 on the test set of 83.75%, 87.01%, and 85.35%, respectively.
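As a quick consistency check on the figures reported in this abstract, the F1 score is the harmonic mean of precision and recall. The sketch below is illustrative only; the function name `f1_score` is a hypothetical helper and not part of the authors' system.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
f1 = f1_score(83.75, 87.01)
print(round(f1, 2))  # 85.35, matching the reported F1
```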

SalamNET at SemEval-2020 Task 12: Deep Learning Approach for Arabic Offensive Language Detection
Fatemah Husain | Jooyeon Lee | Sam Henry | Ozlem Uzuner
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper describes SalamNET, an Arabic offensive language detection system that has been submitted to SemEval 2020 shared task 12: Multilingual Offensive Language Identification in Social Media. Our approach focuses on applying multiple deep learning models and conducting in-depth error analysis of results to provide system implications for future development considerations. To pursue our goal, a Recurrent Neural Network (RNN), a Gated Recurrent Unit (GRU), and Long Short-Term Memory (LSTM) models with different design architectures have been developed and evaluated. SalamNET, a Bi-directional Gated Recurrent Unit (Bi-GRU) based model, reports a macro-F1 score of 0.83%.

2017

Evaluating Feature Extraction Methods for Knowledge-based Biomedical Word Sense Disambiguation
Sam Henry | Clint Cuffy | Bridget McInnes
BioNLP 2017

In this paper, we present an analysis of feature extraction methods via dimensionality reduction for the task of biomedical Word Sense Disambiguation (WSD). We modify the vector representations in the 2-MRD WSD algorithm, and evaluate four dimensionality reduction methods: Word Embeddings using Continuous Bag of Words and Skip Gram, Singular Value Decomposition (SVD), and Principal Component Analysis (PCA). We also evaluate the effects of vector size on the performance of each of these methods. Results are evaluated on five standard evaluation datasets (Abbrev.100, Abbrev.200, Abbrev.300, NLM-WSD, and MSH-WSD). We find that vector sizes of 100 are sufficient for all techniques except SVD, for which a vector size of 1500 is preferred. We also show that SVD performs on par with Word Embeddings for all but one dataset.

2016

VRep at SemEval-2016 Task 1 and Task 2: A System for Interpretable Semantic Similarity
Sam Henry | Allison Sands
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)