Tyler Baldwin


2024

pdf
Self-Regulated Data-Free Knowledge Amalgamation for Text Classification
Prashanth Vijayaraghavan | Hongzhi Wang | Luyao Shi | Tyler Baldwin | David Beymer | Ehsan Degan
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 6: Industry Track)

Recently, there has been a growing availability of pre-trained text models on various model repositories. These models greatly reduce the cost of training new models from scratch as they can be fine-tuned for specific tasks or trained on large datasets. However, these datasets may not be publicly accessible due to the privacy, security, or intellectual property issues. In this paper, we aim to develop a lightweight student network that can learn from multiple teacher models without accessing their original training data. Hence, we investigate Data-Free Knowledge Amalgamation (DFKA), a knowledge-transfer task that combines insights from multiple pre-trained teacher models and transfers them effectively to a compact student network. To accomplish this, we propose STRATANET, a modeling framework comprising: (a) a steerable data generator that produces text data tailored to each teacher and (b) an amalgamation module that implements a self-regulative strategy using confidence estimates from the teachers’ different layers to selectively integrate their knowledge and train a versatile student. We evaluate our method on three benchmark text classification datasets with varying labels or domains. Empirically, we demonstrate that the student model learned using our STRATANET outperforms several baselines significantly under data-driven and data-free constraints.

2022

pdf
Improving Neural Models for Radiology Report Retrieval with Lexicon-based Automated Annotation
Luyao Shi | Tanveer Syeda-mahmood | Tyler Baldwin
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Many clinical informatics tasks that are based on electronic health records (EHR) need relevant patient cohorts to be selected based on findings, symptoms and diseases. Frequently, these conditions are described in radiology reports which can be retrieved using information retrieval (IR) methods. The latest of these techniques utilize neural IR models such as BERT trained on clinical text. However, these methods still lack semantic understanding of the underlying clinical conditions as well as ruled out findings, resulting in poor precision during retrieval. In this paper we combine clinical finding detection with supervised query match learning. Specifically, we use lexicon-driven concept detection to detect relevant findings in sentences. These findings are used as queries to train a Sentence-BERT (SBERT) model using triplet loss on matched and unmatched query-sentence pairs. We show that the proposed supervised training task remarkably improves the retrieval performance of SBERT. The trained model generalizes well to unseen queries and reports from different collections.

2021

pdf
BLAR: Biomedical Local Acronym Resolver
William Hogan | Yoshiki Vazquez Baeza | Yannis Katsis | Tyler Baldwin | Ho-Cheol Kim | Chun-Nan Hsu
Proceedings of the 20th Workshop on Biomedical Language Processing

NLP has emerged as an essential tool to extract knowledge from the exponentially increasing volumes of biomedical texts. Many NLP tasks, such as named entity recognition and named entity normalization, are especially challenging in the biomedical domain partly because of the prolific use of acronyms. Long names for diseases, bacteria, and chemicals are often replaced by acronyms. We propose Biomedical Local Acronym Resolver (BLAR), a high-performing acronym resolver that leverages state-of-the-art (SOTA) pre-trained language models to accurately resolve local acronyms in biomedical texts. We test BLAR on the Ab3P corpus and achieve state-of-the-art results compared to the current best-performing local acronym resolution algorithms and models.

2015

pdf
An In-depth Analysis of the Effect of Text Normalization in Social Media
Tyler Baldwin | Yunyao Li
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2013

pdf
Adaptive Parser-Centric Text Normalization
Congle Zhang | Tyler Baldwin | Howard Ho | Benny Kimelfeld | Yunyao Li
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
Automatic Term Ambiguity Detection
Tyler Baldwin | Yunyao Li | Bogdan Alexe | Ioana R. Stanoi
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

pdf
Autonomous Self-Assessment of Autocorrections: Exploring Text Message Dialogues
Tyler Baldwin | Joyce Chai
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2011

pdf
Beyond Normalization: Pragmatics of Word Form in Text Messages
Tyler Baldwin | Joyce Chai
Proceedings of 5th International Joint Conference on Natural Language Processing

2010

pdf
Hand Gestures in Disambiguating Types of You Expressions in Multiparty Meetings
Tyler Baldwin | Joyce Chai | Katrin Kirchhoff
Proceedings of the SIGDIAL 2010 Conference

2006

pdf
Towards Conversational QA: Automatic Identification of Problematic Situations and User Intent
Joyce Y. Chai | Chen Zhang | Tyler Baldwin
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions