Emily Chen
2026
Adaptive Instruction Composition for Automated LLM Red-Teaming
Jesse Zymet | Andy Luo | Swapnil Shinde | Sahil Wadhwa | Emily Chen
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Many approaches to LLM red-teaming leverage an attacker LLM to discover jailbreaks against a target. Several of them task the attacker with identifying effective strategies through trial and error, resulting in a semantically limited range of successes. Another approach discovers diverse attacks by combining crowdsourced harmful queries and tactics into instructions for the attacker, but does so at random, limiting effectiveness. This article introduces a novel framework, Adaptive Instruction Composition, that combines crowdsourced texts according to an adaptive mechanism trained to jointly optimize effectiveness with diversity. We use reinforcement learning to balance exploration with exploitation in a combinatorial space of instructions to guide the attacker toward diverse generations tailored to target vulnerabilities. We demonstrate that our approach substantially outperforms random combination on a set of effectiveness and diversity metrics, even under model transfer. Further, we show that it surpasses a host of recent adaptive approaches on HarmBench. We employ a lightweight neural contextual bandit that adapts to contrastive embedding inputs, and provide ablations suggesting that the contrastive pretraining enables the network to rapidly generalize and scale to the massive space as it learns.
2025
GRAID: Synthetic Data Generation with Geometric Constraints and Multi-Agentic Reflection for Harmful Content Detection
Melissa Kazemi Rad | Alberto Purpura | Himanshu Kumar | Emily Chen | Mohammad Shahed Sorower
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
We address the problem of data scarcity in harmful text classification for guardrailing applications and introduce GRAID (Geometric and Reflective AI-Driven Data Augmentation), a novel pipeline that leverages Large Language Models (LLMs) for dataset augmentation. GRAID consists of two stages: (i) generation of geometrically controlled examples using a constrained LLM, and (ii) augmentation through a multi-agentic reflective process that promotes stylistic diversity and uncovers edge cases. This combination enables both reliable coverage of the input space and nuanced exploration of harmful content. Using two benchmark data sets, we demonstrate that augmenting a harmful text classification dataset with GRAID leads to significant improvements in downstream guardrail model performance.
2023
Community consultation and the development of an online Akuzipik-English dictionary
Benjamin Hunt | Lane Schwartz | Sylvia Schreiner | Emily Chen
Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP)
In this paper, we present a new online dictionary of Akuzipik, an Indigenous language of St. Lawrence Island (Alaska) and Chukotka (Russia). We discuss community desires for strengthening language use in the community and in educational settings, and present specific features of an online dictionary designed to serve these community goals.
2020
Improved Finite-State Morphological Analysis for St. Lawrence Island Yupik Using Paradigm Function Morphology
Emily Chen | Hyunji Hayley Park | Lane Schwartz
Proceedings of the Twelfth Language Resources and Evaluation Conference
St. Lawrence Island Yupik is an endangered polysynthetic language of the Bering Strait region. While conducting linguistic fieldwork between 2016 and 2019, we observed substantial support within the Yupik community for language revitalization and for resource development to support Yupik education. To that end, Chen & Schwartz (2018) implemented a finite-state morphological analyzer as a critical enabling technology for use in Yupik language education and technology. Chen & Schwartz (2018) reported a morphological analysis coverage rate of approximately 75% on a dataset of 60K Yupik tokens, leaving considerable room for improvement. In this work, we present a re-implementation of the Chen & Schwartz (2018) finite-state morphological analyzer for St. Lawrence Island Yupik that incorporates new linguistic insights; in particular, in this implementation we make use of the Paradigm Function Morphology (PFM) theory of morphology. We evaluate this new PFM-based morphological analyzer, and demonstrate that it consistently outperforms the existing analyzer of Chen & Schwartz (2018) with respect to accuracy and coverage rate across multiple datasets.
2019
Measuring the Value of Linguistics: A Case Study from St. Lawrence Island Yupik
Emily Chen
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
The adaptation of neural approaches to NLP is a landmark achievement that has called into question the utility of linguistics in the development of computational systems. This research proposal consequently explores this question in the context of a neural morphological analyzer for a polysynthetic language, St. Lawrence Island Yupik. It asks whether incorporating elements of Yupik linguistics into the implementation of the analyzer can improve performance, both in low-resource settings and in high-resource settings, where rich quantities of data are readily available.
Bootstrapping a Neural Morphological Analyzer for St. Lawrence Island Yupik from a Finite-State Transducer
Lane Schwartz | Emily Chen | Benjamin Hunt | Sylvia L.R. Schreiner
Proceedings of the 3rd Workshop on the Use of Computational Methods in the Study of Endangered Languages Volume 1 (Papers)
Community lexical access for an endangered polysynthetic language: An electronic dictionary for St. Lawrence Island Yupik
Benjamin Hunt | Emily Chen | Sylvia L.R. Schreiner | Lane Schwartz
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)
In this paper, we introduce a morphologically aware electronic dictionary for St. Lawrence Island Yupik, an endangered language of the Bering Strait region. Implemented using HTML, JavaScript, and CSS, the dictionary is set in an uncluttered interface and permits users to search in Yupik or in English for Yupik root words and Yupik derivational suffixes. For each matching result, our electronic dictionary presents the user with the corresponding entry from the Badten (2008) Yupik-English paper dictionary. Because Yupik is a polysynthetic language, handling of multimorphemic word forms is critical. If a user searches for an inflected Yupik word form, we perform a morphological analysis and return entries for the root word and for any derivational suffixes present in the word. This electronic dictionary should serve as a valuable resource not only for all students and speakers of Yupik, but also for field linguists working towards documentation and conservation of the language.
2018
A Morphological Analyzer for St. Lawrence Island / Central Siberian Yupik
Emily Chen | Lane Schwartz
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)