2024
pdf
abs
MICo: Preventative Detoxification of Large Language Models through Inhibition Control
Roy Siegelmann | Ninareh Mehrabi | Palash Goyal | Prasoon Goyal | Lisa Bauer | Jwala Dhamala | Aram Galstyan | Rahul Gupta | Reza Ghanadan
Findings of the Association for Computational Linguistics: NAACL 2024
Large Language Models (LLMs) are powerful tools that have become both dominant and commonplace in the field of Artificial Intelligence. Yet, LLMs have a tendency to devolve into toxic degeneration, wherein otherwise safe and unproblematic models begin generating toxic content. For the sake of social responsibility, and inspired by the biological mechanism of inhibition control, we introduce the paradigm of Education for Societal Norms (ESN). By collecting and labeling examples as acceptable and unacceptable (in this case, non-toxic and toxic), and including a corresponding acceptable rewrite with every unacceptable example, we introduce a new mechanism for LLM detoxification. We annotate a dataset of 2,850 entries and use it to fine-tune a model, which we call a Model with Inhibition Control (MICo). Evaluating this model on toxicity detection capability, rewrite detoxification, meaning preservation, and overall toxicity reduction, we find significant improvements over the baseline model. In our experiments we show that the overall toxicity of this model is reduced by more than 60%, with over a 75% reduction in severe toxicity.
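As a rough illustration of the ESN setup described above, the sketch below (Python) shows one plausible way to turn labeled acceptable/unacceptable examples and their rewrites into fine-tuning prompts. The field names, instruction templates, and record schema are assumptions for illustration, not the authors' released data format.

# Hypothetical sketch: turning labeled (unacceptable -> acceptable rewrite) pairs
# into supervised fine-tuning examples. Field names and templates are assumptions,
# not the paper's actual data schema.
import json

def build_finetuning_examples(records):
    # Each record: {"text": str, "label": "acceptable" | "unacceptable", "rewrite": str or None}
    examples = []
    for rec in records:
        # Teach the model to judge the input ...
        examples.append({
            "prompt": "Classify the following text as acceptable or unacceptable:\n" + rec["text"],
            "completion": rec["label"],
        })
        # ... and, for unacceptable inputs, to produce the acceptable rewrite.
        if rec["label"] == "unacceptable" and rec.get("rewrite"):
            examples.append({
                "prompt": "Rewrite the following text so that it is acceptable:\n" + rec["text"],
                "completion": rec["rewrite"],
            })
    return examples

demo = [{"text": "<toxic example>", "label": "unacceptable", "rewrite": "<non-toxic rewrite>"}]
print(json.dumps(build_finetuning_examples(demo), indent=2))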
pdf
abs
Tokenization Matters: Navigating Data-Scarce Tokenization for Gender Inclusive Language Technologies
Anaelia Ovalle | Ninareh Mehrabi | Palash Goyal | Jwala Dhamala | Kai-Wei Chang | Richard Zemel | Aram Galstyan | Yuval Pinter | Rahul Gupta
Findings of the Association for Computational Linguistics: NAACL 2024
Gender-inclusive NLP research has documented the harmful limitations of gender binary-centric large language models (LLMs), such as the inability to correctly use gender-diverse English neopronouns (e.g., xe, zir, fae). While data scarcity is a known culprit, the precise mechanisms through which scarcity affects this behavior remain underexplored. We discover LLM misgendering is significantly influenced by Byte-Pair Encoding (BPE) tokenization, the tokenizer powering many popular LLMs. Unlike binary pronouns, BPE overfragments neopronouns, a direct consequence of data scarcity during tokenizer training. This disparate tokenization mirrors tokenizer limitations observed in multilingual and low-resource NLP, unlocking new misgendering mitigation strategies. We propose two techniques: (1) pronoun tokenization parity, a method to enforce consistent tokenization across gendered pronouns, and (2) utilizing pre-existing LLM pronoun knowledge to improve neopronoun proficiency. Our proposed methods outperform finetuning with standard BPE, improving neopronoun accuracy from 14.1% to 58.4%. Our paper is the first to link LLM misgendering to tokenization and deficient neopronoun grammar, indicating that LLMs unable to correctly treat neopronouns as pronouns are more prone to misgendering.
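The fragmentation disparity described above can be inspected directly with an off-the-shelf BPE tokenizer. The snippet below compares how many subword pieces binary pronouns and neopronouns receive under GPT-2's vocabulary; exact splits depend on the trained vocabulary, and this is not the paper's evaluation code.

# Illustration only: count BPE pieces per pronoun with a pretrained GPT-2 tokenizer.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

pronouns = ["she", "her", "he", "him", "xe", "xem", "zir", "fae"]
for p in pronouns:
    # Leading space matters for BPE: mid-sentence pronouns are usually preceded by one.
    pieces = tokenizer.tokenize(" " + p)
    print(f"{p!r:>6} -> {len(pieces)} piece(s): {pieces}")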
pdf
abs
Tree-of-Traversals: A Zero-Shot Reasoning Algorithm for Augmenting Black-box Language Models with Knowledge Graphs
Elan Markowitz | Anil Ramakrishna | Jwala Dhamala | Ninareh Mehrabi | Charith Peris | Rahul Gupta | Kai-Wei Chang | Aram Galstyan
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Knowledge graphs (KGs) complement Large Language Models (LLMs) by providing reliable, structured, domain-specific, and up-to-date external knowledge. However, KGs and LLMs are often developed separately and must be integrated after training. We introduce Tree-of-Traversals, a novel zero-shot reasoning algorithm that enables augmentation of black-box LLMs with one or more KGs. The algorithm equips an LLM with actions for interfacing with a KG and enables the LLM to perform tree search over possible thoughts and actions to find high-confidence reasoning paths. Tree-of-Traversals significantly improves performance on question answering and KG question answering tasks. Code is available at https://github.com/amazon-science/tree-of-traversals
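A schematic sketch of the search loop the abstract describes: propose candidate thoughts/actions, execute KG actions, score states by LLM-estimated confidence, and return a high-confidence answer path. The action strings, scoring interface, and confidence threshold below are placeholders; the released implementation is in the repository linked above.

# Placeholder sketch of best-first search over (thought, action) states.
# llm_propose, llm_score, and kg_execute stand in for an LLM/KG interface.
import heapq
from itertools import count

def tree_search(question, llm_propose, llm_score, kg_execute, max_steps=20):
    ticket = count()  # tie-breaker so the heap never compares traces directly
    frontier = [(0.0, next(ticket), [f"Question: {question}"])]
    for _ in range(max_steps):
        if not frontier:
            break
        _, _, trace = heapq.heappop(frontier)
        for action in llm_propose(trace):                # e.g. "THINK: ...", "EXPAND: <entity>", "ANSWER: ..."
            new_trace = trace + [action]
            if action.startswith("EXPAND:"):
                new_trace = new_trace + [kg_execute(action)]   # query the KG, append the result
            score = llm_score(new_trace)                 # LLM-estimated confidence in [0, 1]
            if action.startswith("ANSWER:") and score > 0.9:
                return new_trace                         # high-confidence reasoning path found
            heapq.heappush(frontier, (-score, next(ticket), new_trace))
    return None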
pdf
bib
Proceedings of the 4th Workshop on Trustworthy Natural Language Processing (TrustNLP 2024)
Anaelia Ovalle | Kai-Wei Chang | Yang Trista Cao | Ninareh Mehrabi | Jieyu Zhao | Aram Galstyan | Jwala Dhamala | Anoop Kumar | Rahul Gupta
Proceedings of the 4th Workshop on Trustworthy Natural Language Processing (TrustNLP 2024)
2023
pdf
bib
Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing (TrustNLP 2023)
Anaelia Ovalle | Kai-Wei Chang | Ninareh Mehrabi | Yada Pruksachatkun | Aram Galstyan | Jwala Dhamala | Apurv Verma | Trista Cao | Anoop Kumar | Rahul Gupta
Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing (TrustNLP 2023)
pdf
abs
Multi-VALUE: A Framework for Cross-Dialectal English NLP
Caleb Ziems | William Held | Jingfeng Yang | Jwala Dhamala | Rahul Gupta | Diyi Yang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Dialect differences caused by regional, social, and economic factors cause performance discrepancies for many groups of language technology users. Inclusive and equitable language technology must, critically, be dialect invariant, meaning that performance remains constant over dialectal shifts. Current systems often fall short of this ideal since they are designed and tested on a single dialect: Standard American English (SAE). We introduce a suite of resources for evaluating and achieving English dialect invariance. The resource is called Multi-VALUE, a controllable rule-based translation system spanning 50 English dialects and 189 unique linguistic features. Multi-VALUE maps SAE to synthetic forms of each dialect. First, we use this system to stress test question answering, machine translation, and semantic parsing. Stress tests reveal significant performance disparities for leading models on non-standard dialects. Second, we use this system as a data augmentation technique to improve the dialect robustness of existing systems. Finally, we partner with native speakers of Chicano and Indian English to release new gold-standard variants of the popular CoQA task. To execute the transformation code, run model checkpoints, and download both synthetic and gold-standard dialectal benchmark datasets, see http://value-nlp.org.
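As a toy illustration of what a single rule-based SAE-to-dialect transformation can look like, the sketch below implements one well-attested feature (negative concord). The actual Multi-VALUE system covers 189 features with linguistically validated, controllable rules; this regex is only a caricature of one of them.

# Toy rule: in negated clauses, indefinites like "anything"/"anybody" surface as
# "nothing"/"nobody" in dialects with negative concord. Not Multi-VALUE code.
import re

def apply_negative_concord(sae_sentence: str) -> str:
    if re.search(r"n't|\bnot\b|\bnever\b", sae_sentence):   # crude negation check
        return re.sub(r"\bany(thing|body|where)\b", r"no\1", sae_sentence)
    return sae_sentence

print(apply_negative_concord("She didn't see anything."))   # -> She didn't see nothing.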
pdf
abs
Resolving Ambiguities in Text-to-Image Generative Models
Ninareh Mehrabi | Palash Goyal | Apurv Verma | Jwala Dhamala | Varun Kumar | Qian Hu | Kai-Wei Chang | Richard Zemel | Aram Galstyan | Rahul Gupta
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Natural language often contains ambiguities that can lead to misinterpretation and miscommunication. While humans can handle ambiguities effectively by asking clarifying questions and/or relying on contextual cues and common-sense knowledge, resolving ambiguities can be notoriously hard for machines. In this work, we study ambiguities that arise in text-to-image generative models. We curate the Text-to-image Ambiguity Benchmark (TAB) dataset to study different types of ambiguities in text-to-image generative models. We then propose the Text-to-ImagE Disambiguation (TIED) framework to disambiguate the prompts given to the text-to-image generative models by soliciting clarifications from the end user. Through automatic and human evaluations, we show the effectiveness of our framework in generating more faithful images aligned with end user intention in the presence of ambiguities.
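A minimal sketch of the clarification loop such a framework implies: detect an ambiguity, solicit an answer from the end user, and fold it back into the prompt before image generation. The detect_ambiguity and ask_user callables below are placeholders, not components of the TIED implementation.

# Placeholder sketch of prompt disambiguation via user clarification.
def disambiguate_prompt(prompt, detect_ambiguity, ask_user):
    # detect_ambiguity: prompt -> clarifying question, or None if the prompt is unambiguous
    # ask_user: question -> the end user's answer
    question = detect_ambiguity(prompt)
    if question is None:
        return prompt                          # nothing to clarify
    answer = ask_user(question)
    return prompt + " (" + answer + ")"        # fold the clarification back into the prompt

In the paper, the disambiguation step is driven by the generative pipeline itself; the two callables here merely stand in for that behavior.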
2022
pdf
abs
Mitigating Gender Bias in Distilled Language Models via Counterfactual Role Reversal
Umang Gupta | Jwala Dhamala | Varun Kumar | Apurv Verma | Yada Pruksachatkun | Satyapriya Krishna | Rahul Gupta | Kai-Wei Chang | Greg Ver Steeg | Aram Galstyan
Findings of the Association for Computational Linguistics: ACL 2022
Language models excel at generating coherent text, and model compression techniques such as knowledge distillation have enabled their use in resource-constrained settings. However, these models can be biased in multiple ways, including the unfounded association of male and female genders with gender-neutral professions. Therefore, knowledge distillation without any fairness constraints may preserve or exaggerate the teacher model's biases onto the distilled model. To this end, we present a novel approach to mitigate gender disparity in text generation by learning a fair model during knowledge distillation. We propose two modifications to the base knowledge distillation based on counterfactual role reversal: modifying teacher probabilities and augmenting the training set. We evaluate gender polarity across professions in open-ended text generated from the resulting distilled and finetuned GPT-2 models and demonstrate a substantial reduction in gender disparity with only a minor compromise in utility. Finally, we observe that language models that reduce gender polarity in language generation do not improve embedding fairness or downstream classification fairness.
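The first modification, modifying teacher probabilities, can be sketched as mixing the teacher's next-token distribution on the original context with its distribution on a gender-swapped counterfactual context before computing the distillation loss. The swap list, mixing weight, and model interface below are assumptions for illustration, not the paper's code.

# Sketch: counterfactually mixed teacher distribution for knowledge distillation.
import torch
import torch.nn.functional as F

SWAPS = {"he": "she", "she": "he", "his": "her", "her": "his", "him": "her",
         "himself": "herself", "herself": "himself"}  # crude illustrative swap list

def swap_gendered_words(tokens):
    return [SWAPS.get(t, t) for t in tokens]

def mixed_teacher_probs(teacher_logits_fn, context_tokens, alpha=0.5):
    # teacher_logits_fn: list of tokens -> next-token logits, shape [vocab_size]
    p_orig = F.softmax(teacher_logits_fn(context_tokens), dim=-1)
    p_swap = F.softmax(teacher_logits_fn(swap_gendered_words(context_tokens)), dim=-1)
    return alpha * p_orig + (1.0 - alpha) * p_swap

def distillation_loss(student_logits, teacher_probs):
    # KL(teacher || student) over the next-token distribution
    return F.kl_div(F.log_softmax(student_logits, dim=-1), teacher_probs, reduction="sum")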
pdf
abs
Measuring Fairness of Text Classifiers via Prediction Sensitivity
Satyapriya Krishna | Rahul Gupta | Apurv Verma | Jwala Dhamala | Yada Pruksachatkun | Kai-Wei Chang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
With the rapid growth in language processing applications, fairness has emerged as an important consideration in data-driven solutions. Although various fairness definitions have been explored in the recent literature, there is a lack of consensus on which metrics most accurately reflect the fairness of a system. In this work, we propose a new formulation, accumulated prediction sensitivity, which measures fairness in machine learning models based on the model's prediction sensitivity to perturbations in input features. The metric attempts to quantify the extent to which a single prediction depends on a protected attribute, where the protected attribute encodes the membership status of an individual in a protected group. We show that the metric can be theoretically linked with a specific notion of group fairness (statistical parity) and individual fairness. It also correlates well with humans' perception of fairness. We conduct experiments on two text classification datasets, Jigsaw Toxicity and Bias in Bios, and evaluate the correlations between metrics and manual annotations on whether the model produced a fair outcome. We observe that the proposed fairness metric based on prediction sensitivity is statistically significantly more correlated with human annotation than the existing counterfactual fairness metric.
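The underlying intuition, that a fair prediction should change little when only the protected attribute is perturbed, can be approximated with a simple finite-difference probe as sketched below. The paper's accumulated prediction sensitivity is a gradient-based formulation with learned weighting terms; this sketch only conveys the idea.

# Finite-difference approximation of a prediction's sensitivity to a protected attribute.
import numpy as np

def prediction_sensitivity(predict_proba, text, protected_substitutions):
    # predict_proba: text -> np.ndarray of class probabilities
    # protected_substitutions: list of (original, replacement) pairs, e.g. [("she", "he")]
    base = predict_proba(text)
    deltas = []
    for orig, repl in protected_substitutions:
        perturbed = text.replace(orig, repl)
        if perturbed != text:
            deltas.append(np.abs(predict_proba(perturbed) - base).sum())
    return float(np.mean(deltas)) if deltas else 0.0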
pdf
abs
On the Intrinsic and Extrinsic Fairness Evaluation Metrics for Contextualized Language Representations
Yang Trista Cao | Yada Pruksachatkun | Kai-Wei Chang | Rahul Gupta | Varun Kumar | Jwala Dhamala | Aram Galstyan
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Multiple metrics have been introduced to measure fairness in various natural language processing tasks. These metrics can be roughly categorized into two categories: 1) extrinsic metrics for evaluating fairness in downstream applications and 2) intrinsic metrics for estimating fairness in upstream contextualized language representation models. In this paper, we conduct an extensive correlation study between intrinsic and extrinsic metrics across bias notions using 19 contextualized language models. We find that intrinsic and extrinsic metrics do not necessarily correlate in their original setting, even when correcting for metric misalignments, noise in evaluation datasets, and confounding factors such as experiment configuration for extrinsic metrics.
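The core measurement in such a study reduces to a correlation between per-model intrinsic and extrinsic bias scores; a minimal sketch with placeholder numbers (not results from the paper) follows.

# Placeholder sketch: correlate intrinsic and extrinsic bias scores across models.
from scipy.stats import pearsonr, spearmanr

intrinsic = [0.12, 0.30, 0.25, 0.08]   # e.g. representation-level bias scores for four models
extrinsic = [0.40, 0.35, 0.10, 0.20]   # e.g. downstream fairness gaps for the same models

print("Pearson :", pearsonr(intrinsic, extrinsic))
print("Spearman:", spearmanr(intrinsic, extrinsic))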
pdf
bib
Proceedings of the 2nd Workshop on Trustworthy Natural Language Processing (TrustNLP 2022)
Apurv Verma | Yada Pruksachatkun | Kai-Wei Chang | Aram Galstyan | Jwala Dhamala | Yang Trista Cao
Proceedings of the 2nd Workshop on Trustworthy Natural Language Processing (TrustNLP 2022)
2021
pdf
bib
Proceedings of the First Workshop on Trustworthy Natural Language Processing
Yada Pruksachatkun | Anil Ramakrishna | Kai-Wei Chang | Satyapriya Krishna | Jwala Dhamala | Tanaya Guha | Xiang Ren
Proceedings of the First Workshop on Trustworthy Natural Language Processing
pdf
Does Robustness Improve Fairness? Approaching Fairness with Word Substitution Robustness Methods for Text Classification
Yada Pruksachatkun | Satyapriya Krishna | Jwala Dhamala | Rahul Gupta | Kai-Wei Chang
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
2020
pdf
abs
Evaluating the Effectiveness of Efficient Neural Architecture Search for Sentence-Pair Tasks
Ansel MacLaughlin | Jwala Dhamala | Anoop Kumar | Sriram Venkatapathy | Ragav Venkatesan | Rahul Gupta
Proceedings of the First Workshop on Insights from Negative Results in NLP
Neural Architecture Search (NAS) methods, which automatically learn entire neural model architectures or individual neural cell architectures, have recently achieved competitive or state-of-the-art (SOTA) performance on a variety of natural language processing and computer vision tasks, including language modeling, natural language inference, and image classification. In this work, we explore the applicability of a SOTA NAS algorithm, Efficient Neural Architecture Search (ENAS) (Pham et al., 2018), to two sentence-pair tasks, paraphrase detection and semantic textual similarity. We use ENAS to perform a micro-level search and learn a task-optimized RNN cell architecture as a drop-in replacement for an LSTM. We explore the effectiveness of ENAS through experiments on three datasets (MRPC, SICK, STS-B), with two different models (ESIM, BiLSTM-Max), and two sets of embeddings (GloVe, BERT). In contrast to prior work applying ENAS to NLP tasks, our results are mixed: we find that ENAS architectures sometimes, but not always, outperform LSTMs and perform similarly to random architecture search.