James R. Foulds
2025
GenderAlign: An Alignment Dataset for Mitigating Gender Bias in Large Language Models
Tao Zhang | Ziqian Zeng | Yuxiang Xiao | Huiping Zhuang | Cen Chen | James R. Foulds | Shimei Pan
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Models (LLMs) are prone to generating content that exhibits gender biases, raising significant ethical concerns. Alignment, the process of fine-tuning LLMs to better align with desired behaviors, is recognized as an effective approach to mitigate gender biases. Although proprietary LLMs have made significant strides in mitigating gender bias, their alignment datasets are not publicly available. The commonly used and publicly available alignment dataset, HH-RLHF, still exhibits gender bias to some extent. There is a lack of publicly available alignment datasets specifically designed to address gender bias. Hence, we developed a new dataset named GenderAlign, aiming at mitigating a comprehensive set of gender biases in LLMs. This dataset comprises 8k single-turn dialogues, each paired with a “chosen” and a “rejected” response. Compared to the “rejected” responses, the “chosen” responses demonstrate lower levels of gender bias and higher quality. Furthermore, we categorized the gender biases in the “rejected” responses of GenderAlign into 4 principal categories. The experimental results show the effectiveness of GenderAlign in reducing gender bias in LLMs.
2022
Neural Embedding Allocation: Distributed Representations of Topic Models
Kamrun Naher Keya | Yannis Papanikolaou | James R. Foulds
Computational Linguistics, Volume 48, Issue 4 - December 2022
We propose a method that uses neural embeddings to improve the performance of any given LDA-style topic model. Our method, called neural embedding allocation (NEA), deconstructs topic models (LDA or otherwise) into interpretable vector-space embeddings of words, topics, documents, authors, and so on, by learning neural embeddings to mimic the topic model. We demonstrate that NEA improves coherence scores of the original topic model by smoothing out the noisy topics when the number of topics is large. Furthermore, we show NEA’s effectiveness and generality in deconstructing and smoothing LDA, author-topic models, and the recent mixed membership skip-gram topic model and achieve better performance with the embeddings compared to several state-of-the-art models.