Mitigating Societal Harms in Large Language Models

Sachin Kumar, Vidhisha Balachandran, Lucille Njoo, Antonios Anastasopoulos, Yulia Tsvetkov


Abstract
Numerous recent studies have highlighted societal harms that can be caused by language technologies deployed in the wild. While several surveys, tutorials, and workshops have discussed the risks of harms in specific contexts – e.g., detecting and mitigating gender bias in NLP models – no prior work has developed a unified typology of technical approaches for mitigating harms of language generation models. Our tutorial is based on a survey we recently wrote that proposes such a typology. We will provide an overview of potential social issues in language generation, including toxicity, social biases, misinformation, factual inconsistency, and privacy violations. Our primary focus will be on how to systematically identify risks, and how to eliminate them at various stages of the model pipeline, from data collection, to model development, to inference/language generation. Through this tutorial, we aim to equip NLP researchers and engineers with a suite of practical tools for mitigating safety risks from pretrained language generation models.
Anthology ID:
2023.emnlp-tutorial.5
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts
Month:
December
Year:
2023
Address:
Singapore
Editors:
Qi Zhang, Hassan Sajjad
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
26–33
URL:
https://aclanthology.org/2023.emnlp-tutorial.5
DOI:
10.18653/v1/2023.emnlp-tutorial.5
Cite (ACL):
Sachin Kumar, Vidhisha Balachandran, Lucille Njoo, Antonios Anastasopoulos, and Yulia Tsvetkov. 2023. Mitigating Societal Harms in Large Language Models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts, pages 26–33, Singapore. Association for Computational Linguistics.
Cite (Informal):
Mitigating Societal Harms in Large Language Models (Kumar et al., EMNLP 2023)
PDF:
https://preview.aclanthology.org/dois-2013-emnlp/2023.emnlp-tutorial.5.pdf