PG-Story: Taxonomy, Dataset, and Evaluation for Ensuring Child-Safe Content for Story Generation
Alicia Y. Tsai, Shereen Oraby, Anjali Narayan-Chen, Alessandra Cervone, Spandana Gella, Apurv Verma, Tagyoung Chung, Jing Huang, Nanyun Peng
Abstract
Creating children’s stories through text generation is a creative task that requires stories to be both entertaining and suitable for young audiences. However, since current story generation systems often rely on pre-trained language models fine-tuned with limited story data, they may not always prioritize child-friendliness. This can lead to the unintended generation of stories containing problematic elements such as violence, profanity, and biases. Regrettably, despite the significance of these concerns, there is a lack of clear guidelines and benchmark datasets for ensuring content safety for children. In this paper, we introduce a taxonomy specifically tailored to assess content safety in text, with a strong emphasis on children’s well-being. We present PG-Story, a dataset that includes detailed annotations for both sentence-level and discourse-level safety. We demonstrate the potential of identifying unsafe content through self-diagnosis and employing controllable generation techniques during the decoding phase to minimize unsafe elements in generated stories.
- Anthology ID: 2024.nlp4pi-1.7
- Volume: Proceedings of the Third Workshop on NLP for Positive Impact
- Month: November
- Year: 2024
- Address: Miami, Florida, USA
- Editors: Daryna Dementieva, Oana Ignat, Zhijing Jin, Rada Mihalcea, Giorgio Piatti, Joel Tetreault, Steven Wilson, Jieyu Zhao
- Venues: NLP4PI | WS
- Publisher: Association for Computational Linguistics
- Pages: 78–97
- URL: https://preview.aclanthology.org/fix-sig-urls/2024.nlp4pi-1.7/
- DOI: 10.18653/v1/2024.nlp4pi-1.7
- Cite (ACL): Alicia Y. Tsai, Shereen Oraby, Anjali Narayan-Chen, Alessandra Cervone, Spandana Gella, Apurv Verma, Tagyoung Chung, Jing Huang, and Nanyun Peng. 2024. PG-Story: Taxonomy, Dataset, and Evaluation for Ensuring Child-Safe Content for Story Generation. In Proceedings of the Third Workshop on NLP for Positive Impact, pages 78–97, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal): PG-Story: Taxonomy, Dataset, and Evaluation for Ensuring Child-Safe Content for Story Generation (Tsai et al., NLP4PI 2024)
- PDF: https://preview.aclanthology.org/fix-sig-urls/2024.nlp4pi-1.7.pdf