DiaSafety-CC: Annotating Dialogues with Safety Labels and Reasons for Cross-Cultural Analysis

Tunde Oluwaseyi Ajayi, Mihael Arcan, Paul Buitelaar


Abstract
A dialogue dataset developed in one language can receive diverse safety annotations when presented to raters from different cultures. What is considered acceptable in one culture can be perceived as offensive in another. Cultural differences in dialogue safety annotation are yet to be fully explored. In this work, we use the geopolitical entity, Country, as our base for cultural study. We extend DiaSafety, an existing English dialogue safety dataset that was originally annotated by raters from Western culture, to create a new dataset, DiaSafety-CC. In our work, three raters each from Nigeria and India reannotate the DiaSafety dataset and provide reasons for their choice of labels. We perform pairwise comparisons of the annotations across the cultures studied. Furthermore, we compare the representative labels of each rater group to those of an existing large language model (LLM). Due to the subjectivity of the dialogue annotation task, only 32.6% of the considered dialogues achieve unanimous annotation consensus across the labels of DiaSafety and the six raters. In our analyses, we observe that the Unauthorized Expertise and Biased Opinion categories have the dialogues with the highest label disagreement ratio across the cultures studied. On manual inspection of the reasons provided for the choice of labels, we observe that raters across the cultures in DiaSafety-CC are more sensitive to dialogues directed at target groups than to dialogues directed at individuals. We also observe that GPT-4o annotation shows a more positive agreement with DiaSafety labels in terms of F1 score and phi coefficient.
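The agreement measures named in the abstract (F1 score, phi coefficient, and the share of dialogues with unanimous consensus) can be illustrated with a minimal sketch. This is not the authors' code: it assumes binary labels (1 = unsafe, 0 = safe), the function names are hypothetical, and the label lists are toy values rather than data from DiaSafety-CC.

```python
from math import sqrt

def confusion(reference, other):
    """Return (tp, fp, fn, tn), with `reference` as gold and 1 as the positive class."""
    pairs = list(zip(reference, other))
    tp = sum(1 for r, o in pairs if r == 1 and o == 1)
    fp = sum(1 for r, o in pairs if r == 0 and o == 1)
    fn = sum(1 for r, o in pairs if r == 1 and o == 0)
    tn = sum(1 for r, o in pairs if r == 0 and o == 0)
    return tp, fp, fn, tn

def f1_score(reference, other):
    """F1 of `other` against `reference` for the positive class."""
    tp, fp, fn, _ = confusion(reference, other)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def phi_coefficient(reference, other):
    """Phi coefficient between two binary label lists."""
    tp, fp, fn, tn = confusion(reference, other)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def unanimous_ratio(label_sets):
    """Fraction of items on which every label source gives the same label."""
    rows = list(zip(*label_sets))
    return sum(1 for row in rows if len(set(row)) == 1) / len(rows)

# Toy labels only (assumption): one reference set and two hypothetical annotators.
reference_labels = [1, 1, 0, 0]
annotator_a = [1, 0, 0, 0]
annotator_b = [1, 1, 0, 1]

print(f1_score(reference_labels, annotator_a))
print(phi_coefficient(reference_labels, annotator_a))
print(unanimous_ratio([reference_labels, annotator_a, annotator_b]))
```

Phi stays informative when one class dominates, which may be why it is reported alongside F1; that motivation is an assumption here, not a statement from the paper.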
Anthology ID:
2025.ldk-1.1
Volume:
Proceedings of the 5th Conference on Language, Data and Knowledge
Month:
September
Year:
2025
Address:
Naples, Italy
Editors:
Mehwish Alam, Andon Tchechmedjiev, Jorge Gracia, Dagmar Gromann, Maria Pia di Buono, Johanna Monti, Maxim Ionov
Venues:
LDK | WS
Publisher:
Unior Press
Pages:
1–12
URL:
https://preview.aclanthology.org/ldl-25-ingestion/2025.ldk-1.1/
Cite (ACL):
Tunde Oluwaseyi Ajayi, Mihael Arcan, and Paul Buitelaar. 2025. DiaSafety-CC: Annotating Dialogues with Safety Labels and Reasons for Cross-Cultural Analysis. In Proceedings of the 5th Conference on Language, Data and Knowledge, pages 1–12, Naples, Italy. Unior Press.
Cite (Informal):
DiaSafety-CC: Annotating Dialogues with Safety Labels and Reasons for Cross-Cultural Analysis (Ajayi et al., LDK 2025)
PDF:
https://preview.aclanthology.org/ldl-25-ingestion/2025.ldk-1.1.pdf