From Form to Meaning: Interlingua Sense-Alignment of Offensive Language with LLMs

Maria Alexandra Roussopoulou, Stella Markantonatou


Abstract
This paper presents a methodology that uses LLMs to align multilingual offensive lexicons at the sense level. Lexicons of different structures and origins in Arabic, Bulgarian, Modern Greek, French, and Italian are aligned directly, without pivoting through English. The Modern Greek lexicon is LLM-generated; the other four lexicons are WordNet-compatible. Senses are aligned across languages with an LLM-as-a-judge rubric applied to lemma–definition–example triples. The LLM makes 2.87M pairwise comparisons and yields 31 strict global-sense categories. The paper also discusses the challenges involved in sense alignment. The resource is available to support downstream applications such as Machine Translation and cross-lingual hate-speech detection.
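The pairwise alignment described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the `Sense` record, the `align` function, the toy judge that stands in for the LLM rubric, and the use of union-find to merge judged-equivalent senses transitively. The abstract does not state how "strict" categories are defined, so the sketch does not attempt to reproduce that criterion.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable


@dataclass(frozen=True)
class Sense:
    """One lemma-definition-example triple (hypothetical record layout)."""
    lang: str
    lemma: str
    definition: str
    example: str


def align(senses: list[Sense], judge: Callable[[Sense, Sense], bool]):
    """Group senses that the judge deems equivalent across languages.

    Only cross-lingual pairs are compared. Merging is transitive
    (union-find), which is an assumption of this sketch.
    """
    parent = list(range(len(senses)))

    def find(i: int) -> int:
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    comparisons = 0
    for i, j in combinations(range(len(senses)), 2):
        if senses[i].lang == senses[j].lang:
            continue  # same language: not an inter-language comparison
        comparisons += 1
        if judge(senses[i], senses[j]):
            parent[find(i)] = find(j)

    groups: dict[int, list[Sense]] = {}
    for i, sense in enumerate(senses):
        groups.setdefault(find(i), []).append(sense)
    return list(groups.values()), comparisons


def toy_judge(a: Sense, b: Sense) -> bool:
    # Stand-in for an LLM call: treats identical definitions as the same sense.
    return a.definition == b.definition


# Placeholder data; the lemmas and definitions are invented.
senses = [
    Sense("fr", "lemma_fr_1", "insult targeting intelligence", "example 1"),
    Sense("it", "lemma_it_1", "insult targeting intelligence", "example 2"),
    Sense("bg", "lemma_bg_1", "insult targeting appearance", "example 3"),
    Sense("fr", "lemma_fr_2", "insult targeting appearance", "example 4"),
]
groups, comparisons = align(senses, toy_judge)
```

In a real setting the judge would send both triples to an LLM together with the rubric and parse its verdict. The number of comparisons grows quadratically with lexicon size, which is consistent with the millions of comparisons the abstract reports.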
Anthology ID:
2026.ltedi-1.6
Volume:
Proceedings of the Sixth Workshop on Language Technology for Equality, Diversity, Inclusion
Month:
July
Year:
2026
Address:
Virtual (Online)
Editors:
Bharathi Raja Chakravarthi, Bharathi B, Paul Buitelaar, Durairaj Thenmozhi, Miguel Ángel García Cumbreras, Salud María Jiménez Zafra
Venues:
LTEDI | WS
Publisher:
Association for Computational Linguistics
Pages:
63–75
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.ltedi-1.6/
Cite (ACL):
Maria Alexandra Roussopoulou and Stella Markantonatou. 2026. From Form to Meaning: Interlingua Sense-Alignment of Offensive Language with LLMs. In Proceedings of the Sixth Workshop on Language Technology for Equality, Diversity, Inclusion, pages 63–75, Virtual (Online). Association for Computational Linguistics.
Cite (Informal):
From Form to Meaning: Interlingua Sense-Alignment of Offensive Language with LLMs (Roussopoulou & Markantonatou, LTEDI 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.ltedi-1.6.pdf