MORPHOGEN: A Multilingual Benchmark for Evaluating Gender-Aware Morphological Generation
Mehul Agarwal, Aditya Aggarwal, Arnav Goel, Medha Hira, Anubha Gupta
Abstract
While multilingual large language models (LLMs) perform well on high-level tasks like translation and question answering, their ability to handle grammatical gender and morphological agreement remains underexplored. In morphologically rich languages, gender influences verb conjugation, pronouns, and even first-person constructions with explicit and implicit mentions of gender. We introduce MORPHOGEN, a morphologically grounded large-scale benchmark dataset for evaluating gender-aware generation in three typologically diverse grammatically gendered languages: French, Arabic, and Hindi. The core task, GENFORM, requires models to rewrite a first-person sentence in the opposite gender while preserving its meaning and structure. We construct a high-quality synthetic dataset spanning these three languages and benchmark 15 popular multilingual LLMs (2B–70B) on their ability to perform this transformation. Our results reveal significant gaps and interesting insights into how current models handle morphological gender. MORPHOGEN provides a focused diagnostic lens for gender-aware language modeling and lays the groundwork for future research on inclusive and morphology-sensitive NLP.
- Anthology ID:
- 2026.acl-long.105
- Volume:
- Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2289–2313
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.105/
- Cite (ACL):
- Mehul Agarwal, Aditya Aggarwal, Arnav Goel, Medha Hira, and Anubha Gupta. 2026. MORPHOGEN: A Multilingual Benchmark for Evaluating Gender-Aware Morphological Generation. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2289–2313, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- MORPHOGEN: A Multilingual Benchmark for Evaluating Gender-Aware Morphological Generation (Agarwal et al., ACL 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.105.pdf