TSD: Towards Computational Processing of Tamil Similes - A Tamil Simile Dataset
Aathavan Nithiyananthan, Jathushan Raveendra, Uthayasanker Thayasivam
Abstract
A simile is a powerful figure of speech that makes a comparison between two different things via shared properties, often using words like “like” or “as” to create vivid imagery, convey emotions, and enhance understanding. However, computational research on similes is limited in low-resource languages like Tamil due to the lack of simile datasets. This work introduces a manually annotated Tamil Simile Dataset (TSD) comprising around 1.5k simile sentences drawn from various sources. Our data annotation guidelines ensure that all the simile sentences are annotated with the three components, namely tenor, vehicle, and context. We benchmark our dataset for simile interpretation and simile generation tasks using selected pre-trained language models (PLMs) and present the results. Our findings highlight the challenges of simile tasks in Tamil, suggesting areas for further improvement. We believe that TSD will drive progress in computational simile processing for Tamil and other low-resource languages, further advancing simile-related tasks in Natural Language Processing.
- Anthology ID:
- 2025.dravidianlangtech-1.99
- Volume:
- Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
- Month:
- May
- Year:
- 2025
- Address:
- Acoma, The Albuquerque Convention Center, Albuquerque, New Mexico
- Editors:
- Bharathi Raja Chakravarthi, Ruba Priyadharshini, Anand Kumar Madasamy, Sajeetha Thavareesan, Elizabeth Sherly, Saranya Rajiakodi, Balasubramanian Palani, Malliga Subramanian, Subalalitha Cn, Dhivya Chinnappa
- Venues:
- DravidianLangTech | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 573–579
- URL:
- https://preview.aclanthology.org/fix-sig-urls/2025.dravidianlangtech-1.99/
- Cite (ACL):
- Aathavan Nithiyananthan, Jathushan Raveendra, and Uthayasanker Thayasivam. 2025. TSD: Towards Computational Processing of Tamil Similes - A Tamil Simile Dataset. In Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages, pages 573–579, Acoma, The Albuquerque Convention Center, Albuquerque, New Mexico. Association for Computational Linguistics.
- Cite (Informal):
- TSD: Towards Computational Processing of Tamil Similes - A Tamil Simile Dataset (Nithiyananthan et al., DravidianLangTech 2025)
- PDF:
- https://preview.aclanthology.org/fix-sig-urls/2025.dravidianlangtech-1.99.pdf