EmoTa: A Tamil Emotional Speech Dataset
Jubeerathan Thevakumar, Luxshan Thavarasa, Thanikan Sivatheepan, Sajeev Kugarajah, Uthayasanker Thayasivam
Abstract
This paper introduces EmoTa, the first emotional speech dataset in Tamil, designed to reflect the linguistic diversity of Sri Lankan Tamil speakers. EmoTa comprises 936 recorded utterances from 22 native Tamil speakers (11 male, 11 female), each articulating 19 semantically neutral sentences across five primary emotions: anger, happiness, sadness, fear, and neutrality. To ensure quality, inter-annotator agreement was assessed using Fleiss’ Kappa, resulting in a substantial agreement score of 0.74. Initial evaluations using machine learning models, including XGBoost and Random Forest, yielded high F1-scores of 0.91 and 0.90 for emotion classification tasks. By releasing EmoTa, we aim to encourage further exploration of Tamil language processing and the development of innovative models for Tamil Speech Emotion Recognition.
- Anthology ID:
- 2025.chipsal-1.19
- Volume:
- Proceedings of the First Workshop on Challenges in Processing South Asian Languages (CHiPSAL 2025)
- Month:
- January
- Year:
- 2025
- Address:
- Abu Dhabi, UAE
- Editors:
- Kengatharaiyer Sarveswaran, Ashwini Vaidya, Bal Krishna Bal, Sana Shams, Surendrabikram Thapa
- Venues:
- CHiPSAL | WS
- Publisher:
- International Committee on Computational Linguistics
- Pages:
- 193–201
- URL:
- https://preview.aclanthology.org/fix-sig-urls/2025.chipsal-1.19/
- Cite (ACL):
- Jubeerathan Thevakumar, Luxshan Thavarasa, Thanikan Sivatheepan, Sajeev Kugarajah, and Uthayasanker Thayasivam. 2025. EmoTa: A Tamil Emotional Speech Dataset. In Proceedings of the First Workshop on Challenges in Processing South Asian Languages (CHiPSAL 2025), pages 193–201, Abu Dhabi, UAE. International Committee on Computational Linguistics.
- Cite (Informal):
- EmoTa: A Tamil Emotional Speech Dataset (Thevakumar et al., CHiPSAL 2025)
- PDF:
- https://preview.aclanthology.org/fix-sig-urls/2025.chipsal-1.19.pdf