TamilATIS: Dataset for Task-Oriented Dialog in Tamil

Ramaneswaran S, Sanchit Vijay, Kathiravan Srinivasan


Abstract
Task-Oriented Dialogue (TOD) systems allow users to accomplish tasks by giving directions to the system using natural language utterances. With the widespread adoption of conversational agents and chat platforms, TOD has become mainstream in NLP research today. However, developing TOD systems requires massive amounts of data, and there has been limited work on TOD in low-resource languages like Tamil. To address this gap, we introduce TamilATIS, a TOD dataset for Tamil which contains 4874 utterances. We present a detailed account of the entire data collection and data annotation process. We train state-of-the-art NLU models and report their performance. The joint BERT model with XLM-RoBERTa as the utterance encoder achieved the highest score, with an intent accuracy of 96.26% and a slot F1 of 94.01%.
Anthology ID:
2022.dravidianlangtech-1.4
Volume:
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages
Month:
May
Year:
2022
Address:
Dublin, Ireland
Venue:
DravidianLangTech
Publisher:
Association for Computational Linguistics
Pages:
25–32
URL:
https://aclanthology.org/2022.dravidianlangtech-1.4
DOI:
10.18653/v1/2022.dravidianlangtech-1.4
Cite (ACL):
Ramaneswaran S, Sanchit Vijay, and Kathiravan Srinivasan. 2022. TamilATIS: Dataset for Task-Oriented Dialog in Tamil. In Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages, pages 25–32, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
TamilATIS: Dataset for Task-Oriented Dialog in Tamil (S et al., DravidianLangTech 2022)
PDF:
https://preview.aclanthology.org/auto-file-uploads/2022.dravidianlangtech-1.4.pdf
Data
ATIS