Abstract
In this paper, we present a new Tamil lyrics corpus extracted from Tamil movies spanning 65 years (1954 to 2019). We present a detailed corpus analysis showing the nature of Tamil lyrics with respect to the lyricists and the year in which the lyrics were written. We also present similarity scores across different lyricists based on their song lyrics. We present experimental results based on the SOTA BERT Tamil models to identify the lyricist of a song. Finally, we present future research directions to encourage researchers to pursue Tamil NLP research.
- Anthology ID:
- 2021.dravidianlangtech-1.1
- Volume:
- Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages
- Month:
- April
- Year:
- 2021
- Address:
- Kyiv
- Venue:
- DravidianLangTech
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1–9
- URL:
- https://aclanthology.org/2021.dravidianlangtech-1.1
- Cite (ACL):
- Dhivya Chinnappa and Praveenraj Dhandapani. 2021. Tamil Lyrics Corpus: Analysis and Experiments. In Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages, pages 1–9, Kyiv. Association for Computational Linguistics.
- Cite (Informal):
- Tamil Lyrics Corpus: Analysis and Experiments (Chinnappa & Dhandapani, DravidianLangTech 2021)
- PDF:
- https://preview.aclanthology.org/auto-file-uploads/2021.dravidianlangtech-1.1.pdf
- Code:
- praveenraj0904/tamillyricscorpus