Sangeetha Sivanesan
Question-answering (QA) systems play a pivotal role in natural language processing (NLP), powering applications such as search engines and virtual assistants by providing accurate responses to user queries. However, building effective QA systems for Dravidian languages, like Tamil, poses distinct challenges due to the scarcity of resources and the linguistic complexities inherent to these languages. This paper introduces a novel method to enhance QA accuracy by integrating answer-type features alongside traditional question and context inputs. We fine-tuned both mono- and multilingual pre-trained models on the Extended Chaii dataset, which comprises Tamil translations from the SQuAD dataset, as well as on the SQuAD-EAT-5000 dataset, consisting of English-language instances. Our experiments reveal that incorporating answer-type features significantly improves model performance compared to using only question and context inputs. Specifically, for the Extended Chaii dataset, the MuRIL model achieved the highest F1 score of 53.89, surpassing other pre-trained models, while RoBERTa outperformed BERT on the SQuAD-EAT-5000 dataset with a score of 82.07. This research advances QA systems for Dravidian languages and underscores the importance of integrating linguistic features for improved accuracy.
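The idea of adding an answer-type feature to the usual question and context inputs can be illustrated with a minimal sketch. It is not the authors' implementation: the separator string, the function name `build_qa_input`, the label `LOCATION`, and the example sentence are all hypothetical, and the sketch simply assumes the answer type is prepended as an extra text segment before the input is tokenized.

```python
from typing import Optional

# Hypothetical separator; a real tokenizer would supply its own special token.
SEP = " [SEP] "


def build_qa_input(question: str, context: str,
                   answer_type: Optional[str] = None) -> str:
    """Join the segments of a QA input into one string.

    With answer_type=None this is the plain question + context baseline.
    With an answer type (for example "PERSON" or "LOCATION") the label is
    prepended as an additional segment. This layout is an assumption made
    for illustration only.
    """
    parts = []
    if answer_type:
        parts.append(answer_type)
    parts.append(question)
    parts.append(context)
    return SEP.join(parts)


# Made-up example data, not taken from any dataset.
question = "Where is Chennai?"
context = "Chennai is a city in Tamil Nadu."

baseline = build_qa_input(question, context)
augmented = build_qa_input(question, context, answer_type="LOCATION")

print(baseline)
print(augmented)
```

The only difference between the two strings is the leading answer-type segment, which is what lets the same fine-tuning setup be compared with and without the feature.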
News classification allows analysts and researchers to study trends over time. Based on classification, news platforms can provide readers with related articles. Many digital news platforms and apps use classification to offer personalized content to their users. While numerous resources are available for news classification in various Indian languages, there is still no extensive benchmark dataset specifically for the Telugu language. Our paper presents and describes the Telugu20news group dataset, for which news has been collected from various online Telugu news channels. We describe in detail the collection and annotation of the proposed news headlines dataset. In addition, we conducted extensive experiments on the proposed dataset in order to deliver solid baselines for future work.
Offensive content moderation is vital on social media platforms to support healthy online discussions. However, existing work on code-mixed Dravidian languages is limited to classifying whole comments, without identifying the parts of a comment that contribute to offensiveness. This limitation is primarily due to the lack of annotated data for offensive spans. Accordingly, in this shared task, we provide Tamil-English code-mixed social media comments annotated with offensive spans. This paper outlines the released dataset, the methods, and the results of the submitted systems.
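To make the difference between comment-level labels and span-level annotation concrete, here is a minimal sketch that represents a span as a set of character offsets and scores a prediction by character-level F1. This is one common way to evaluate span identification and is an assumption here; the abstract does not state the shared task's annotation format or metric. The helper names and the example comment are hypothetical.

```python
from typing import Set


def span_to_offsets(text: str, phrase: str) -> Set[int]:
    """Return the character offsets covered by the first occurrence of phrase.

    Returns an empty set if the phrase does not occur in the text.
    """
    start = text.find(phrase)
    if start == -1:
        return set()
    return set(range(start, start + len(phrase)))


def char_f1(pred: Set[int], gold: Set[int]) -> float:
    """Character-offset F1 between a predicted and a gold span."""
    if not pred and not gold:
        return 1.0  # nothing to find and nothing predicted
    if not pred or not gold:
        return 0.0
    overlap = len(pred & gold)
    return 2 * overlap / (len(pred) + len(gold))


# Made-up comment, used only for illustration.
comment = "this movie is utter trash bro"
gold = span_to_offsets(comment, "utter trash")  # annotated span
pred = span_to_offsets(comment, "trash")        # a system's partial guess

print(sorted(gold))
print(round(char_f1(pred, gold), 3))
```

A whole-comment classifier would only say that the comment is offensive; the offset sets additionally record which characters carry the offensiveness, and the score gives partial credit for partial overlap.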
We present our findings from the first shared task on Multi-task Learning in Dravidian Languages at the second Workshop on Speech and Language Technologies for Dravidian Languages. In this task, a sentence in any of three Dravidian languages must be classified for two closely related tasks, namely Sentiment Analysis (SA) and Offensive Language Identification (OLI). The task spans three Dravidian languages: Kannada, Malayalam, and Tamil. It is one of the first shared tasks to focus on multi-task learning for closely related tasks, especially for a very low-resourced language family such as the Dravidian language family. In total, 55 people signed up to participate in the task, and due to the intricate nature of the task, especially in its first iteration, 3 submissions were received.
Thirumurai, also known as Panniru Thirumurai, is a collection of Tamil Shaivite poems dating back to the Hindu revival period between the 6th and the 10th century. These poems are par excellence in both literary and musical terms. They were composed based on the ancient, now non-existent Tamil Pann system and can be set to music. We present a large dataset containing all the Thirumurai poems and also attempt to classify the Pann and author of each poem using transformer-based architectures. Our work is the first of its kind in dealing with ancient Tamil text datasets, which are severely under-resourced. We explore several deep learning-based techniques for solving this challenge effectively and provide essential insights into the problem and how to address it.