2023
AbhiPaw@DravidianLangTech: Multimodal Abusive Language Detection and Sentiment Analysis
Abhinaba Bala, Parameswari Krishnamurthy
Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages
Detecting abusive language in multimodal videos has become a pressing need for ensuring a safe and inclusive online environment. This paper addresses this challenge by developing a novel approach for multimodal abusive language detection in Tamil videos and sentiment analysis for Tamil/Malayalam videos. By leveraging state-of-the-art models such as Multiscale Vision Transformers (MViT) for video analysis, OpenL3 for audio analysis, and the bert-base-multilingual-cased model for textual analysis, our proposed framework integrates visual, auditory, and textual features. Through extensive experiments and evaluations, we demonstrate the effectiveness of our model in accurately detecting abusive content and predicting sentiment categories. The limited availability of effective tools for performing these tasks in Dravidian languages has prompted a new avenue of research in these domains.
AbhiPaw@DravidianLangTech: Abusive Comment Detection in Tamil and Telugu using Logistic Regression
Abhinaba Bala, Parameswari Krishnamurthy
Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages
Abusive comments on online platforms have become a significant concern, necessitating the development of effective detection systems. However, limited work has been done on low-resource languages, including Dravidian languages. This paper addresses this gap by focusing on abusive comment detection in a dataset containing Tamil, Tamil-English and Telugu-English code-mixed comments. Our methodology uses logistic regression and explores suitable embeddings to enhance the performance of the detection model. Through rigorous experimentation, we identify the most effective combination of logistic regression and embeddings. The results demonstrate the performance of the proposed model, which contributes to the development of robust abusive comment detection systems in low-resource language settings. Keywords: Abusive comment detection, Dravidian languages, logistic regression, embeddings, low-resource languages, code-mixed dataset.
AbhiPaw@DravidianLangTech: Fake News Detection in Dravidian Languages using Multilingual BERT
Abhinaba Bala, Parameswari Krishnamurthy
Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages
This study addresses the challenge of detecting fake news in Dravidian languages by leveraging Google’s MuRIL (Multilingual Representations for Indian Languages) model. Drawing upon previous research, we investigate the intricacies involved in identifying fake news and explore the potential of transformer-based models for linguistic analysis and contextual understanding. Through supervised learning, we fine-tune the “muril-base-cased” variant of MuRIL using a carefully curated dataset of labeled comments and posts in Dravidian languages, enabling the model to discern between original and fake news. During the inference phase, the fine-tuned MuRIL model analyzes new textual content, extracting contextual and semantic features to predict the content’s classification. We evaluate the model’s performance using standard metrics, highlighting the effectiveness of MuRIL in detecting fake news in Dravidian languages and contributing to the establishment of a safer digital ecosystem. Keywords: fake news detection, Dravidian languages, MuRIL, transformer-based models, linguistic analysis, contextual understanding.
2022
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages
Bharathi Raja Chakravarthi, Ruba Priyadharshini, Anand Kumar Madasamy, Parameswari Krishnamurthy, Elizabeth Sherly, Sinnathamby Mahesan
Findings of the Shared Task on Emotion Analysis in Tamil
Anbukkarasi Sampath, Thenmozhi Durairaj, Bharathi Raja Chakravarthi, Ruba Priyadharshini, Subalalitha Cn, Kogilavani Shanmugavadivel, Sajeetha Thavareesan, Sathiyaraj Thangasamy, Parameswari Krishnamurthy, Adeep Hande, Sean Benhur, Kishore Ponnusamy, Santhiya Pandiyan
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages
This paper presents an overview of the shared task on emotion analysis in Tamil, the results of which were presented at the workshop. It describes the dataset used in the shared task, the task description, the methodology used by the participants and the evaluation results of the submissions. The shared task is organized as two tasks: Task A uses social media comments in Tamil annotated with 11 emotions, and Task B uses social media comments in Tamil annotated with 31 fine-grained emotions. Participants were provided with training and development datasets for their experiments, and results were evaluated on unseen data. In total, we received around 24 submissions from 13 teams. The models were evaluated using precision, recall and micro-averaged metrics.
Findings of the Shared Task on Multi-task Learning in Dravidian Languages
Bharathi Raja Chakravarthi, Ruba Priyadharshini, Subalalitha Cn, Sangeetha S, Malliga Subramanian, Kogilavani Shanmugavadivel, Parameswari Krishnamurthy, Adeep Hande, Siddhanth U Hegde, Roshan Nayak, Swetha Valli
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages
We present our findings from the first shared task on Multi-task Learning in Dravidian Languages at the second Workshop on Speech and Language Technologies for Dravidian Languages. In this task, a sentence in any of three Dravidian languages must be classified for two closely related tasks, namely Sentiment Analysis (SA) and Offensive Language Identification (OLI). The task spans three Dravidian languages: Kannada, Malayalam, and Tamil. It is one of the first shared tasks to focus on multi-task learning for closely related tasks, especially for a very low-resourced language family such as the Dravidian language family. In total, 55 people signed up to participate in the task, but owing to the intricate nature of the task, especially in its first iteration, only 3 submissions were received.
2021
Proceedings of the First Workshop on Parsing and its Applications for Indian Languages
Kengatharaiyer Sarveswaran, Parameswari Krishnamurthy, Pruthwik Mishra
Parsing Subordinate Clauses in Telugu using Rule-based Dependency Parser
P Sangeetha, Parameswari Krishnamurthy, Amba Kulkarni
Proceedings of the First Workshop on Parsing and its Applications for Indian Languages
Parsing has been gaining popularity in recent years and has attracted the interest of NLP researchers around the world. It is especially challenging when the language under study, like Telugu, has free word order and allows ellipsis. In this paper, an attempt is made to parse subordinate clauses in Telugu, especially non-finite verb clauses and relative clauses, which are highly productive and constitute a large share of parsing tasks. This study adopts a knowledge-driven approach, using linguistic cues as rules to parse subordinate structures. The challenges faced in parsing ambiguous structures are elaborated, and enhanced tags are provided to handle them. The results are encouraging, and the parser proves to be efficient for Telugu.
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages
Bharathi Raja Chakravarthi, Ruba Priyadharshini, Anand Kumar M, Parameswari Krishnamurthy, Elizabeth Sherly
Findings of the Shared Task on Machine Translation in Dravidian languages
Bharathi Raja Chakravarthi, Ruba Priyadharshini, Shubhanker Banerjee, Richard Saldanha, John P. McCrae, Anand Kumar M, Parameswari Krishnamurthy, Melvin Johnson
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages
This paper presents an overview of the shared task on machine translation of Dravidian languages. We presented the shared task results at the EACL 2021 workshop on Speech and Language Technologies for Dravidian Languages. This paper describes the datasets used, the methodology used to evaluate the participants' submissions, and the overall results of the experiments. As part of this shared task, we organized four sub-tasks corresponding to machine translation of the following language pairs: English to Tamil, English to Malayalam, English to Telugu and Tamil to Telugu, which are available at https://competitions.codalab.org/competitions/27650. We provided the participants with training and development datasets to perform experiments, and the results were evaluated on unseen test data. In total, 46 research groups participated in the shared task and 7 experimental runs were submitted for evaluation. We used BLEU scores to assess the translations.
IIITK@DravidianLangTech-EACL2021: Offensive Language Identification and Meme Classification in Tamil, Malayalam and Kannada
Nikhil Ghanghor, Parameswari Krishnamurthy, Sajeetha Thavareesan, Ruba Priyadharshini, Bharathi Raja Chakravarthi
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages
This paper describes the IIITK team’s submissions to the offensive language identification and troll meme classification shared tasks for Dravidian languages at the DravidianLangTech 2021 workshop at EACL 2021. Our best configuration for Tamil troll meme classification achieved a weighted average F1 score of 0.55, and for offensive language identification, our system achieved weighted F1 scores of 0.75 for Tamil, 0.95 for Malayalam, and 0.71 for Kannada. We ranked 2nd in Tamil troll meme classification, and 3rd, 3rd and 4th in offensive language identification for Tamil, Malayalam and Kannada, respectively.
NITK-UoH: Tamil-Telugu Machine Translation Systems for the WMT21 Similar Language Translation Task
Richard Saldanha, Ananthanarayana V. S, Anand Kumar M, Parameswari Krishnamurthy
Proceedings of the Sixth Conference on Machine Translation
In this work, two Neural Machine Translation (NMT) systems were developed and evaluated as part of the bidirectional Tamil-Telugu similar language translation subtask in WMT21. The OpenNMT-py toolkit was used to build quick prototypes of the systems. The models were then trained on the training datasets containing the parallel corpus and evaluated on the dev datasets provided as part of the task. Both systems were trained on a DGX station with four V100 GPUs. The first NMT system is a Transformer-based 6-layer encoder-decoder model, trained for 100,000 steps with a configuration similar to the one provided by OpenNMT-py, and used as a single model for bidirectional translation. The second NMT system contains two unidirectional translation models with the same configuration as the first system, with the addition of Byte Pair Encoding (BPE) subword tokenization through the pre-trained MultiBPEmb model. Based on the dev set evaluation metrics for both systems, the first system, i.e. the vanilla Transformer model, was submitted as the primary system. Since the second system with BPE showed no improvement in the metrics during training, it was submitted as a contrastive system.
Towards Building a Modern Written Tamil Treebank
Parameswari Krishnamurthy, Kengatharaiyer Sarveswaran
Proceedings of the 20th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2021)
2015
Development of Telugu-Tamil Transfer-Based Machine Translation system: With Special reference to Divergence Index
Parameswari Krishnamurthy
Proceedings of the 1st Deep Machine Translation Workshop