Sara Renjit


2021

CUSATNLP@DravidianLangTech-EACL2021: Language Agnostic Classification of Offensive Content in Tweets
Sara Renjit | Sumam Mary Idicula
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages

Identifying offensive content in tweets is a vital language processing task. Work on this task has so far concentrated mostly on English and other foreign languages. In the shared task on Offensive Language Identification in Dravidian Languages, held at the First Workshop on Speech and Language Technologies for Dravidian Languages at EACL 2021, the aim is to identify offensive content in the code-mixed Dravidian languages Kannada, Malayalam, and Tamil. Our team used language-agnostic BERT (Bidirectional Encoder Representations from Transformers) for sentence embedding, followed by a Softmax classifier. Classification based on this language-agnostic representation gave good performance for all three languages, and the results for Malayalam were good enough to place third among the participating teams.
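As a rough illustration of the classification step described in the abstract, the sketch below applies a softmax layer to a fixed sentence embedding. It is a minimal sketch and not the authors' implementation: the embedding is a toy three-dimensional vector standing in for a language-agnostic BERT sentence embedding, and the label names, weights, and biases are hypothetical values chosen only for the example.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(embedding, weights, biases, labels):
    """Linear layer plus softmax over a precomputed sentence embedding.

    `weights` holds one row per label; every value here is illustrative.
    """
    logits = [
        sum(w * x for w, x in zip(row, embedding)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(logits)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs


# Toy stand-in for a sentence embedding (assumption: real ones are much larger).
toy_embedding = [1.0, 0.0, 0.0]
toy_labels = ["not_offensive", "offensive"]  # hypothetical label names
toy_weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
toy_biases = [0.0, 0.0]

label, probs = classify(toy_embedding, toy_weights, toy_biases, toy_labels)
```

In a real system the weights would be learned from the labeled tweets, while the sentence encoder supplies the embedding for each tweet regardless of its language.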

Siamese Networks for Inference in Malayalam Language Texts
Sara Renjit | Sumam Mary Idicula
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

Natural language inference is the task of determining what can be inferred from a text. Understanding the meaning of a sentence and what it implies is essential in many language processing applications. In this context, we consider the inference problem for a Dravidian language, Malayalam. Siamese networks are trained on text-hypothesis pairs represented with word embeddings and language-agnostic embeddings, and the results are evaluated using classification metrics for binary classification into entailment and contradiction classes. Siamese architectures built on XLM-R embeddings, using gated recurrent units and bidirectional long short-term memory networks, provide promising results for this classification problem.
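The Siamese idea in the abstract, one shared encoder applied to both the text and the hypothesis, can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's architecture: mean pooling stands in for the GRU and BiLSTM encoders, the token vectors are toy values rather than XLM-R embeddings, and the way the two encodings are combined (concatenation plus absolute difference) and the weights are choices made only for this example.

```python
import math


def encode(token_vectors):
    """Shared encoder: mean-pool token vectors (stand-in for a GRU/BiLSTM)."""
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


def pair_features(u, v):
    """Combine the two encodings: [u, v, |u - v|] (an assumed combination)."""
    return u + v + [abs(a - b) for a, b in zip(u, v)]


def predict_entailment(text_vecs, hyp_vecs, weights, bias):
    """Return an entailment probability for a text-hypothesis pair.

    The same `encode` function processes both inputs, which is what makes
    the network Siamese. Weights and bias here are illustrative only.
    """
    u = encode(text_vecs)
    v = encode(hyp_vecs)
    score = sum(w * f for w, f in zip(weights, pair_features(u, v))) + bias
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical weights that penalize distance between the two encodings.
toy_weights = [0.0, 0.0, 0.0, 0.0, -1.0, -1.0]
toy_bias = 1.0

close_pair = predict_entailment([[1.0, 0.0]], [[1.0, 0.0]], toy_weights, toy_bias)
far_pair = predict_entailment([[1.0, 0.0]], [[0.0, 5.0]], toy_weights, toy_bias)
```

A probability above 0.5 would be read as entailment and below 0.5 as contradiction, matching the binary setup the abstract describes.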