Aravind Krishnan

2021

Employing Wikipedia as a resource for Named Entity Recognition in Morphologically complex under-resourced languages
Aravind Krishnan | Stefan Ziehe | Franziska Pannach | Caroline Sporleder
Proceedings of the 14th Workshop on Building and Using Comparable Corpora (BUCC 2021)

We propose a novel approach for rapid prototyping of named entity recognisers through the development of semi-automatically annotated datasets. We demonstrate the proposed pipeline on two under-resourced agglutinating languages: the Dravidian language Malayalam and the Bantu language isiZulu. Our approach is weakly supervised and bootstraps training data from Wikipedia and Google Knowledge Graph. Moreover, our approach is relatively language independent and can consequently be ported quickly (and hence cost-effectively) from one language to another, requiring only minor language-specific tailoring.

GCDH@LT-EDI-EACL2021: XLM-RoBERTa for Hope Speech Detection in English, Malayalam, and Tamil
Stefan Ziehe | Franziska Pannach | Aravind Krishnan
Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion

This paper describes approaches to identify Hope Speech in short, informal texts in English, Malayalam and Tamil using different machine learning techniques. We demonstrate that even very simple baseline algorithms perform reasonably well on this task if provided with enough training data. However, our best performing algorithm is a cross-lingual transfer learning approach in which we fine-tune XLM-RoBERTa.

Venues

bucc (1)
ltedi (1)