Chinmay Choudhary

2022

Cross-lingual Semantic Role Labelling with the ValPaL Database Knowledge
Chinmay Choudhary | Colm O’Riordan
Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures

Cross-lingual Transfer Learning typically involves training a model on a high-resource source language and applying it to a low-resource target language. In this work we introduce a lexical database called Valency Patterns Leipzig (ValPaL), which provides argument pattern information about various verb-forms in multiple languages, including low-resource languages. We also provide a framework to integrate the ValPaL database knowledge into the state-of-the-art LSTM-based model for cross-lingual semantic role labelling. Experimental results show that integrating such knowledge resulted in an improvement in the performance of the model on all the target languages on which it is evaluated.

2021

End-to-end mBERT based Seq2seq Enhanced Dependency Parser with Linguistic Typology knowledge
Chinmay Choudhary | Colm O’Riordan
Proceedings of the 17th International Conference on Parsing Technologies and the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies (IWPT 2021)

We describe the NUIG solution for the IWPT 2021 Shared Task on Enhanced Dependency (ED) parsing in multiple languages. For this shared task, we propose and evaluate an end-to-end Seq2seq mBERT-based ED parser which predicts the ED-parse tree of a given input sentence as a relative head-position tag-sequence. Our proposed model is a multitasking neural network which performs five key tasks simultaneously, namely UPOS tagging, UFeat tagging, Lemmatization, Dependency-parsing and ED-parsing. Furthermore, we utilise the linguistic typology available in the WALS database to improve the ability of our proposed end-to-end parser to transfer across languages. Results show that our proposed Seq2seq ED-parser performs on par with state-of-the-art ED-parsers despite having a much simpler design.
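
To illustrate what a relative head-position tag-sequence can look like, here is a minimal sketch. It assumes the simplest offset scheme, in which each token's tag is the signed distance to its head and the root receives a special `ROOT` tag. The function names and the tag format are hypothetical and are not taken from the paper, and the sketch covers only trees with one head per token, not enhanced graphs in which a token may have several heads.

```python
from typing import List


def encode_relative_heads(heads: List[int]) -> List[str]:
    """Turn 1-based head indices (0 = root) into relative-offset tags."""
    tags = []
    for position, head in enumerate(heads, start=1):
        if head == 0:
            tags.append("ROOT")
        else:
            # Signed distance from the token to its head, e.g. "+1" or "-2".
            tags.append(f"{head - position:+d}")
    return tags


def decode_relative_heads(tags: List[str]) -> List[int]:
    """Invert encode_relative_heads, rejecting offsets that leave the sentence."""
    heads = []
    for position, tag in enumerate(tags, start=1):
        if tag == "ROOT":
            heads.append(0)
            continue
        head = position + int(tag)
        if not 1 <= head <= len(tags):
            raise ValueError(f"tag {tag!r} at position {position} points outside the sentence")
        heads.append(head)
    return heads


# "She reads books": "reads" is the root; the other two tokens attach to it.
example_heads = [2, 0, 2]
example_tags = encode_relative_heads(example_heads)
print(example_tags)
print(decode_relative_heads(example_tags))
```

Under this assumed scheme, parsing reduces to predicting one tag per token, which is what lets a sequence model share a single encoder with the tagging and lemmatization tasks.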

Improving the Performance of UDify with Linguistic Typology Knowledge
Chinmay Choudhary
Proceedings of the Third Workshop on Computational Typology and Multilingual NLP

UDify is the state-of-the-art language-agnostic dependency parser which is trained on a polyglot corpus of 75 languages. This multilingual modeling enables the model to generalize over unknown/lesser-known languages, thus leading to improved performance on low-resource languages. In this work we used linguistic typology knowledge available in the URIEL database to improve the cross-lingual transfer ability of UDify even further.

Universal Recurrent Neural Network Grammar
Chinmay Choudhary | Colm O’Riordan
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)

Modern approaches to Constituency Parsing are mono-lingual supervised approaches which require large amounts of labelled data to be trained on, thus limiting their utility to only a handful of high-resource languages. To address this issue of data-sparsity for low-resource languages we propose Universal Recurrent Neural Network Grammars (UniRNNG), a multi-lingual variant of the popular Recurrent Neural Network Grammars (RNNG) model for constituency parsing. UniRNNG applies Cross-lingual Transfer Learning to the Constituency Parsing task. The architecture of UniRNNG is inspired by the Principles and Parameters theory proposed by Noam Chomsky. UniRNNG utilises the linguistic typology knowledge available as feature-values within the WALS database to generalize over multiple languages. Once trained on a sufficiently diverse polyglot corpus, UniRNNG can be applied to any natural language, thus making it a language-agnostic constituency parser. Experiments reveal that our proposed UniRNNG outperforms state-of-the-art baseline approaches for most of the target languages on which it is tested.

2020

NUIG: Multitasking Self-attention based approach to SigTyp 2020 Shared Task
Chinmay Choudhary
Proceedings of the Second Workshop on Computational Research in Linguistic Typology

The paper describes a multitasking self-attention-based approach to the constrained sub-task of the SigTyp 2020 Shared Task. Our model is a simple neural-network-based architecture inspired by the Transformer (CITATION) model. The model uses multitasking to compute the values of all WALS features for a given input language simultaneously. Results show that our approach performs on par with the baseline approaches, even though it requires only phylogenetic and geographical attributes, namely Longitude, Latitude, Genus-index, Family-index and Country-index, and does not use any of the known WALS features of the respective input language to compute its missing WALS features.

Co-authors

Colm O’Riordan 3