2023
pdf
bib
abs
Multilingual End-to-end Dependency Parsing with Linguistic Typology knowledge
Chinmay Choudhary
|
Colm O’riordan
Proceedings of the 5th Workshop on Research in Computational Linguistic Typology and Multilingual NLP
We evaluate a Multilingual End-to-end BERT based Dependency Parser which parses an input sentence by directly predicting the relative head-position for each word within it. Our model is a Cross-lingual dependency parser which is trained on a diverse polyglot corpus of high-resource source languages, and is applied on a low-resource target language. To make model more robust to typological variations between source and target languages, and to facilitate the cross-lingual transferring, we utilized the Linguistic typology knowledge, available in typological databases WALS and URIEL. We induce such typology knowledge within our model through an auxiliary task within Multi-task Learning framework.
2022
pdf
bib
abs
Cross-lingual Semantic Role Labelling with the ValPaL Database Knowledge
Chinmay Choudhary
|
Colm O’Riordan
Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures
Cross-lingual Transfer Learning typically involves training a model on a high-resource sourcelanguage and applying it to a low-resource tar-get language. In this work we introduce a lexi-cal database calledValency Patterns Leipzig(ValPal)which provides the argument patterninformation about various verb-forms in mul-tiple languages including low-resource langua-ges. We also provide a framework to integratethe ValPal database knowledge into the state-of-the-art LSTM based model for cross-lingualsemantic role labelling. Experimental resultsshow that integrating such knowledge resultedin am improvement in performance of the mo-del on all the target languages on which it isevaluated.
2021
pdf
bib
abs
Universal Recurrent Neural Network Grammar
Chinmay Choudhary
|
Colm O’riordan
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)
Modern approaches to Constituency Parsing are mono-lingual supervised approaches which require large amount of labelled data to be trained on, thus limiting their utility to only a handful of high-resource languages. To address this issue of data-sparsity for low-resource languages we propose Universal Recurrent Neural Network Grammars (UniRNNG) which is a multi-lingual variant of the popular Recurrent Neural Network Grammars (RNNG) model for constituency parsing. UniRNNG involves Cross-lingual Transfer Learning for Constituency Parsing task. The architecture of UniRNNG is inspired by Principle and Parameter theory proposed by Noam Chomsky. UniRNNG utilises the linguistic typology knowledge available as feature-values within WALS database, to generalize over multiple languages. Once trained on sufficiently diverse polyglot corpus UniRNNG can be applied to any natural language thus making it Language-agnostic constituency parser. Experiments reveal that our proposed UniRNNG outperform state-of-the-art baseline approaches for most of the target languages, for which these are tested.
pdf
abs
End-to-end mBERT based Seq2seq Enhanced Dependency Parser with Linguistic Typology knowledge
Chinmay Choudhary
|
Colm O’riordan
Proceedings of the 17th International Conference on Parsing Technologies and the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies (IWPT 2021)
We describe the NUIG solution for IWPT 2021 Shared Task of Enhanced Dependency (ED) parsing in multiple languages. For this shared task, we propose and evaluate an End-to-end Seq2seq mBERT-based ED parser which predicts the ED-parse tree of a given input sentence as a relative head-position tag-sequence. Our proposed model is a multitasking neural-network which performs five key tasks simultaneously namely UPOS tagging, UFeat tagging, Lemmatization, Dependency-parsing and ED-parsing. Furthermore we utilise the linguistic typology available in the WALS database to improve the ability of our proposed end-to-end parser to transfer across languages. Results show that our proposed Seq2seq ED-parser performs on par with state-of-the-art ED-parser despite having a much simpler de- sign.
pdf
abs
Improving the Performance of UDify with Linguistic Typology Knowledge
Chinmay Choudhary
Proceedings of the Third Workshop on Computational Typology and Multilingual NLP
UDify is the state-of-the-art language-agnostic dependency parser which is trained on a polyglot corpus of 75 languages. This multilingual modeling enables the model to generalize over unknown/lesser-known languages, thus leading to improved performance on low-resource languages. In this work we used linguistic typology knowledge available in URIEL database, to improve the cross-lingual transferring ability of UDify even further.
2020
pdf
abs
NUIG: Multitasking Self-attention based approach to SigTyp 2020 Shared Task
Chinmay Choudhary
Proceedings of the Second Workshop on Computational Research in Linguistic Typology
The paper describes the Multitasking Self-attention based approach to constrained sub-task within Sigtyp 2020 Shared task. Our model is simple neural network based architecture inspired by Transformers (CITATION) model. The model uses Multitasking to compute values of all WALS features for a given input language simultaneously.Results show that our approach performs at par with the baseline approaches, even though our proposed approach requires only phylogenetic and geographical attributes namely Longitude, Latitude, Genus-index, Family-index and Country-index and do not use any of the known WALS features of the respective input language, to compute its missing WALS features.