Arnob Mallik


2023

pdf
Correcting Sense Annotations Using Wordnets and Translations
Arnob Mallik | Grzegorz Kondrak
Proceedings of the 12th Global Wordnet Conference

Acquiring large amounts of high-quality annotated data is an open issue in word sense disambiguation. This problem has become more critical recently with the advent of supervised models based on neural networks, which require large amounts of annotated data. We propose two algorithms for making selective corrections on a sense-annotated parallel corpus, based on cross-lingual synset mappings. We show that, when applied to bilingual parallel corpora, these algorithms can rectify noisy sense annotations, and thereby produce multilingual sense-annotated data of improved quality.

2021

pdf
UAlberta at SemEval-2021 Task 2: Determining Sense Synonymy via Translations
Bradley Hauer | Hongchang Bao | Arnob Mallik | Grzegorz Kondrak
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

We describe the University of Alberta systems for the SemEval-2021 Word-in-Context (WiC) disambiguation task. We explore the use of translation information for deciding whether two different tokens of the same word correspond to the same sense of the word. Our focus is on developing principled theoretical approaches which are grounded in linguistic phenomena, leading to more explainable models. We show that translations from multiple languages can be leveraged to improve the accuracy on the WiC task.

pdf
Semi-Supervised and Unsupervised Sense Annotation via Translations
Bradley Hauer | Grzegorz Kondrak | Yixing Luan | Arnob Mallik | Lili Mou
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

Acquisition of multilingual training data continues to be a challenge in word sense disambiguation (WSD). To address this problem, unsupervised approaches have been proposed to automatically generate sense annotations for training supervised WSD systems. We present three new methods for creating sense-annotated corpora which leverage translations, parallel bitexts, lexical resources, as well as contextual and synset embeddings. Our semi-supervised method applies machine translation to transfer existing sense annotations to other languages. Our two unsupervised methods refine sense annotations produced by a knowledge-based WSD system via lexical translations in a parallel corpus. We obtain state-of-the-art results on standard WSD benchmarks.

2020

pdf
Low-Resource G2P and P2G Conversion with Synthetic Training Data
Bradley Hauer | Amir Ahmad Habibi | Yixing Luan | Arnob Mallik | Grzegorz Kondrak
Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

This paper presents the University of Alberta systems and results in the SIGMORPHON 2020 Task 1: Multilingual Grapheme-to-Phoneme Conversion. Following previous SIGMORPHON shared tasks, we define a low-resource setting with 100 training instances. We experiment with three transduction approaches in both standard and low-resource settings, as well as on the related task of phoneme-to-grapheme conversion. We propose a method for synthesizing training data using a combination of diverse models.

pdf
UAlberta at SemEval-2020 Task 2: Using Translations to Predict Cross-Lingual Entailment
Bradley Hauer | Amir Ahmad Habibi | Yixing Luan | Arnob Mallik | Grzegorz Kondrak
Proceedings of the Fourteenth Workshop on Semantic Evaluation

We investigate the hypothesis that translations can be used to identify cross-lingual lexical entailment. We propose novel methods that leverage parallel corpora, word embeddings, and multilingual lexical resources. Our results demonstrate that the implementation of these ideas leads to improvements in predicting entailment.