Yixing Luan


2021

Semi-Supervised and Unsupervised Sense Annotation via Translations
Bradley Hauer | Grzegorz Kondrak | Yixing Luan | Arnob Mallik | Lili Mou
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

Acquisition of multilingual training data continues to be a challenge in word sense disambiguation (WSD). To address this problem, unsupervised approaches have been proposed to automatically generate sense annotations for training supervised WSD systems. We present three new methods for creating sense-annotated corpora which leverage translations, parallel bitexts, lexical resources, as well as contextual and synset embeddings. Our semi-supervised method applies machine translation to transfer existing sense annotations to other languages. Our two unsupervised methods refine sense annotations produced by a knowledge-based WSD system via lexical translations in a parallel corpus. We obtain state-of-the-art results on standard WSD benchmarks.

2020

Improving Word Sense Disambiguation with Translations
Yixing Luan | Bradley Hauer | Lili Mou | Grzegorz Kondrak
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

It has been conjectured that multilingual information can help monolingual word sense disambiguation (WSD). However, existing WSD systems rarely consider multilingual information, and no effective method has been proposed for improving WSD by generating translations. In this paper, we present a novel approach that improves the performance of a base WSD system using machine translation. Since our approach is language independent, we perform WSD experiments on several languages. The results demonstrate that our methods can consistently improve the performance of WSD systems, and obtain state-of-the-art results in both English and multilingual WSD. To facilitate the use of lexical translation information, we also propose BABALIGN, a precise bitext alignment algorithm which is guided by multilingual lexical correspondences from BabelNet.

Low-Resource G2P and P2G Conversion with Synthetic Training Data
Bradley Hauer | Amir Ahmad Habibi | Yixing Luan | Arnob Mallik | Grzegorz Kondrak
Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

This paper presents the University of Alberta systems and results in the SIGMORPHON 2020 Task 1: Multilingual Grapheme-to-Phoneme Conversion. Following previous SIGMORPHON shared tasks, we define a low-resource setting with 100 training instances. We experiment with three transduction approaches in both standard and low-resource settings, as well as on the related task of phoneme-to-grapheme conversion. We propose a method for synthesizing training data using a combination of diverse models.

UAlberta at SemEval-2020 Task 2: Using Translations to Predict Cross-Lingual Entailment
Bradley Hauer | Amir Ahmad Habibi | Yixing Luan | Arnob Mallik | Grzegorz Kondrak
Proceedings of the Fourteenth Workshop on Semantic Evaluation

We investigate the hypothesis that translations can be used to identify cross-lingual lexical entailment. We propose novel methods that leverage parallel corpora, word embeddings, and multilingual lexical resources. Our results demonstrate that the implementation of these ideas leads to improvements in predicting entailment.

2019

Cognate Projection for Low-Resource Inflection Generation
Bradley Hauer | Amir Ahmad Habibi | Yixing Luan | Rashed Rubby Riyadh | Grzegorz Kondrak
Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology

We propose cognate projection as a method of cross-lingual transfer for inflection generation in the context of the SIGMORPHON 2019 Shared Task. The results on four language pairs show the method is effective when no low-resource training data is available.