Grzegorz Kondrak


2021

On Universal Colexifications
Hongchang Bao | Bradley Hauer | Grzegorz Kondrak
Proceedings of the 11th Global Wordnet Conference

Colexification occurs when two distinct concepts are lexified by the same word. The term covers both polysemy and homonymy. We posit and investigate the hypothesis that no pair of concepts are colexified in every language. We test our hypothesis by analyzing colexification data from BabelNet, Open Multilingual WordNet, and CLICS. The results show that our hypothesis is supported by over 99.9% of colexified concept pairs in these three lexical resources.

Homonymy and Polysemy Detection with Multilingual Information
Amir Ahmad Habibi | Bradley Hauer | Grzegorz Kondrak
Proceedings of the 11th Global Wordnet Conference

Deciding whether a semantically ambiguous word is homonymous or polysemous is equivalent to establishing whether it has any pair of senses that are semantically unrelated. We present novel methods for this task that leverage information from multilingual lexical resources. We formally prove the theoretical properties that provide the foundation for our methods. In particular, we show how the One Homonym Per Translation hypothesis of Hauer and Kondrak (2020a) follows from the synset properties formulated by Hauer and Kondrak (2020b). Experimental evaluation shows that our approach sets a new state of the art for homonymy detection.

Semi-Supervised and Unsupervised Sense Annotation via Translations
Bradley Hauer | Grzegorz Kondrak | Yixing Luan | Arnob Mallik | Lili Mou
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

Acquisition of multilingual training data continues to be a challenge in word sense disambiguation (WSD). To address this problem, unsupervised approaches have been proposed to automatically generate sense annotations for training supervised WSD systems. We present three new methods for creating sense-annotated corpora which leverage translations, parallel bitexts, lexical resources, as well as contextual and synset embeddings. Our semi-supervised method applies machine translation to transfer existing sense annotations to other languages. Our two unsupervised methods refine sense annotations produced by a knowledge-based WSD system via lexical translations in a parallel corpus. We obtain state-of-the-art results on standard WSD benchmarks.

UAlberta at SemEval-2021 Task 2: Determining Sense Synonymy via Translations
Bradley Hauer | Hongchang Bao | Arnob Mallik | Grzegorz Kondrak
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

We describe the University of Alberta systems for the SemEval-2021 Word-in-Context (WiC) disambiguation task. We explore the use of translation information for deciding whether two different tokens of the same word correspond to the same sense of the word. Our focus is on developing principled theoretical approaches which are grounded in linguistic phenomena, leading to more explainable models. We show that translations from multiple languages can be leveraged to improve the accuracy on the WiC task.

2020

Improving Word Sense Disambiguation with Translations
Yixing Luan | Bradley Hauer | Lili Mou | Grzegorz Kondrak
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

It has been conjectured that multilingual information can help monolingual word sense disambiguation (WSD). However, existing WSD systems rarely consider multilingual information, and no effective method has been proposed for improving WSD by generating translations. In this paper, we present a novel approach that improves the performance of a base WSD system using machine translation. Since our approach is language independent, we perform WSD experiments on several languages. The results demonstrate that our methods can consistently improve the performance of WSD systems, and obtain state-of-the-art results in both English and multilingual WSD. To facilitate the use of lexical translation information, we also propose BABALIGN, a precise bitext alignment algorithm which is guided by multilingual lexical correspondences from BabelNet.

Low-Resource G2P and P2G Conversion with Synthetic Training Data
Bradley Hauer | Amir Ahmad Habibi | Yixing Luan | Arnob Mallik | Grzegorz Kondrak
Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

This paper presents the University of Alberta systems and results in the SIGMORPHON 2020 Task 1: Multilingual Grapheme-to-Phoneme Conversion. Following previous SIGMORPHON shared tasks, we define a low-resource setting with 100 training instances. We experiment with three transduction approaches in both standard and low-resource settings, as well as on the related task of phoneme-to-grapheme conversion. We propose a method for synthesizing training data using a combination of diverse models.

UAlberta at SemEval-2020 Task 2: Using Translations to Predict Cross-Lingual Entailment
Bradley Hauer | Amir Ahmad Habibi | Yixing Luan | Arnob Mallik | Grzegorz Kondrak
Proceedings of the Fourteenth Workshop on Semantic Evaluation

We investigate the hypothesis that translations can be used to identify cross-lingual lexical entailment. We propose novel methods that leverage parallel corpora, word embeddings, and multilingual lexical resources. Our results demonstrate that the implementation of these ideas leads to improvements in predicting entailment.

2019

Joint Approach to Deromanization of Code-mixed Texts
Rashed Rubby Riyadh | Grzegorz Kondrak
Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects

The conversion of romanized texts back to the native scripts is a challenging task because of the inconsistent romanization conventions and non-standard language use. This problem is compounded by code-mixing, i.e., using words from more than one language within the same discourse. In this paper, we propose a novel approach for handling these two problems together in a single system. Our approach combines three components: language identification, back-transliteration, and sequence prediction. The results of our experiments on Bengali and Hindi datasets establish the state of the art for the task of deromanization of code-mixed texts.

Cognate Projection for Low-Resource Inflection Generation
Bradley Hauer | Amir Ahmad Habibi | Yixing Luan | Rashed Rubby Riyadh | Grzegorz Kondrak
Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology

We propose cognate projection as a method of crosslingual transfer for inflection generation in the context of the SIGMORPHON 2019 Shared Task. The results on four language pairs show the method is effective when no low-resource training data is available.

2018

Comparison of Assorted Models for Transliteration
Saeed Najafi | Bradley Hauer | Rashed Rubby Riyadh | Leyuan Yu | Grzegorz Kondrak
Proceedings of the Seventh Named Entities Workshop

We report the results of our experiments in the context of the NEWS 2018 Shared Task on Transliteration. We focus on the comparison of several diverse systems, including three neural MT models. A combination of discriminative, generative, and neural models obtains the best results on the development sets. We also put forward ideas for improving the shared task.

String Transduction with Target Language Models and Insertion Handling
Garrett Nicolai | Saeed Najafi | Grzegorz Kondrak
Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology

Many character-level tasks can be framed as sequence-to-sequence transduction, where the target is a word from a natural language. We show that leveraging target language models derived from unannotated target corpora, combined with a precise alignment of the training data, yields state-of-the-art results on cognate projection, inflection generation, and phoneme-to-grapheme conversion.

Combining Neural and Non-Neural Methods for Low-Resource Morphological Reinflection
Saeed Najafi | Bradley Hauer | Rashed Rubby Riyadh | Leyuan Yu | Grzegorz Kondrak
Proceedings of the CoNLL–SIGMORPHON 2018 Shared Task: Universal Morphological Reinflection

2017

Identifying Cognate Sets Across Dictionaries of Related Languages
Adam St Arnaud | David Beck | Grzegorz Kondrak
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We present a system for identifying cognate sets across dictionaries of related languages. The likelihood of a cognate relationship is calculated on the basis of a rich set of features that capture both phonetic and semantic similarity, as well as the presence of regular sound correspondences. The similarity scores are used to cluster words from different languages that may originate from a common proto-word. When tested on the Algonquian language family, our system detects 63% of cognate sets while maintaining cluster purity of 70%.

If you can’t beat them, join them: the University of Alberta system description
Garrett Nicolai | Bradley Hauer | Mohammad Motallebi | Saeed Najafi | Grzegorz Kondrak
Proceedings of the CoNLL SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection

Morphological Analysis without Expert Annotation
Garrett Nicolai | Grzegorz Kondrak
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

The task of morphological analysis is to produce a complete list of lemma+tag analyses for a given word-form. We propose a discriminative string transduction approach which exploits plain inflection tables and raw text corpora, thus obviating the need for expert annotation. Experiments on four languages demonstrate that our system has much higher coverage than a hand-engineered FST analyzer, and is more accurate than a state-of-the-art morphological tagger.

Bootstrapping Unsupervised Bilingual Lexicon Induction
Bradley Hauer | Garrett Nicolai | Grzegorz Kondrak
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

The task of unsupervised lexicon induction is to find translation pairs across monolingual corpora. We develop a novel method that creates seed lexicons by identifying cognates in the vocabularies of related languages on the basis of their frequency and lexical similarity. We apply bidirectional bootstrapping to a method which learns a linear mapping between context-based vector spaces. Experimental results on three language pairs show consistent improvement over prior work.

2016

Integrating Morphological Desegmentation into Phrase-based Decoding
Mohammad Salameh | Colin Cherry | Grzegorz Kondrak
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Morphological Reinflection via Discriminative String Transduction
Garrett Nicolai | Bradley Hauer | Adam St Arnaud | Grzegorz Kondrak
Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

Morphological Segmentation Can Improve Syllabification
Garrett Nicolai | Lei Yao | Grzegorz Kondrak
Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

Leveraging Inflection Tables for Stemming and Lemmatization.
Garrett Nicolai | Grzegorz Kondrak
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Decoding Anagrammed Texts Written in an Unknown Language and Script
Bradley Hauer | Grzegorz Kondrak
Transactions of the Association for Computational Linguistics, Volume 4

Algorithmic decipherment is a prime example of a truly unsupervised problem. The first step in the decipherment process is the identification of the encrypted language. We propose three methods for determining the source language of a document enciphered with a monoalphabetic substitution cipher. The best method achieves 97% accuracy on 380 languages. We then present an approach to decoding anagrammed substitution ciphers, in which the letters within words have been arbitrarily transposed. It obtains the average decryption word accuracy of 93% on a set of 50 ciphertexts in 5 languages. Finally, we report the results on the Voynich manuscript, an unsolved fifteenth century cipher, which suggest Hebrew as the language of the document.

2015

What Matters Most in Morphologically Segmented SMT Models?
Mohammad Salameh | Colin Cherry | Grzegorz Kondrak
Proceedings of the Ninth Workshop on Syntax, Semantics and Structure in Statistical Translation

Morpho-syntactic Regularities in Continuous Word Representations: A multilingual study.
Garrett Nicolai | Colin Cherry | Grzegorz Kondrak
Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing

Multiple System Combination for Transliteration
Garrett Nicolai | Bradley Hauer | Mohammad Salameh | Adam St Arnaud | Ying Xu | Lei Yao | Grzegorz Kondrak
Proceedings of the Fifth Named Entity Workshop

A Lexicalized Tree Kernel for Open Information Extraction
Ying Xu | Christoph Ringlstetter | Mi-Young Kim | Grzegorz Kondrak | Randy Goebel | Yusuke Miyao
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

English orthography is not “close to optimal”
Garrett Nicolai | Grzegorz Kondrak
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Inflection Generation as Discriminative String Transduction
Garrett Nicolai | Colin Cherry | Grzegorz Kondrak
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Joint Generation of Transliterations from Multiple Representations
Lei Yao | Grzegorz Kondrak
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2014

10 Open Questions in Computational Morphonology
Grzegorz Kondrak
Proceedings of the 2014 Joint Meeting of SIGMORPHON and SIGFSM

Lattice Desegmentation for Statistical Machine Translation
Mohammad Salameh | Colin Cherry | Grzegorz Kondrak
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Does the Phonology of L1 Show Up in L2 Texts?
Garrett Nicolai | Grzegorz Kondrak
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Solving Substitution Ciphers with Combined Language Models
Bradley Hauer | Ryan Hayward | Grzegorz Kondrak
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2013

Automatic Generation of English Respellings
Bradley Hauer | Grzegorz Kondrak
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Reversing Morphological Tokenization in English-to-Arabic SMT
Mohammad Salameh | Colin Cherry | Grzegorz Kondrak
Proceedings of the 2013 NAACL HLT Student Research Workshop

Cognate and Misspelling Features for Natural Language Identification
Garrett Nicolai | Bradley Hauer | Mohammad Salameh | Lei Yao | Grzegorz Kondrak
Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications

Identification of Speakers in Novels
Hua He | Denilson Barbosa | Grzegorz Kondrak
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2012

Leveraging supplemental representations for sequential transduction
Aditya Bhargava | Grzegorz Kondrak
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Similarity Patterns in Words (Invited talk)
Grzegorz Kondrak
Proceedings of the EACL 2012 Joint Workshop of LINGVIS & UNCLH

Transliteration Experiments on Chinese and Arabic
Grzegorz Kondrak | Xingkai Li | Mohammad Salameh
Proceedings of the 4th Named Entity Workshop (NEWS) 2012

2011

How do you pronounce your name? Improving G2P with transliterations
Aditya Bhargava | Grzegorz Kondrak
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

The application of chordal graphs to inferring phylogenetic trees of languages
Jessica Enright | Grzegorz Kondrak
Proceedings of 5th International Joint Conference on Natural Language Processing

Clustering Semantically Equivalent Words into Cognate Sets in Multilingual Lists
Bradley Hauer | Grzegorz Kondrak
Proceedings of 5th International Joint Conference on Natural Language Processing

Leveraging Transliterations from Multiple Languages
Aditya Bhargava | Bradley Hauer | Grzegorz Kondrak
Proceedings of the 3rd Named Entities Workshop (NEWS 2011)

2010

Predicting the Semantic Compositionality of Prefix Verbs
Shane Bergsma | Aditya Bhargava | Hua He | Grzegorz Kondrak
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Language identification of names with SVMs
Aditya Bhargava | Grzegorz Kondrak
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Integrating Joint n-gram Features into a Discriminative Training Framework
Sittichai Jiampojamarn | Colin Cherry | Grzegorz Kondrak
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Letter-Phoneme Alignment: An Exploration
Sittichai Jiampojamarn | Grzegorz Kondrak
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Transliteration Generation and Mining with Limited Training Resources
Sittichai Jiampojamarn | Kenneth Dwyer | Shane Bergsma | Aditya Bhargava | Qing Dou | Mi-Young Kim | Grzegorz Kondrak
Proceedings of the 2010 Named Entities Workshop

Application of the Tightness Continuum Measure to Chinese Information Retrieval
Ying Xu | Randy Goebel | Christoph Ringlstetter | Grzegorz Kondrak
Proceedings of the 2010 Workshop on Multiword Expressions: from Theory to Applications

2009

On the Syllabification of Phonemes
Susan Bartlett | Grzegorz Kondrak | Colin Cherry
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Multiple Word Alignment with Profile Hidden Markov Models
Aditya Bhargava | Grzegorz Kondrak
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Student Research Workshop and Doctoral Consortium

A Ranking Approach to Stress Prediction for Letter-to-Phoneme Conversion
Qing Dou | Shane Bergsma | Sittichai Jiampojamarn | Grzegorz Kondrak
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

Reducing the Annotation Effort for Letter-to-Phoneme Conversion
Kenneth Dwyer | Grzegorz Kondrak
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

DirecTL: a Language Independent Approach to Transliteration
Sittichai Jiampojamarn | Aditya Bhargava | Qing Dou | Kenneth Dwyer | Grzegorz Kondrak
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)

2008

Automatic Syllabification with Structured SVMs for Letter-to-Phoneme Conversion
Susan Bartlett | Grzegorz Kondrak | Colin Cherry
Proceedings of ACL-08: HLT

Joint Processing and Discriminative Training for Letter-to-Phoneme Conversion
Sittichai Jiampojamarn | Colin Cherry | Grzegorz Kondrak
Proceedings of ACL-08: HLT

2007

Bootstrapping a Stochastic Transducer for Arabic-English Transliteration Extraction
Tarek Sherif | Grzegorz Kondrak
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

Substring-Based Transliteration
Tarek Sherif | Grzegorz Kondrak
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

Alignment-Based Discriminative String Similarity
Shane Bergsma | Grzegorz Kondrak
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

Applying Many-to-Many Alignments and Hidden Markov Models to Letter-to-Phoneme Conversion
Sittichai Jiampojamarn | Grzegorz Kondrak | Tarek Sherif
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

A Fast Method for Parallel Document Identification
Jessica Enright | Grzegorz Kondrak
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers

Proceedings of Ninth Meeting of the ACL Special Interest Group in Computational Morphology and Phonology
John Nerbonne | T. Mark Ellison | Grzegorz Kondrak
Proceedings of Ninth Meeting of the ACL Special Interest Group in Computational Morphology and Phonology

Computing and Historical Phonology
John Nerbonne | T. Mark Ellison | Grzegorz Kondrak
Proceedings of Ninth Meeting of the ACL Special Interest Group in Computational Morphology and Phonology

Creating a Comparative Dictionary of Totonac-Tepehua
Grzegorz Kondrak | David Beck | Philip Dilts
Proceedings of Ninth Meeting of the ACL Special Interest Group in Computational Morphology and Phonology

2006

Evaluation of Several Phonetic Similarity Algorithms on the Task of Cognate Identification
Grzegorz Kondrak | Tarek Sherif
Proceedings of the Workshop on Linguistic Distances

Biomedical Term Recognition with the Perceptron HMM Algorithm
Sittichai Jiampojamarn | Grzegorz Kondrak | Colin Cherry
Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology

2005

Computing Word Similarity and Identifying Cognates with Pair Hidden Markov Models
Wesley Mackay | Grzegorz Kondrak
Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005)

Cognates and Word Alignment in Bitexts
Grzegorz Kondrak
Proceedings of Machine Translation Summit X: Papers

We evaluate several orthographic word similarity measures in the context of bitext word alignment. We investigate the relationship between the length of the words and the length of their longest common subsequence. We present an alternative to the longest common subsequence ratio (LCSR), a widely-used orthographic word similarity measure. Experiments involving identification of cognates in bitexts suggest that the alternative method outperforms LCSR. Our results also indicate that alignment links can be used as a substitute for cognates for the purpose of evaluating word similarity measures.
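
For readers unfamiliar with the baseline measure named in this abstract, the following is a minimal sketch of LCSR under its commonly used definition: the length of the longest common subsequence divided by the length of the longer word. The function names `lcs_length` and `lcsr` are illustrative, the exact formulation used in the paper may differ, and the alternative measure the paper proposes is not shown here.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)  # DP row for the prefix of a processed so far
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the best subsequence that ends before both characters.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise skip one character from either word.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(a: str, b: str) -> float:
    """LCS length divided by the length of the longer word (assumed definition)."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch; avoids division by zero
    return lcs_length(a, b) / max(len(a), len(b))


if __name__ == "__main__":
    print(lcsr("colour", "color"))
    print(lcsr("night", "nacht"))
```

Because the denominator is the longer word's length, the score always lies between 0 and 1, and equals 1 only for identical words.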

Learning a Spelling Error Model from Search Query Logs
Farooq Ahmad | Grzegorz Kondrak
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

2004

Identification of Confusable Drug Names: A New Approach and Evaluation Methodology
Grzegorz Kondrak | Bonnie Dorr
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

2003

Cognates Can Improve Statistical Translation Models
Grzegorz Kondrak | Daniel Marcu | Kevin Knight
Companion Volume of the Proceedings of HLT-NAACL 2003 - Short Papers

2002

Determining Recurrent Sound Correspondences by Inducing Translation Models
Grzegorz Kondrak
COLING 2002: The 19th International Conference on Computational Linguistics

2001

Book Reviews: The Significance of Word Lists
Grzegorz Kondrak
Computational Linguistics, Volume 27, Number 4, December 2001

Identifying Cognates by Phonetic and Semantic Similarity
Grzegorz Kondrak
Second Meeting of the North American Chapter of the Association for Computational Linguistics

2000

A New Algorithm for the Alignment of Phonetic Sequences
Grzegorz Kondrak
1st Meeting of the North American Chapter of the Association for Computational Linguistics