Jonathan Washington
Also published as: Jonathan N. Washington, Jonathan North Washington
2025
Towards a Hän morphological transducer
Maura O’Leary | Joseph Lukner | Finn Verdonk | Willem de Reuse | Jonathan Washington
Proceedings of the Eighth Workshop on the Use of Computational Methods in the Study of Endangered Languages
This paper presents work towards a morphological transducer for Hän, a Dene language spoken in Alaska and the Yukon Territory. We present the implementation of several complex morphological features of Dene languages into a morphological transducer, an evaluation of the transducer on corpus data, and a discussion of the future uses of such a transducer towards Hän revitalization efforts.
Proceedings of the Eighth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2025)
Atul Kr. Ojha | Chao-hong Liu | Ekaterina Vylomova | Flammie Pirinen | Jonathan Washington | Nathaniel Oco | Xiaobing Zhao
The Kyrgyz Seed Dataset Submission to the WMT25 Open Language Data Initiative Shared Task
Murat Jumashev | Alina Tillabaeva | Aida Kasieva | Turgunbek Omurkanov | Akylai Musaeva | Meerim Emil Kyzy | Gulaiym Chagataeva | Jonathan Washington
Proceedings of the Tenth Conference on Machine Translation
We present a Kyrgyz language seed dataset as part of our contribution to the WMT25 Open Language Data Initiative (OLDI) shared task. This paper details the process of collecting and curating English–Kyrgyz translations, highlighting the main challenges encountered in translating into a morphologically rich, low-resource language. We demonstrate the quality of the dataset through fine-tuning experiments, showing consistent improvements in machine translation performance across multiple models. Comparisons with bilingual and MNMT Kyrgyz-English baselines reveal that, for some models, our dataset enables performance surpassing pretrained baselines in both English–Kyrgyz and Kyrgyz–English translation directions. These results validate the dataset’s utility and suggest that it can serve as a valuable resource for the Kyrgyz MT community and other related low-resource languages.
2024
Proceedings of the Seventh Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2024)
Atul Kr. Ojha | Chao-hong Liu | Ekaterina Vylomova | Flammie Pirinen | Jade Abbott | Jonathan Washington | Nathaniel Oco | Valentin Malykh | Varvara Logacheva | Xiaobing Zhao
Strategies for the Annotation of Pronominalised Locatives in Turkic Universal Dependency Treebanks
Jonathan Washington | Çağrı Çöltekin | Furkan Akkurt | Bermet Chontaeva | Soudabeh Eslami | Gulnura Jumalieva | Aida Kasieva | Aslı Kuzgun | Büşra Marşan | Chihiro Taguchi
Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024
As part of our efforts to develop unified Universal Dependencies (UD) guidelines for Turkic languages, we evaluate multiple approaches to a difficult morphosyntactic phenomenon, pronominal locative expressions formed by a suffix -ki. These forms result in multiple syntactic words, with potentially conflicting morphological features, and participating in different dependency relations. We describe multiple approaches to the problem in current (and upcoming) Turkic UD treebanks, and show that none of them offers a solution that satisfies a number of constraints we consider (including constraints imposed by UD guidelines). This calls for a compromise with the ‘least damage’ that should be adopted by most, if not all, Turkic treebanks. Our discussion of the phenomenon and various annotation approaches may also help treebanking efforts for other languages or language families with similar constructions.
2023
Proceedings of the Sixth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2023)
Atul Kr. Ojha | Chao-hong Liu | Ekaterina Vylomova | Flammie Pirinen | Jade Abbott | Jonathan Washington | Nathaniel Oco | Valentin Malykh | Varvara Logacheva | Xiaobing Zhao
2022
A Free/Open-Source Morphological Transducer for Western Armenian
Hossep Dolatian | Daniel Swanson | Jonathan Washington
Proceedings of the Workshop on Processing Language Variation: Digital Armenian (DigitAm) within the 13th Language Resources and Evaluation Conference
We present a free/open-source morphological transducer for Western Armenian, an endangered and low-resource Indo-European language. The transducer has virtually complete coverage of the language’s inflectional morphology. We built the lexicon by scraping online dictionaries. As of submission, the transducer has a lexicon of 75K words. It has over 90% naive coverage on different Western Armenian corpora, and high precision.
Proceedings of the Fifth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2022)
Atul Kr. Ojha | Chao-Hong Liu | Ekaterina Vylomova | Jade Abbott | Jonathan Washington | Nathaniel Oco | Tommi A Pirinen | Valentin Malykh | Varvara Logacheva | Xiaobing Zhao
UniMorph 4.0: Universal Morphology
Khuyagbaatar Batsuren | Omer Goldman | Salam Khalifa | Nizar Habash | Witold Kieraś | Gábor Bella | Brian Leonard | Garrett Nicolai | Kyle Gorman | Yustinus Ghanggo Ate | Maria Ryskina | Sabrina Mielke | Elena Budianskaya | Charbel El-Khaissi | Tiago Pimentel | Michael Gasser | William Abbott Lane | Mohit Raj | Matt Coler | Jaime Rafael Montoya Samame | Delio Siticonatzi Camaiteri | Esaú Zumaeta Rojas | Didier López Francis | Arturo Oncevay | Juan López Bautista | Gema Celeste Silva Villegas | Lucas Torroba Hennigen | Adam Ek | David Guriel | Peter Dirix | Jean-Philippe Bernardy | Andrey Scherbakov | Aziyana Bayyr-ool | Antonios Anastasopoulos | Roberto Zariquiey | Karina Sheifer | Sofya Ganieva | Hilaria Cruz | Ritván Karahóǧa | Stella Markantonatou | George Pavlidis | Matvey Plugaryov | Elena Klyachko | Ali Salehi | Candy Angulo | Jatayu Baxi | Andrew Krizhanovsky | Natalia Krizhanovskaya | Elizabeth Salesky | Clara Vania | Sardana Ivanova | Jennifer White | Rowan Hall Maudslay | Josef Valvoda | Ran Zmigrod | Paula Czarnowska | Irene Nikkarinen | Aelita Salchak | Brijesh Bhatt | Christopher Straughn | Zoey Liu | Jonathan North Washington | Yuval Pinter | Duygu Ataman | Marcin Wolinski | Totok Suhardijanto | Anna Yablonskaya | Niklas Stoehr | Hossep Dolatian | Zahroh Nuriah | Shyam Ratan | Francis M. Tyers | Edoardo M. Ponti | Grant Aiton | Aryaman Arora | Richard J. Hatcher | Ritesh Kumar | Jeremiah Young | Daria Rodionova | Anastasia Yemelina | Taras Andrushko | Igor Marchenko | Polina Mashkovtseva | Alexandra Serova | Emily Prud’hommeaux | Maria Nepomniashchaya | Fausto Giunchiglia | Eleanor Chodroff | Mans Hulden | Miikka Silfverberg | Arya D. McCarthy | David Yarowsky | Ryan Cotterell | Reut Tsarfaty | Ekaterina Vylomova
Proceedings of the Thirteenth Language Resources and Evaluation Conference
The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological inflection tables for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation, and a type-level resource of annotated data in diverse languages realizing that schema. This paper presents the expansions and improvements on several fronts that were made in the last couple of years (since McCarthy et al. (2020)). Collaborative efforts by numerous linguists have added 66 new languages, including 24 endangered languages. We have implemented several improvements to the extraction pipeline to tackle some issues, e.g., missing gender and macrons information. We have amended the schema to use a hierarchical structure that is needed for morphological phenomena like multiple-argument agreement and case stacking, while adding some missing morphological features to make the schema more inclusive. In light of the last UniMorph release, we also augmented the database with morpheme segmentation for 16 languages. Lastly, this new release makes a push towards inclusion of derivational morphology in UniMorph by enriching the data and annotation schema with instances representing derivational processes from MorphyNet.
A Free/Open-Source Morphological Analyser and Generator for Sakha
Sardana Ivanova | Jonathan Washington | Francis Tyers
Proceedings of the Thirteenth Language Resources and Evaluation Conference
We present, to our knowledge, the first ever published morphological analyser and generator for Sakha, a marginalised language of Siberia. The transducer, developed using HFST, has coverage of solidly above 90%, and high precision. In the development of the analyser, we have expanded linguistic knowledge about Sakha, and developed strategies for complex grammatical patterns. The transducer is already being used in downstream tasks, including computer assisted language learning applications for linguistic maintenance and computational linguistic shared tasks.
2021
Towards a morphological transducer and orthography converter for Western Tlacolula Valley Zapotec
Jonathan Washington | Felipe Lopez | Brook Lillehaugen
Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas
This paper presents work towards a morphological transducer and orthography converter for Dizhsa, or San Lucas Quiaviní Zapotec, an endangered Western Tlacolula Valley Zapotec language. The implementation of various aspects of the language’s morphology is presented, as well as the transducer’s ability to perform analysis in two orthographies and convert between them. Potential uses of the transducer for language maintenance and issues of licensing are also discussed. Evaluation of the transducer shows that it is fairly robust although incomplete, and evaluation of orthographic conversion shows that this method is strongly affected by the coverage of the transducer.
SIGMORPHON 2021 Shared Task on Morphological Reinflection: Generalization Across Languages
Tiago Pimentel | Maria Ryskina | Sabrina J. Mielke | Shijie Wu | Eleanor Chodroff | Brian Leonard | Garrett Nicolai | Yustinus Ghanggo Ate | Salam Khalifa | Nizar Habash | Charbel El-Khaissi | Omer Goldman | Michael Gasser | William Lane | Matt Coler | Arturo Oncevay | Jaime Rafael Montoya Samame | Gema Celeste Silva Villegas | Adam Ek | Jean-Philippe Bernardy | Andrey Shcherbakov | Aziyana Bayyr-ool | Karina Sheifer | Sofya Ganieva | Matvey Plugaryov | Elena Klyachko | Ali Salehi | Andrew Krizhanovsky | Natalia Krizhanovsky | Clara Vania | Sardana Ivanova | Aelita Salchak | Christopher Straughn | Zoey Liu | Jonathan North Washington | Duygu Ataman | Witold Kieraś | Marcin Woliński | Totok Suhardijanto | Niklas Stoehr | Zahroh Nuriah | Shyam Ratan | Francis M. Tyers | Edoardo M. Ponti | Grant Aiton | Richard J. Hatcher | Emily Prud’hommeaux | Ritesh Kumar | Mans Hulden | Botond Barta | Dorina Lakatos | Gábor Szolnok | Judit Ács | Mohit Raj | David Yarowsky | Ryan Cotterell | Ben Ambridge | Ekaterina Vylomova
Proceedings of the 18th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology
This year’s iteration of the SIGMORPHON Shared Task on morphological reinflection focuses on typological diversity and cross-lingual variation of morphosyntactic features. In terms of the task, we enrich UniMorph with new data for 32 languages from 13 language families, with most of them being under-resourced: Kunwinjku, Classical Syriac, Arabic (Modern Standard, Egyptian, Gulf), Hebrew, Amharic, Aymara, Magahi, Braj, Kurdish (Central, Northern, Southern), Polish, Karelian, Livvi, Ludic, Veps, Võro, Evenki, Xibe, Tuvan, Sakha, Turkish, Indonesian, Kodi, Seneca, Asháninka, Yanesha, Chukchi, Itelmen, Eibela. We evaluate six systems on the new data and conduct an extensive error analysis of the systems’ predictions. Transformer-based models generally demonstrate superior performance on the majority of languages, achieving >90% accuracy on 65% of them. The languages on which systems yielded low accuracy are mainly under-resourced, with a limited amount of data. Most errors made by the systems are due to allomorphy, honorificity, and form variation. In addition, we observe that systems especially struggle to inflect multiword lemmas. The systems also produce misspelled forms or end up in repetitive loops (e.g., RNN-based models). Finally, we report a large drop in systems’ performance on previously unseen lemmas.
2020
Proceedings of the 3rd Workshop on Technologies for MT of Low Resource Languages
Alina Karakanta | Atul Kr. Ojha | Chao-Hong Liu | Jade Abbott | John Ortega | Jonathan Washington | Nathaniel Oco | Surafel Melaku Lakew | Tommi A Pirinen | Valentin Malykh | Varvara Logacheva | Xiaobing Zhao
2019
A biscriptual morphological transducer for Crimean Tatar
Francis M. Tyers | Jonathan Washington | Darya Kavitskaya | Memduh Gökırmak | Nick Howell | Remziye Berberova
Proceedings of the 3rd Workshop on the Use of Computational Methods in the Study of Endangered Languages Volume 1 (Papers)
Proceedings of the 2nd Workshop on Technologies for MT of Low Resource Languages
Alina Karakanta | Atul Kr. Ojha | Chao-Hong Liu | Jonathan Washington | Nathaniel Oco | Surafel Melaku Lakew | Valentin Malykh | Xiaobing Zhao
A free/open-source rule-based machine translation system for Crimean Tatar to Turkish
Memduh Gökırmak | Francis Tyers | Jonathan Washington
Proceedings of the 2nd Workshop on Technologies for MT of Low Resource Languages
2018
Rule-based machine translation from Kazakh to Turkish
Sevilay Bayatli | Sefer Kurnaz | Ilnar Salimzyanov | Jonathan Washington | Francis M. Tyers
Proceedings of the 21st Annual Conference of the European Association for Machine Translation
This paper presents a shallow-transfer machine translation (MT) system for translating from Kazakh to Turkish. Background on the differences between the languages is presented, followed by how the system was designed to handle some of these differences. The system is based on the Apertium free/open-source machine translation platform. The structure of the system and how it works is described, along with an evaluation against two competing systems. Linguistic components were developed, including a Kazakh-Turkish bilingual dictionary, Constraint Grammar disambiguation rules, lexical selection rules, and structural transfer rules. With many known issues yet to be addressed, our RBMT system has reached performance comparable to publicly-available corpus-based MT systems between the languages.
Apertium’s Web Toolchain for Low-Resource Language Technology
Sushain Cherivirala | Shardul Chiplunkar | Jonathan Washington | Kevin Unhammer
Proceedings of the AMTA 2018 Workshop on Technologies for MT of Low Resource Languages (LoResMT 2018)
2017
Syllable-aware Neural Language Models: A Failure to Beat Character-aware Ones
Zhenisbek Assylbekov | Rustem Takhanov | Bagdat Myrzakhmetov | Jonathan N. Washington
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
Syllabification does not seem to improve word-level RNN language modeling quality when compared to character-based segmentation. However, our best syllable-aware language model, achieving performance comparable to the competitive character-aware model, has 18%-33% fewer parameters and is trained 1.2-2.2 times faster.
UD Annotatrix: An annotation tool for Universal Dependencies
Francis M. Tyers | Mariya Sheyanova | Jonathan North Washington
Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories
2016
A Finite-state Morphological Analyser for Tuvan
Francis Tyers | Aziyana Bayyr-ool | Aelita Salchak | Jonathan Washington
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
This paper describes the development of free/open-source finite-state morphological transducers for Tuvan, a Turkic language spoken in and around the Tuvan Republic in Russia. The finite-state toolkit used for the work is the Helsinki Finite-State Toolkit (HFST); we use the lexc formalism for modelling the morphotactics and the twol formalism for modelling morphophonological alternations. We present a novel description of the morphological combinatorics of pseudo-derivational morphemes in Tuvan. An evaluation is presented which shows that the transducer has reasonable coverage (around 93%) on freely available corpora of the language, and high precision (over 99%) on a manually verified test set.
Phylogenetic simulations over constraint-based grammar formalisms
Andrew Lamont | Jonathan Washington
Proceedings of the NAACL Student Research Workshop
2014
Finite-state morphological transducers for three Kypchak languages
Jonathan Washington | Ilnar Salimzyanov | Francis Tyers
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
This paper describes the development of free/open-source finite-state morphological transducers for three Turkic languages―Kazakh, Tatar, and Kumyk―representing one language from each of the three sub-branches of the Kypchak branch of Turkic. The finite-state toolkit used for the work is the Helsinki Finite-State Toolkit (HFST). This paper describes how the development of a transducer for each subsequent closely-related language took less development time. An evaluation is presented which shows that the transducers all have a reasonable coverage―around 90%―on freely available corpora of the languages, and high precision over a manually verified test set.
2013
A Free/Open-source Kazakh-Tatar Machine Translation System
Ilnar Salimzyanov | Jonathan Washington | Francis Tyers
Proceedings of Machine Translation Summit XIV: Papers
2012
A finite-state morphological transducer for Kyrgyz
Jonathan Washington | Mirlan Ipasov | Francis Tyers
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
This paper describes the development of a free/open-source finite-state morphological transducer for Kyrgyz. The transducer has been developed for morphological generation for use within a prototype Turkish–Kyrgyz machine translation system, but has also been extensively tested for analysis. The finite-state toolkit used for the work was the Helsinki Finite-State Toolkit (HFST). The paper describes some issues in Kyrgyz morphology, the development of the tool, some linguistic issues encountered and how they were dealt with, and which issues are left to resolve. An evaluation is presented which shows that the transducer has medium-level coverage, between 82% and 87% on two freely available corpora of Kyrgyz, and high precision and recall over a manually verified test set.
Co-authors
- Francis Tyers 11
- Chao-Hong Liu 6
- Nathaniel Oco 6
- Atul Kr. Ojha 6
- Ekaterina Vylomova 6
- Xiaobing Zhao 6
- Valentin Malykh 5
- Flammie A. Pirinen 5
- Jade Abbott 4
- Varvara Logacheva 4
- Aziyana Bayyr-ool 3
- Sardana Ivanova 3
- Aelita Salchak 3
- Ilnar Salimzyanov 3
- Grant Aiton 2
- Duygu Ataman 2
- Yustinus Ghanggo Ate 2
- Jean-Philippe Bernardy 2
- Eleanor Chodroff 2
- Matt Coler 2
- Ryan Cotterell 2
- Hossep Dolatian 2
- Adam Ek 2
- Charbel El-Khaissi 2
- Sofya Ganieva 2
- Michael Gasser 2
- Omer Goldman 2
- Memduh Gökırmak 2
- Nizar Habash 2
- Richard J. Hatcher 2
- Mans Hulden 2
- Alina Karakanta 2
- Aida Kasieva 2
- Salam Khalifa 2
- Witold Kieraś 2
- Elena Klyachko 2
- Andrew Krizhanovsky 2
- Ritesh Kumar 2
- Surafel Melaku Lakew 2
- Brian Leonard 2
- Zoey Liu 2
- Sabrina J. Mielke 2
- Garrett Nicolai 2
- Zahroh Nuriah 2
- Arturo Oncevay 2
- Tiago Pimentel 2
- Matvey Plugaryov 2
- Edoardo M. Ponti 2
- Emily Prud’hommeaux 2
- Mohit Raj 2
- Shyam Ratan 2
- Maria Ryskina 2
- Ali Salehi 2
- Jaime Rafael Montoya Samame 2
- Karina Sheifer 2
- Niklas Stoehr 2
- Christopher Straughn 2
- Totok Suhardijanto 2
- Clara Vania 2
- Gema Celeste Silva Villegas 2
- Marcin Woliński 2
- David Yarowsky 2
- Furkan Akkurt 1
- Ben Ambridge 1
- Antonios Anastasopoulos 1
- Taras Andrushko 1
- Candy Angulo 1
- Aryaman Arora 1
- Zhenisbek Assylbekov 1
- Botond Barta 1
- Khuyagbaatar Batsuren 1
- Jatayu Baxi 1
- Sevilay Bayatli 1
- Gábor Bella 1
- Remziye Berberova 1
- Brijesh Bhatt 1
- Elena Budianskaya 1
- Delio Siticonatzi Camaiteri 1
- Gulaiym Chagataeva 1
- Sushain Cherivirala 1
- Shardul Chiplunkar 1
- Bermet Chontaeva 1
- Hilaria Cruz 1
- Paula Czarnowska 1
- Peter Dirix 1
- Meerim Emil Kyzy 1
- Soudabeh Eslami 1
- Fausto Giunchiglia 1
- Kyle Gorman 1
- David Guriel 1
- Nick Howell 1
- Mirlan Ipasov 1
- Gulnura Jumalieva 1
- Murat Jumashev 1
- Ritván Karahóǧa 1
- Darya Kavitskaya 1
- Natalia Krizhanovskaya 1
- Natalia Krizhanovsky 1
- Sefer Kurnaz 1
- Aslı Kuzgun 1
- Dorina Lakatos 1
- Andrew Lamont 1
- William Lane 1
- William Abbott Lane 1
- Brook Lillehaugen 1
- Felipe Lopez 1
- Joseph Lukner 1
- Juan López Bautista 1
- Didier López Francis 1
- Igor Marchenko 1
- Stella Markantonatou 1
- Büşra Marşan 1
- Polina Mashkovtseva 1
- Rowan Hall Maudslay 1
- Arya D. McCarthy 1
- Akylai Musaeva 1
- Bagdat Myrzakhmetov 1
- Maria Nepomniashchaya 1
- Irene Nikkarinen 1
- Turgunbek Omurkanov 1
- John Ortega 1
- Maura O’Leary 1
- George Pavlidis 1
- Yuval Pinter 1
- Daria Rodionova 1
- Esaú Zumaeta Rojas 1
- Elizabeth Salesky 1
- Andrey Scherbakov 1
- Alexandra Serova 1
- Andrey Shcherbakov 1
- Mariya Sheyanova 1
- Miikka Silfverberg 1
- Daniel G. Swanson 1
- Gábor Szolnok 1
- Chihiro Taguchi 1
- Rustem Takhanov 1
- Alina Tillabaeva 1
- Lucas Torroba Hennigen 1
- Reut Tsarfaty 1
- Kevin Unhammer 1
- Josef Valvoda 1
- Finn Verdonk 1
- Jennifer White 1
- Shijie Wu 1
- Anna Yablonskaya 1
- Anastasia Yemelina 1
- Jeremiah Young 1
- Roberto Zariquiey 1
- Ran Zmigrod 1
- Willem de Reuse 1
- Judit Ács 1
- Çağrı Çöltekin 1