Witold Kieraś
2022
UniMorph 4.0: Universal Morphology
Khuyagbaatar Batsuren | Omer Goldman | Salam Khalifa | Nizar Habash | Witold Kieraś | Gábor Bella | Brian Leonard | Garrett Nicolai | Kyle Gorman | Yustinus Ghanggo Ate | Maria Ryskina | Sabrina Mielke | Elena Budianskaya | Charbel El-Khaissi | Tiago Pimentel | Michael Gasser | William Abbott Lane | Mohit Raj | Matt Coler | Jaime Rafael Montoya Samame | Delio Siticonatzi Camaiteri | Esaú Zumaeta Rojas | Didier López Francis | Arturo Oncevay | Juan López Bautista | Gema Celeste Silva Villegas | Lucas Torroba Hennigen | Adam Ek | David Guriel | Peter Dirix | Jean-Philippe Bernardy | Andrey Scherbakov | Aziyana Bayyr-ool | Antonios Anastasopoulos | Roberto Zariquiey | Karina Sheifer | Sofya Ganieva | Hilaria Cruz | Ritván Karahóǧa | Stella Markantonatou | George Pavlidis | Matvey Plugaryov | Elena Klyachko | Ali Salehi | Candy Angulo | Jatayu Baxi | Andrew Krizhanovsky | Natalia Krizhanovskaya | Elizabeth Salesky | Clara Vania | Sardana Ivanova | Jennifer White | Rowan Hall Maudslay | Josef Valvoda | Ran Zmigrod | Paula Czarnowska | Irene Nikkarinen | Aelita Salchak | Brijesh Bhatt | Christopher Straughn | Zoey Liu | Jonathan North Washington | Yuval Pinter | Duygu Ataman | Marcin Wolinski | Totok Suhardijanto | Anna Yablonskaya | Niklas Stoehr | Hossep Dolatian | Zahroh Nuriah | Shyam Ratan | Francis M. Tyers | Edoardo M. Ponti | Grant Aiton | Aryaman Arora | Richard J. Hatcher | Ritesh Kumar | Jeremiah Young | Daria Rodionova | Anastasia Yemelina | Taras Andrushko | Igor Marchenko | Polina Mashkovtseva | Alexandra Serova | Emily Prud’hommeaux | Maria Nepomniashchaya | Fausto Giunchiglia | Eleanor Chodroff | Mans Hulden | Miikka Silfverberg | Arya D. McCarthy | David Yarowsky | Ryan Cotterell | Reut Tsarfaty | Ekaterina Vylomova
Proceedings of the Thirteenth Language Resources and Evaluation Conference
The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological inflection tables for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation, and a type-level resource of annotated data in diverse languages realizing that schema. This paper presents the expansions and improvements on several fronts that were made in the last couple of years (since McCarthy et al. (2020)). Collaborative efforts by numerous linguists have added 66 new languages, including 24 endangered languages. We have implemented several improvements to the extraction pipeline to tackle some issues, e.g., missing gender and macrons information. We have amended the schema to use a hierarchical structure that is needed for morphological phenomena like multiple-argument agreement and case stacking, while adding some missing morphological features to make the schema more inclusive. In light of the last UniMorph release, we also augmented the database with morpheme segmentation for 16 languages. Lastly, this new release makes a push towards inclusion of derivational morphology in UniMorph by enriching the data and annotation schema with instances representing derivational processes from MorphyNet.
HerBERT Based Language Model Detects Quantifiers and Their Semantic Properties in Polish
Marcin Woliński | Bartłomiej Nitoń | Witold Kieraś | Jakub Szymanik
Proceedings of the Thirteenth Language Resources and Evaluation Conference
The paper presents a tool for automatic marking up of quantifying expressions, their semantic features, and scopes. We explore the idea of using a BERT-based neural model for the task (in this case HerBERT, a model trained specifically for Polish). The tool is trained on a recent manually annotated Corpus of Polish Quantificational Expressions (Szymanik and Kieraś, 2022). We discuss how it performs against human annotation and present results of automatic annotation of a 300-million sub-corpus of the National Corpus of Polish. Our results show that language models can effectively recognise the semantic category of quantification as well as identify key semantic properties of quantifiers, like monotonicity. Furthermore, the algorithm we have developed can be used for building semantically annotated quantifier corpora for other languages.
SIGMORPHON–UniMorph 2022 Shared Task 0: Generalization and Typologically Diverse Morphological Inflection
Jordan Kodner | Salam Khalifa | Khuyagbaatar Batsuren | Hossep Dolatian | Ryan Cotterell | Faruk Akkus | Antonios Anastasopoulos | Taras Andrushko | Aryaman Arora | Nona Atanalov | Gábor Bella | Elena Budianskaya | Yustinus Ghanggo Ate | Omer Goldman | David Guriel | Simon Guriel | Silvia Guriel-Agiashvili | Witold Kieraś | Andrew Krizhanovsky | Natalia Krizhanovsky | Igor Marchenko | Magdalena Markowska | Polina Mashkovtseva | Maria Nepomniashchaya | Daria Rodionova | Karina Scheifer | Alexandra Sorova | Anastasia Yemelina | Jeremiah Young | Ekaterina Vylomova
Proceedings of the 19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology
The 2022 SIGMORPHON–UniMorph shared task on large scale morphological inflection generation covered 33 typologically diverse languages from 11 top-level language families: Arabic (Modern Standard), Assamese, Braj, Chukchi, Eastern Armenian, Evenki, Georgian, Gothic, Gujarati, Hebrew, Hungarian, Itelmen, Karelian, Kazakh, Ket, Khalkha Mongolian, Kholosi, Korean, Lamahalot, Low German, Ludic, Magahi, Middle Low German, Old English, Old High German, Old Norse, Polish, Pomak, Slovak, Turkish, Upper Sorbian, Veps, and Xibe. We emphasize generalization along different dimensions this year by evaluating test items with unseen lemmas and unseen features separately under small and large training conditions. Across the five submitted systems and two baselines, the prediction of inflections with unseen features proved challenging, with average performance decreasing substantially from last year. This was true even for languages for which the forms were in principle predictable, which suggests that further work is needed in designing systems that capture the various types of generalization required for the world’s languages.
2021
SIGMORPHON 2021 Shared Task on Morphological Reinflection: Generalization Across Languages
Tiago Pimentel | Maria Ryskina | Sabrina J. Mielke | Shijie Wu | Eleanor Chodroff | Brian Leonard | Garrett Nicolai | Yustinus Ghanggo Ate | Salam Khalifa | Nizar Habash | Charbel El-Khaissi | Omer Goldman | Michael Gasser | William Lane | Matt Coler | Arturo Oncevay | Jaime Rafael Montoya Samame | Gema Celeste Silva Villegas | Adam Ek | Jean-Philippe Bernardy | Andrey Shcherbakov | Aziyana Bayyr-ool | Karina Sheifer | Sofya Ganieva | Matvey Plugaryov | Elena Klyachko | Ali Salehi | Andrew Krizhanovsky | Natalia Krizhanovsky | Clara Vania | Sardana Ivanova | Aelita Salchak | Christopher Straughn | Zoey Liu | Jonathan North Washington | Duygu Ataman | Witold Kieraś | Marcin Woliński | Totok Suhardijanto | Niklas Stoehr | Zahroh Nuriah | Shyam Ratan | Francis M. Tyers | Edoardo M. Ponti | Grant Aiton | Richard J. Hatcher | Emily Prud’hommeaux | Ritesh Kumar | Mans Hulden | Botond Barta | Dorina Lakatos | Gábor Szolnok | Judit Ács | Mohit Raj | David Yarowsky | Ryan Cotterell | Ben Ambridge | Ekaterina Vylomova
Proceedings of the 18th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology
This year’s iteration of the SIGMORPHON Shared Task on morphological reinflection focuses on typological diversity and cross-lingual variation of morphosyntactic features. In terms of the task, we enrich UniMorph with new data for 32 languages from 13 language families, with most of them being under-resourced: Kunwinjku, Classical Syriac, Arabic (Modern Standard, Egyptian, Gulf), Hebrew, Amharic, Aymara, Magahi, Braj, Kurdish (Central, Northern, Southern), Polish, Karelian, Livvi, Ludic, Veps, Võro, Evenki, Xibe, Tuvan, Sakha, Turkish, Indonesian, Kodi, Seneca, Asháninka, Yanesha, Chukchi, Itelmen, Eibela. We evaluate six systems on the new data and conduct an extensive error analysis of the systems’ predictions. Transformer-based models generally demonstrate superior performance on the majority of languages, achieving >90% accuracy on 65% of them. The languages on which systems yielded low accuracy are mainly under-resourced, with a limited amount of data. Most errors made by the systems are due to allomorphy, honorificity, and form variation. In addition, we observe that systems especially struggle to inflect multiword lemmas. The systems also produce misspelled forms or end up in repetitive loops (e.g., RNN-based models). Finally, we report a large drop in systems’ performance on previously unseen lemmas.
2018
Multisłownik: Linking plWordNet-based Lexical Data for Lexicography and Educational Purposes
Maciej Ogrodniczuk | Joanna Bilińska | Zbigniew Bronk | Witold Kieraś
Proceedings of the 9th Global Wordnet Conference
Multisłownik is an automated integrator of Polish lexical data retrieved from multiple available online sources, intended for use in various scenarios requiring access to such data, most prominently dictionary creation, linguistic studies and education. In contrast to many available internet dictionaries, Multisłownik is WordNet-centric, capturing the core definitions from Słowosieć, the Polish WordNet, and linking external resources to particular synsets. The paper provides details of the construction of the resource, discusses the difficulties related to linking different logical structures of underlying data and investigates two sample scenarios for using the resulting platform.
Manually Annotated Corpus of Polish Texts Published between 1830 and 1918
Witold Kieraś | Marcin Woliński
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
2016
The on-line version of Grammatical Dictionary of Polish
Marcin Woliński | Witold Kieraś
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
We present the new online edition of a dictionary of Polish inflection ― the Grammatical Dictionary of Polish (http://sgjp.pl). The dictionary is interesting for several reasons: it is comprehensive (over 330,000 lexemes corresponding to almost 4,300,000 different textual words; 1116 handcrafted inflectional patterns), the inflection is presented in an explicit manner in the form of carefully designed tables, the user interface facilitates advanced queries by several features (lemmas, forms, applicable grammatical categories, types of inflection). Moreover, the data of the dictionary is used in morphological analysers, including our product Morfeusz (http://sgjp.pl/morfeusz). From the start, the dictionary was meant to be comfortable for the human reader as well as to be ready for use in NLP applications. In the paper we briefly discuss both aspects of the resource.
Co-authors
- Marcin Woliński 5
- Yustinus Ghanggo Ate 3
- Ryan Cotterell 3
- Omer Goldman 3
- Salam Khalifa 3
- Andrew Krizhanovsky 3
- Ekaterina Vylomova 3
- Grant Aiton 2
- Antonios Anastasopoulos 2
- Taras Andrushko 2
- Aryaman Arora 2
- Duygu Ataman 2
- Khuyagbaatar Batsuren 2
- Aziyana Bayyr-ool 2
- Gábor Bella 2
- Jean-Philippe Bernardy 2
- Elena Budianskaya 2
- Eleanor Chodroff 2
- Matt Coler 2
- Hossep Dolatian 2
- Adam Ek 2
- Charbel El-Khaissi 2
- Sofya Ganieva 2
- Michael Gasser 2
- David Guriel 2
- Nizar Habash 2
- Richard J. Hatcher 2
- Mans Hulden 2
- Sardana Ivanova 2
- Elena Klyachko 2
- Natalia Krizhanovsky 2
- Ritesh Kumar 2
- Brian Leonard 2
- Zoey Liu 2
- Igor Marchenko 2
- Polina Mashkovtseva 2
- Sabrina J. Mielke 2
- Maria Nepomniashchaya 2
- Garrett Nicolai 2
- Zahroh Nuriah 2
- Arturo Oncevay 2
- Tiago Pimentel 2
- Matvey Plugaryov 2
- Edoardo M. Ponti 2
- Emily Prud’hommeaux 2
- Mohit Raj 2
- Shyam Ratan 2
- Daria Rodionova 2
- Maria Ryskina 2
- Aelita Salchak 2
- Ali Salehi 2
- Jaime Rafael Montoya Samame 2
- Karina Sheifer 2
- Niklas Stoehr 2
- Christopher Straughn 2
- Totok Suhardijanto 2
- Francis Tyers 2
- Clara Vania 2
- Gema Celeste Silva Villegas 2
- Jonathan Washington 2
- David Yarowsky 2
- Anastasia Yemelina 2
- Jeremiah Young 2
- Faruk Akkus 1
- Ben Ambridge 1
- Candy Angulo 1
- Nona Atanalov 1
- Botond Barta 1
- Jatayu Baxi 1
- Brijesh Bhatt 1
- Joanna Bilińska 1
- Zbigniew Bronk 1
- Delio Siticonatzi Camaiteri 1
- Hilaria Cruz 1
- Paula Czarnowska 1
- Peter Dirix 1
- Fausto Giunchiglia 1
- Kyle Gorman 1
- Simon Guriel 1
- Silvia Guriel-Agiashvili 1
- Ritván Karahóǧa 1
- Jordan Kodner 1
- Natalia Krizhanovskaya 1
- Dorina Lakatos 1
- William Lane 1
- William Abbott Lane 1
- Juan López Bautista 1
- Didier López Francis 1
- Stella Markantonatou 1
- Magdalena Markowska 1
- Rowan Hall Maudslay 1
- Arya D. McCarthy 1
- Irene Nikkarinen 1
- Bartłomiej Nitoń 1
- Maciej Ogrodniczuk 1
- George Pavlidis 1
- Yuval Pinter 1
- Esaú Zumaeta Rojas 1
- Elizabeth Salesky 1
- Karina Scheifer 1
- Andrey Scherbakov 1
- Alexandra Serova 1
- Andrey Shcherbakov 1
- Miikka Silfverberg 1
- Alexandra Sorova 1
- Gábor Szolnok 1
- Jakub Szymanik 1
- Lucas Torroba Hennigen 1
- Reut Tsarfaty 1
- Josef Valvoda 1
- Jennifer White 1
- Shijie Wu 1
- Anna Yablonskaya 1
- Roberto Zariquiey 1
- Ran Zmigrod 1
- Judit Ács 1