Niklas Stoehr


2022

pdf
Extended Multilingual Protest News Detection - Shared Task 1, CASE 2021 and 2022
Ali Hürriyetoğlu | Osman Mutlu | Fırat Duruşan | Onur Uca | Alaeddin Gürel | Benjamin J. Radford | Yaoyao Dai | Hansi Hettiarachchi | Niklas Stoehr | Tadashi Nomoto | Milena Slavcheva | Francielle Vargas | Aaqib Javid | Fatih Beyhan | Erdem Yörük
Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE)

We report results of the CASE 2022 Shared Task 1 on Multilingual Protest Event Detection. This task is a continuation of CASE 2021 that consists of four subtasks that are i) document classification, ii) sentence classification, iii) event sentence coreference identification, and iv) event extraction. The CASE 2022 extension consists of expanding the test data with more data in previously available languages, namely, English, Hindi, Portuguese, and Spanish, and adding new test data in Mandarin, Turkish, and Urdu for Sub-task 1, document classification. The training data from CASE 2021 in English, Portuguese and Spanish were utilized. Therefore, predicting document labels in Hindi, Mandarin, Turkish, and Urdu occurs in a zero-shot setting. The CASE 2022 workshop accepts reports on systems developed for predicting test data of CASE 2021 as well. We observe that the best systems submitted by CASE 2022 participants achieve between 79.71 and 84.06 F1-macro for new languages in a zero-shot setting. The winning approaches are mainly ensembling models and merging data in multiple languages. The best two submissions on CASE 2021 data outperform submissions from last year for Subtask 1 and Subtask 2 in all languages. Only the following scenarios were not outperformed by new submissions on CASE 2021: Subtask 3 Portuguese {& Subtask 4 English.

pdf
UniMorph 4.0: Universal Morphology
Khuyagbaatar Batsuren | Omer Goldman | Salam Khalifa | Nizar Habash | Witold Kieraś | Gábor Bella | Brian Leonard | Garrett Nicolai | Kyle Gorman | Yustinus Ghanggo Ate | Maria Ryskina | Sabrina Mielke | Elena Budianskaya | Charbel El-Khaissi | Tiago Pimentel | Michael Gasser | William Abbott Lane | Mohit Raj | Matt Coler | Jaime Rafael Montoya Samame | Delio Siticonatzi Camaiteri | Esaú Zumaeta Rojas | Didier López Francis | Arturo Oncevay | Juan López Bautista | Gema Celeste Silva Villegas | Lucas Torroba Hennigen | Adam Ek | David Guriel | Peter Dirix | Jean-Philippe Bernardy | Andrey Scherbakov | Aziyana Bayyr-ool | Antonios Anastasopoulos | Roberto Zariquiey | Karina Sheifer | Sofya Ganieva | Hilaria Cruz | Ritván Karahóǧa | Stella Markantonatou | George Pavlidis | Matvey Plugaryov | Elena Klyachko | Ali Salehi | Candy Angulo | Jatayu Baxi | Andrew Krizhanovsky | Natalia Krizhanovskaya | Elizabeth Salesky | Clara Vania | Sardana Ivanova | Jennifer White | Rowan Hall Maudslay | Josef Valvoda | Ran Zmigrod | Paula Czarnowska | Irene Nikkarinen | Aelita Salchak | Brijesh Bhatt | Christopher Straughn | Zoey Liu | Jonathan North Washington | Yuval Pinter | Duygu Ataman | Marcin Wolinski | Totok Suhardijanto | Anna Yablonskaya | Niklas Stoehr | Hossep Dolatian | Zahroh Nuriah | Shyam Ratan | Francis M. Tyers | Edoardo M. Ponti | Grant Aiton | Aryaman Arora | Richard J. Hatcher | Ritesh Kumar | Jeremiah Young | Daria Rodionova | Anastasia Yemelina | Taras Andrushko | Igor Marchenko | Polina Mashkovtseva | Alexandra Serova | Emily Prud’hommeaux | Maria Nepomniashchaya | Fausto Giunchiglia | Eleanor Chodroff | Mans Hulden | Miikka Silfverberg | Arya D. McCarthy | David Yarowsky | Ryan Cotterell | Reut Tsarfaty | Ekaterina Vylomova
Proceedings of the Thirteenth Language Resources and Evaluation Conference

The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological inflection tables for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation, and a type-level resource of annotated data in diverse languages realizing that schema. This paper presents the expansions and improvements on several fronts that were made in the last couple of years (since McCarthy et al. (2020)). Collaborative efforts by numerous linguists have added 66 new languages, including 24 endangered languages. We have implemented several improvements to the extraction pipeline to tackle some issues, e.g., missing gender and macrons information. We have amended the schema to use a hierarchical structure that is needed for morphological phenomena like multiple-argument agreement and case stacking, while adding some missing morphological features to make the schema more inclusive.In light of the last UniMorph release, we also augmented the database with morpheme segmentation for 16 languages. Lastly, this new release makes a push towards inclusion of derivational morphology in UniMorph by enriching the data and annotation schema with instances representing derivational processes from MorphyNet.

pdf
Attentional Probe: Estimating a Module’s Functional Potential
Tiago Pimentel | Josef Valvoda | Niklas Stoehr | Ryan Cotterell
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

2021

pdf
Classifying Dyads for Militarized Conflict Analysis
Niklas Stoehr | Lucas Torroba Hennigen | Samin Ahbab | Robert West | Ryan Cotterell
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Understanding the origins of militarized conflict is a complex, yet important undertaking. Existing research seeks to build this understanding by considering bi-lateral relationships between entity pairs (dyadic causes) and multi-lateral relationships among multiple entities (systemic causes). The aim of this work is to compare these two causes in terms of how they correlate with conflict between two entities. We do this by devising a set of textual and graph-based features which represent each of the causes. The features are extracted from Wikipedia and modeled as a large graph. Nodes in this graph represent entities connected by labeled edges representing ally or enemy-relationships. This allows casting the problem as an edge classification task, which we term dyad classification. We propose and evaluate classifiers to determine if a particular pair of entities are allies or enemies. Our results suggest that our systemic features might be slightly better correlates of conflict. Further, we find that Wikipedia articles of allies are semantically more similar than enemies.

pdf
SIGMORPHON 2021 Shared Task on Morphological Reinflection: Generalization Across Languages
Tiago Pimentel | Maria Ryskina | Sabrina J. Mielke | Shijie Wu | Eleanor Chodroff | Brian Leonard | Garrett Nicolai | Yustinus Ghanggo Ate | Salam Khalifa | Nizar Habash | Charbel El-Khaissi | Omer Goldman | Michael Gasser | William Lane | Matt Coler | Arturo Oncevay | Jaime Rafael Montoya Samame | Gema Celeste Silva Villegas | Adam Ek | Jean-Philippe Bernardy | Andrey Shcherbakov | Aziyana Bayyr-ool | Karina Sheifer | Sofya Ganieva | Matvey Plugaryov | Elena Klyachko | Ali Salehi | Andrew Krizhanovsky | Natalia Krizhanovsky | Clara Vania | Sardana Ivanova | Aelita Salchak | Christopher Straughn | Zoey Liu | Jonathan North Washington | Duygu Ataman | Witold Kieraś | Marcin Woliński | Totok Suhardijanto | Niklas Stoehr | Zahroh Nuriah | Shyam Ratan | Francis M. Tyers | Edoardo M. Ponti | Grant Aiton | Richard J. Hatcher | Emily Prud’hommeaux | Ritesh Kumar | Mans Hulden | Botond Barta | Dorina Lakatos | Gábor Szolnok | Judit Ács | Mohit Raj | David Yarowsky | Ryan Cotterell | Ben Ambridge | Ekaterina Vylomova
Proceedings of the 18th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

This year's iteration of the SIGMORPHON Shared Task on morphological reinflection focuses on typological diversity and cross-lingual variation of morphosyntactic features. In terms of the task, we enrich UniMorph with new data for 32 languages from 13 language families, with most of them being under-resourced: Kunwinjku, Classical Syriac, Arabic (Modern Standard, Egyptian, Gulf), Hebrew, Amharic, Aymara, Magahi, Braj, Kurdish (Central, Northern, Southern), Polish, Karelian, Livvi, Ludic, Veps, Võro, Evenki, Xibe, Tuvan, Sakha, Turkish, Indonesian, Kodi, Seneca, Asháninka, Yanesha, Chukchi, Itelmen, Eibela. We evaluate six systems on the new data and conduct an extensive error analysis of the systems' predictions. Transformer-based models generally demonstrate superior performance on the majority of languages, achieving >90% accuracy on 65% of them. The languages on which systems yielded low accuracy are mainly under-resourced, with a limited amount of data. Most errors made by the systems are due to allomorphy, honorificity, and form variation. In addition, we observe that systems especially struggle to inflect multiword lemmas. The systems also produce misspelled forms or end up in repetitive loops (e.g., RNN-based models). Finally, we report a large drop in systems' performance on previously unseen lemmas.

pdf
Team “NoConflict” at CASE 2021 Task 1: Pretraining for Sentence-Level Protest Event Detection
Tiancheng Hu | Niklas Stoehr
Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021)

An ever-increasing amount of text, in the form of social media posts and news articles, gives rise to new challenges and opportunities for the automatic extraction of socio-political events. In this paper, we present our submission to the Shared Tasks on Socio-Political and Crisis Events Detection, Task 1, Multilingual Protest News Detection, Subtask 2, Event Sentence Classification, of CASE @ ACL-IJCNLP 2021. In our submission, we utilize the RoBERTa model with additional pretraining, and achieve the best F1 score of 0.8532 in event sentence classification in English and the second-best F1 score of 0.8700 in Portuguese via simple translation. We analyze the failure cases of our model. We also conduct an ablation study to show the effect of choosing the right pretrained language model, adding additional training data and data augmentation.

pdf
Team “DaDeFrNi” at CASE 2021 Task 1: Document and Sentence Classification for Protest Event Detection
Francesco Re | Daniel Vegh | Dennis Atzenhofer | Niklas Stoehr
Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021)

This paper accompanies our top-performing submission to the CASE 2021 shared task, which is hosted at the workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text. Subtasks 1 and 2 of Task 1 concern the classification of newspaper articles and sentences into “conflict” versus “not conflict”-related in four different languages. Our model performs competitively in both subtasks (up to 0.8662 macro F1), obtaining the highest score of all contributions for subtask 1 on Hindi articles (0.7877 macro F1). We describe all experiments conducted with the XLM-RoBERTa (XLM-R) model and report results obtained in each binary classification task. We propose supplementing the original training data with additional data on political conflict events. In addition, we provide an analysis of unigram probability estimates and geospatial references contained within the original training corpus.

pdf
Discovering Black Lives Matter Events in the United States: Shared Task 3, CASE 2021
Salvatore Giorgi | Vanni Zavarella | Hristo Tanev | Nicolas Stefanovitch | Sy Hwang | Hansi Hettiarachchi | Tharindu Ranasinghe | Vivek Kalyan | Paul Tan | Shaun Tan | Martin Andrews | Tiancheng Hu | Niklas Stoehr | Francesco Ignazio Re | Daniel Vegh | Dennis Atzenhofer | Brenda Curtis | Ali Hürriyetoğlu
Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021)

Evaluating the state-of-the-art event detection systems on determining spatio-temporal distribution of the events on the ground is performed unfrequently. But, the ability to both (1) extract events “in the wild” from text and (2) properly evaluate event detection systems has potential to support a wide variety of tasks such as monitoring the activity of socio-political movements, examining media coverage and public support of these movements, and informing policy decisions. Therefore, we study performance of the best event detection systems on detecting Black Lives Matter (BLM) events from tweets and news articles. The murder of George Floyd, an unarmed Black man, at the hands of police officers received global attention throughout the second half of 2020. Protests against police violence emerged worldwide and the BLM movement, which was once mostly regulated to the United States, was now seeing activity globally. This shared task asks participants to identify BLM related events from large unstructured data sources, using systems pretrained to extract socio-political events from text. We evaluate several metrics, accessing each system’s ability to identify protest events both temporally and spatially. Results show that identifying daily protest counts is an easier task than classifying spatial and temporal protest trends simultaneously, with maximum performance of 0.745 and 0.210 (Pearson r), respectively. Additionally, all baselines and participant systems suffered from low recall, with a maximum recall of 5.08.

pdf
What About the Precedent: An Information-Theoretic Analysis of Common Law
Josef Valvoda | Tiago Pimentel | Niklas Stoehr | Ryan Cotterell | Simone Teufel
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

In common law, the outcome of a new case is determined mostly by precedent cases, rather than by existing statutes. However, how exactly does the precedent influence the outcome of a new case? Answering this question is crucial for guaranteeing fair and consistent judicial decision-making. We are the first to approach this question computationally by comparing two longstanding jurisprudential views; Halsbury’s, who believes that the arguments of the precedent are the main determinant of the outcome, and Goodhart’s, who believes that what matters most is the precedent’s facts. We base our study on the corpus of legal cases from the European Court of Human Rights (ECtHR), which allows us to access not only the case itself, but also cases cited in the judges’ arguments (i.e. the precedent cases). Taking an information-theoretic view, and modelling the question as a case out-come classification task, we find that the precedent’s arguments share 0.38 nats of information with the case’s outcome, whereas precedent’s facts only share 0.18 nats of information (i.e.,58% less); suggesting Halsbury’s view may be more accurate in this specific court. We found however in a qualitative analysis that there are specific statues where Goodhart’s view dominates, and present some evidence these are the ones where the legal concept at hand is less straightforward.
Search
Co-authors