Reyyan Yeniterzi


2022

Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2022): Workshop and Shared Task Report
Ali Hürriyetoğlu | Hristo Tanev | Vanni Zavarella | Reyyan Yeniterzi | Osman Mutlu | Erdem Yörük
Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE)

We provide a summary of the fifth edition of the CASE workshop, held in the scope of EMNLP 2022. The workshop consists of regular papers, two keynotes, working papers of shared task participants, and task overview papers. This workshop has been bringing together all aspects of event information collection across technical and social science fields. In addition to the progress in depth, the submission and acceptance of multimodal approaches show the widening of this interdisciplinary research topic.

A Turkish Hate Speech Dataset and Detection System
Fatih Beyhan | Buse Çarık | İnanç Arın | Ayşecan Terzioğlu | Berrin Yanikoglu | Reyyan Yeniterzi
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Social media posts containing hate speech are reproduced and redistributed at an accelerated pace, reaching greater audiences at a higher speed. We present a machine learning system for automatic detection of hate speech in Turkish, along with a hate speech dataset consisting of tweets collected in two separate domains. We first adopted a definition for hate speech that is in line with our goals and amenable to easy annotation; then designed the annotation schema for annotating the collected tweets. The Istanbul Convention dataset consists of tweets posted following the withdrawal of Turkey from the Istanbul Convention. The Refugees dataset was created by collecting tweets about immigrants by filtering based on commonly used keywords related to immigrants. Finally, we have developed a hate speech detection system using the transformer architecture (BERTurk), to be used as a baseline for the collected dataset. The binary classification accuracy is 77% when the system is evaluated using 5-fold cross-validation on the Istanbul Convention dataset and 71% for the Refugees dataset. We also tested a regression model with 0.66 and 0.83 RMSE on a scale of [0-4], for the Istanbul Convention and Refugees datasets, respectively.

A Twitter Corpus for Named Entity Recognition in Turkish
Buse Çarık | Reyyan Yeniterzi
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper introduces a new Turkish Twitter Named Entity Recognition dataset. The dataset, which consists of 5000 tweets from a year-long period, was labeled by multiple annotators with a high agreement score. The dataset is also diverse in terms of the named entity types as it contains not only person, organization, and location but also time, money, product, and tv-show categories. Our initial experiments with pretrained language models (like BERTurk) over this dataset returned F1 scores of around 80%. We share this dataset publicly.

SU-NLP at SemEval-2022 Task 11: Complex Named Entity Recognition with Entity Linking
Buse Çarık | Fatih Beyhan | Reyyan Yeniterzi
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This paper describes the system proposed by Sabancı University Natural Language Processing Group in the SemEval-2022 MultiCoNER task. We developed an unsupervised entity linking pipeline that detects potential entity mentions with the help of Wikipedia and also uses the corresponding Wikipedia context to help the classifier in finding the named entity type of that mention. The proposed pipeline significantly improved the performance, especially for complex entities in low-context settings.

WordNet and Wikipedia Connection in Turkish WordNet KeNet
Merve Doğan | Ceren Oksal | Arife Betül Yenice | Fatih Beyhan | Reyyan Yeniterzi | Olcay Taner Yıldız
Proceedings of Globalex Workshop on Linked Lexicography within the 13th Language Resources and Evaluation Conference

This paper aims to present a WordNet and Wikipedia connection by linking synsets from Turkish WordNet KeNet with Wikipedia, and thus provide a better machine-readable dictionary for creating NLP models with rich data. For this purpose, a manual mapping between the two resources is realized and 11,478 synsets are linked to Wikipedia. In addition, automatic linking approaches are utilized to analyze possible connection suggestions. The Baseline Approach and the ElasticSearch Based Approach help identify potential human annotation errors and analyze the effectiveness of these approaches in linking. Adopting both manual and automatic mapping provides us with an encompassing resource of WordNet and Wikipedia connections.

2021

Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021): Workshop and Shared Task Report
Ali Hürriyetoğlu | Hristo Tanev | Vanni Zavarella | Jakub Piskorski | Reyyan Yeniterzi | Osman Mutlu | Deniz Yuret | Aline Villavicencio
Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021)

This workshop is the fourth issue of a series of workshops on automatic extraction of socio-political events from news, organized by the Emerging Market Welfare Project, with the support of the Joint Research Centre of the European Commission and with contributions from many other prominent scholars in this field. The purpose of this series of workshops is to foster research and development of reliable, valid, robust, and practical solutions for automatically detecting descriptions of socio-political events, such as protests, riots, wars and armed conflicts, in text streams. This year, workshop contributors make use of state-of-the-art NLP technologies, such as Deep Learning, Word Embeddings and Transformers, and cover a wide range of topics from text classification to news bias detection. Around 40 teams have registered and 15 teams contributed to three tasks that are i) multilingual protest news detection, ii) fine-grained classification of socio-political events, and iii) discovering Black Lives Matter protest events. The workshop also highlights two keynote and four invited talks about various aspects of creating event data sets and multi- and cross-lingual machine learning in few- and zero-shot settings.

SU-NLP at CASE 2021 Task 1: Protest News Detection for English
Furkan Çelik | Tuğberk Dalkılıç | Fatih Beyhan | Reyyan Yeniterzi
Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021)

This paper summarizes our group’s efforts in the multilingual protest news detection shared task, which is organized as a part of the Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE) Workshop. We participated in all four subtasks in English. Especially in the task of identifying event-containing sentences, our proposed ensemble approach using RoBERTa and a multichannel CNN-LexStem model yields higher performance. Similarly, in the event extraction task, our transformer-LSTM-CRF architecture outperforms regular transformers significantly.

2020

Event Clustering within News Articles
Faik Kerem Örs | Süveyda Yeniterzi | Reyyan Yeniterzi
Proceedings of the Workshop on Automated Extraction of Socio-political Events from News 2020

This paper summarizes our group’s efforts in the event sentence coreference identification shared task, which is organized as part of the Automated Extraction of Socio-Political Events from News (AESPEN) Workshop. Our main approach consists of three steps. We initially use a transformer-based model to predict whether a pair of sentences refer to the same event or not. Later, we use these predictions as the initial scores and recalculate the pair scores by considering the relation of sentences in a pair with respect to other sentences. As the last step, final scores between these sentences are used to construct the clusters, starting with the pairs with the highest scores. Our proposed approach outperforms the baseline approach across all evaluation metrics.
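The last step of the abstract above (building clusters from pairwise scores, highest-scoring pairs first) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `greedy_cluster`, the `threshold` cutoff, and the union-find merging rule are all assumptions made for illustration, and the score recalculation step is not covered.

```python
from typing import Dict, List, Tuple


def greedy_cluster(
    n: int,
    scores: Dict[Tuple[int, int], float],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
) -> List[List[int]]:
    """Group n sentences by merging pairs in descending order of score."""
    parent = list(range(n))  # union-find forest: each sentence starts alone

    def find(x: int) -> int:
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Visit pairs from the highest score down, stopping below the cutoff.
    for (i, j), score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if score < threshold:
            break
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Keep the smaller index as root so the output is deterministic.
            parent[max(root_i, root_j)] = min(root_i, root_j)

    clusters: Dict[int, List[int]] = {}
    for k in range(n):
        clusters.setdefault(find(k), []).append(k)
    return sorted(clusters.values())


pair_scores = {(0, 1): 0.9, (1, 2): 0.2, (2, 3): 0.8, (0, 3): 0.1}
print(greedy_cluster(4, pair_scores))  # [[0, 1], [2, 3]]
```

Merging through union-find makes cluster membership transitive; a stricter rule (for example, requiring every cross-pair to exceed the cutoff) would be an equally plausible reading of the abstract.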

SU-NLP at SemEval-2020 Task 12: Offensive Language IdentifiCation in Turkish Tweets
Anil Ozdemir | Reyyan Yeniterzi
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper summarizes our group’s efforts in the offensive language identification shared task, which is organized as part of the International Workshop on Semantic Evaluation (SemEval-2020). Our final submission system is an ensemble of three different models: (1) CNN-LSTM, (2) BiLSTM-Attention and (3) BERT. Word embeddings, which were pre-trained on tweets, are used while training the first two models. BERTurk, which is the first BERT model for Turkish, is also explored. Our final submitted approach ranked as the second best model in the Turkish sub-task.

SU-NLP at WNUT-2020 Task 2: The Ensemble Models
Kenan Fayoumi | Reyyan Yeniterzi
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

In this paper, we address the problem of identifying informative tweets related to COVID-19 in the form of a binary classification task as part of our submission for W-NUT 2020 Task 2. Specifically, we focus on ensembling methods to boost the performance of classification models such as BERT and CNN. We show that ensembling can reduce the variance in performance, specifically for BERT base models.
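As a rough illustration of ensembling for binary classification, the sketch below averages the positive-class probabilities of several models and thresholds the mean (soft voting). The abstract does not say which ensembling scheme was used, so soft voting, the function name `soft_vote`, and the 0.5 threshold are assumptions.

```python
from typing import List, Sequence


def soft_vote(
    prob_lists: Sequence[Sequence[float]],
    threshold: float = 0.5,  # hypothetical decision threshold
) -> List[int]:
    """Average per-model positive-class probabilities and threshold the mean."""
    if not prob_lists:
        raise ValueError("at least one model's probabilities are required")
    n_examples = len(prob_lists[0])
    if any(len(probs) != n_examples for probs in prob_lists):
        raise ValueError("all models must score the same number of examples")

    predictions = []
    for i in range(n_examples):
        # Mean probability across models for example i.
        mean_prob = sum(probs[i] for probs in prob_lists) / len(prob_lists)
        predictions.append(1 if mean_prob >= threshold else 0)
    return predictions


# Three hypothetical models scoring three tweets as "informative".
model_probs = [
    [0.9, 0.4, 0.2],
    [0.6, 0.7, 0.1],
    [0.8, 0.3, 0.4],
]
print(soft_vote(model_probs))  # [1, 0, 0]
```

Averaging smooths out the disagreement of any single model (the second model alone would label the second tweet positive), which is one intuition for why ensembles tend to vary less from run to run.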

2011

Exploiting Morphology in Turkish Named Entity Recognition System
Reyyan Yeniterzi
Proceedings of the ACL 2011 Student Session

2010

Syntax-to-Morphology Mapping in Factored Phrase-Based Statistical Machine Translation from English to Turkish
Reyyan Yeniterzi | Kemal Oflazer
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

2009

Transforming Controlled Natural Language Biomedical Queries into Answer Set Programs
Esra Erdem | Reyyan Yeniterzi
Proceedings of the BioNLP 2009 Workshop