Yavuz Selim Kartal


2022

pdf
Overview of the SV-Ident 2022 Shared Task on Survey Variable Identification in Social Science Publications
Tornike Tsereteli | Yavuz Selim Kartal | Simone Paolo Ponzetto | Andrea Zielinski | Kai Eckert | Philipp Mayr
Proceedings of the Third Workshop on Scholarly Document Processing

In this paper, we provide an overview of the SV-Ident shared task as part of the 3rd Workshop on Scholarly Document Processing (SDP) at COLING 2022. In the shared task, participants were provided with a sentence and a vocabulary of variables, and asked to identify which variables, if any, are mentioned in individual sentences from scholarly documents in full text. Two teams made a total of 9 submissions to the shared task leaderboard. While none of the teams improve on the baseline systems, we still draw insights from their submissions. Furthermore, we provide a detailed evaluation. Data and baselines for our shared task are freely available at https://github.com/vadis-project/sv-ident.

pdf
Varanalysis@SV-Ident 2022: Variable Detection and Disambiguation Based on Semantic Similarity
Alica Hövelmeyer | Yavuz Selim Kartal
Proceedings of the Third Workshop on Scholarly Document Processing

This paper describes an approach to the SV-Ident Shared Task which requires the detection and disambiguation of survey variables in sentences taken from social science publications. It deals with both subtasks as problems of semantic textual similarity (STS) and relies on the use of sentence transformers. Sentences and variables are examined for semantic similarity for both detecting sentences containing variables and disambiguating the respective variables. The focus is placed on analyzing the effects of including different parts of the variables and observing the differences between English and German instances. Additionally, for the variable detection task a bag of words model is used to filter out sentences which are likely to contain a variable mention as a preselection of sentences to perform the semantic similarity comparison on.

2020

pdf
TrClaim-19: The First Collection for Turkish Check-Worthy Claim Detection with Annotator Rationales
Yavuz Selim Kartal | Mucahid Kutlu
Proceedings of the 24th Conference on Computational Natural Language Learning

Massive misinformation spread over Internet has many negative impacts on our lives. While spreading a claim is easy, investigating its veracity is hard and time consuming, Therefore, we urgently need systems to help human fact-checkers. However, available data resources to develop effective systems are limited and the vast majority of them is for English. In this work, we introduce TrClaim-19, which is the very first labeled dataset for Turkish check-worthy claims. TrClaim-19 consists of labeled 2287 Turkish tweets with annotator rationales, enabling us to better understand the characteristics of check-worthy claims. The rationales we collected suggest that claims’ topics and their possible negative impacts are the main factors affecting their check-worthiness.