Overview of the SV-Ident 2022 Shared Task on Survey Variable Identification in Social Science Publications
Tornike Tsereteli | Yavuz Selim Kartal | Simone Paolo Ponzetto | Andrea Zielinski | Kai Eckert | Philipp Mayr
Proceedings of the Third Workshop on Scholarly Document Processing

In this paper, we provide an overview of the SV-Ident shared task as part of the 3rd Workshop on Scholarly Document Processing (SDP) at COLING 2022. In the shared task, participants were provided with a sentence and a vocabulary of variables, and asked to identify which variables, if any, are mentioned in individual sentences from full-text scholarly documents. Two teams made a total of 9 submissions to the shared task leaderboard. While none of the teams improved on the baseline systems, we still draw insights from their submissions. Furthermore, we provide a detailed evaluation. Data and baselines for our shared task are freely available at


Towards a Gold Standard Corpus for Variable Detection and Linking in Social Science Publications
Andrea Zielinski | Peter Mutschke
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)


Mining Social Science Publications for Survey Variables
Andrea Zielinski | Peter Mutschke
Proceedings of the Second Workshop on NLP and Computational Social Science

Research in Social Science is usually based on survey data where individual research questions relate to observable concepts (variables). However, due to a lack of standards for data citations, reliable identification of the variables used is often difficult. In this paper, we present a work-in-progress study that seeks to solve the variable detection task with supervised machine learning algorithms, using a linguistic analysis pipeline to extract a rich feature set that includes terminological concepts and similarity metric scores. Further, we present preliminary results on a small dataset that has been specifically designed for this task, yielding a significant increase in performance over the random baseline.


Using Text Segmentation Algorithms for the Automatic Generation of E-Learning Courses
Can Özmen | Alexander Streicher | Andrea Zielinski
Proceedings of the Third Joint Conference on Lexical and Computational Semantics (*SEM 2014)


Exploiting Social Media for Natural Language Processing: Bridging the Gap between Language-centric and Real-world Applications
Simone Paolo Ponzetto | Andrea Zielinski
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Tutorials)