Varanalysis@SV-Ident 2022: Variable Detection and Disambiguation Based on Semantic Similarity

Alica Hövelmeyer, Yavuz Selim Kartal


Abstract
This paper describes an approach to the SV-Ident Shared Task, which requires detecting and disambiguating survey variables in sentences taken from social science publications. It treats both subtasks as semantic textual similarity (STS) problems and relies on sentence transformers. Sentences and variables are compared for semantic similarity, both to detect sentences that contain variables and to disambiguate the respective variables. The focus is on analyzing the effect of including different parts of the variables and on observing the differences between English and German instances. Additionally, for the variable detection task, a bag-of-words model preselects the sentences that are likely to contain a variable mention, and the semantic similarity comparison is then performed on this preselection.
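The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the embedding is a toy token-count vector swapped in for a sentence transformer so the example runs without model downloads, and the names `embed`, `rank_variables`, `detect`, the `cue_words` set, the threshold, and the example variables are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens; the character class also covers German umlauts.
    return re.findall(r"[a-zäöüß]+", text.lower())


def embed(text):
    # ASSUMPTION: toy stand-in for a sentence-transformer encoder.
    # A real setup would return a dense vector from a pretrained model.
    return Counter(tokenize(text))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_variables(sentence, variables, threshold=0.2):
    # Disambiguation: score every variable against the sentence and keep
    # those at or above the (hypothetical) threshold, best first.
    sent_vec = embed(sentence)
    scored = [(vid, cosine(sent_vec, embed(text))) for vid, text in variables.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def detect(sentence, variables, cue_words, threshold=0.2):
    # Stage 1: cheap bag-of-words preselection on hypothetical cue words.
    if not cue_words & set(tokenize(sentence)):
        return False
    # Stage 2: the sentence counts as a variable mention if at least one
    # variable is similar enough.
    return bool(rank_variables(sentence, variables, threshold))


# Hypothetical example data, not taken from the SV-Ident dataset.
variables = {
    "v1": "How much do you trust the national parliament?",
    "v2": "What is your monthly household income?",
}
cue_words = {"respondents", "asked", "reported", "survey"}
sentence = "Respondents were asked how much they trust the parliament."

if __name__ == "__main__":
    print(detect(sentence, variables, cue_words))
    print(rank_variables(sentence, variables))
```

Replacing `embed` with a real sentence encoder would leave the rest of the sketch unchanged, since detection and disambiguation both reduce to the same sentence-to-variable similarity comparison.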
Anthology ID:
2022.sdp-1.30
Volume:
Proceedings of the Third Workshop on Scholarly Document Processing
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Venue:
sdp
Publisher:
Association for Computational Linguistics
Pages:
247–252
URL:
https://aclanthology.org/2022.sdp-1.30
Cite (ACL):
Alica Hövelmeyer and Yavuz Selim Kartal. 2022. Varanalysis@SV-Ident 2022: Variable Detection and Disambiguation Based on Semantic Similarity. In Proceedings of the Third Workshop on Scholarly Document Processing, pages 247–252, Gyeongju, Republic of Korea. Association for Computational Linguistics.
Cite (Informal):
Varanalysis@SV-Ident 2022: Variable Detection and Disambiguation Based on Semantic Similarity (Hövelmeyer & Kartal, sdp 2022)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.sdp-1.30.pdf
Data
SV-Ident