Abstract
Even though fine-tuned neural language models have been pivotal in enabling “deep” automatic text analysis, optimizing text representations for specific applications remains a crucial bottleneck. In this study, we examine this problem in the context of a task from computational social science, namely modeling pairwise similarities between political parties. Our research question is what level of structural information is necessary to create robust text representations, contrasting a strongly informed approach (which uses both claim span and claim category annotations) with approaches that replace one or both types of annotation with document structure-based heuristics. Evaluating our models on the manifestos of German parties for the 2021 federal election, we find that heuristics that maximize within-party over between-party similarity, combined with a normalization step, lead to reliable party similarity prediction without the need for manual annotation.
- Anthology ID:
- 2022.conll-1.22
- Volume:
- Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL)
- Month:
- December
- Year:
- 2022
- Address:
- Abu Dhabi, United Arab Emirates (Hybrid)
- Editors:
- Antske Fokkens, Vivek Srikumar
- Venue:
- CoNLL
- SIG:
- SIGNLL
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 325–338
- Language:
- URL:
- https://aclanthology.org/2022.conll-1.22
- DOI:
- 10.18653/v1/2022.conll-1.22
- Cite (ACL):
- Tanise Ceron, Nico Blokker, and Sebastian Padó. 2022. Optimizing text representations to capture (dis)similarity between political parties. In Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL), pages 325–338, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
- Cite (Informal):
- Optimizing text representations to capture (dis)similarity between political parties (Ceron et al., CoNLL 2022)
- PDF:
- https://preview.aclanthology.org/corrections-2024-04/2022.conll-1.22.pdf
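The abstract's core idea (scoring party pairs by aggregated embedding similarity, followed by a normalization step) can be sketched roughly as below. This is an illustrative reconstruction, not the authors' implementation: the function names, the mean-of-pairwise-cosine aggregation, and the row-centering normalization are all assumptions made for the example.

```python
import numpy as np

def cosine_sim_matrix(X, Y):
    """Pairwise cosine similarities between rows of X and rows of Y."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return Xn @ Yn.T

def party_similarity(parties):
    """parties: dict mapping party name -> (n_i, d) array of text embeddings
    (e.g. sentence or claim representations from a manifesto).
    Returns party names and a normalized party-by-party similarity matrix."""
    names = list(parties)
    k = len(names)
    S = np.zeros((k, k))
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            # aggregate text-level similarities into one party-pair score
            S[i, j] = cosine_sim_matrix(parties[a], parties[b]).mean()
    # hypothetical normalization step: center each row so scores express
    # similarity relative to a party's average similarity to all parties
    S_norm = S - S.mean(axis=1, keepdims=True)
    return names, S_norm

# toy usage with random "embeddings" standing in for manifesto texts
rng = np.random.default_rng(0)
parties = {p: rng.normal(size=(5, 16)) for p in ["A", "B", "C"]}
names, S = party_similarity(parties)
```

After centering, positive entries mark party pairs that are more similar than that party's average, which is one simple way to make raw cosine scores comparable across parties.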