Abstract
We approach the problem of recognizing and attributing quotes in Finnish news media. Solving this task would enable large-scale analysis of media with respect to the presence and presentation styles of different voices and opinions. We describe the annotation of a corpus of around 1,500 media articles with quote attribution and coreference information. Further, we compare two methods for automatic quote recognition: a rule-based one operating on dependency trees and a machine learning one built on top of the BERT language model. We conclude that BERT provides more promising results even with little training data, achieving a 95% F-score on direct quote recognition and 84% on indirect quotes. Finally, we discuss open problems and further associated tasks, especially the necessity of resolving speaker mentions to entity references.
- Anthology ID:
- 2023.nodalida-1.6
- Volume:
- Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
- Month:
- May
- Year:
- 2023
- Address:
- Tórshavn, Faroe Islands
- Editors:
- Tanel Alumäe, Mark Fishel
- Venue:
- NoDaLiDa
- Publisher:
- University of Tartu Library
- Pages:
- 52–59
- URL:
- https://aclanthology.org/2023.nodalida-1.6
- Cite (ACL):
- Maciej Janicki, Antti Kanner, and Eetu Mäkelä. 2023. Detection and attribution of quotes in Finnish news media: BERT vs. rule-based approach. In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), pages 52–59, Tórshavn, Faroe Islands. University of Tartu Library.
- Cite (Informal):
- Detection and attribution of quotes in Finnish news media: BERT vs. rule-based approach (Janicki et al., NoDaLiDa 2023)
- PDF:
- https://preview.aclanthology.org/revert-3132-ingestion-checklist/2023.nodalida-1.6.pdf