Detection and attribution of quotes in Finnish news media: BERT vs. rule-based approach

Maciej Janicki, Antti Kanner, Eetu Mäkelä


Abstract
We address the problem of recognizing and attributing quotes in Finnish news media. Solving this task would enable large-scale analysis of media with respect to the presence and styles of presentation of different voices and opinions. We describe the annotation of a corpus of media texts, numbering around 1500 articles, with quote attribution and coreference information. Further, we compare two methods for automatic quote recognition: a rule-based one operating on dependency trees and a machine-learning one built on top of the BERT language model. We conclude that BERT provides more promising results even with little training data, achieving 95% F-score on direct quote recognition and 84% for indirect quotes. Finally, we discuss open problems and further associated tasks, especially the necessity of resolving speaker mentions to entity references.
Anthology ID:
2023.nodalida-1.6
Volume:
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
Month:
May
Year:
2023
Address:
Tórshavn, Faroe Islands
Editors:
Tanel Alumäe, Mark Fishel
Venue:
NoDaLiDa
Publisher:
University of Tartu Library
Pages:
52–59
URL:
https://aclanthology.org/2023.nodalida-1.6
Cite (ACL):
Maciej Janicki, Antti Kanner, and Eetu Mäkelä. 2023. Detection and attribution of quotes in Finnish news media: BERT vs. rule-based approach. In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), pages 52–59, Tórshavn, Faroe Islands. University of Tartu Library.
Cite (Informal):
Detection and attribution of quotes in Finnish news media: BERT vs. rule-based approach (Janicki et al., NoDaLiDa 2023)
PDF:
https://preview.aclanthology.org/teach-a-man-to-fish/2023.nodalida-1.6.pdf