Antti Kanner
2023
Detection and attribution of quotes in Finnish news media: BERT vs. rule-based approach
Maciej Janicki | Antti Kanner | Eetu Mäkelä
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
We address the problem of recognizing and attributing quotes in Finnish news media. Solving this task would enable large-scale analysis of media with respect to the presence and styles of presentation of different voices and opinions. We describe the annotation of a corpus of around 1500 media articles with quote attribution and coreference information. Further, we compare two methods for automatic quote recognition: a rule-based one operating on dependency trees and a machine learning one built on top of the BERT language model. We conclude that BERT provides more promising results even with little training data, achieving 95% F-score on direct quote recognition and 84% on indirect quotes. Finally, we discuss open problems and further associated tasks, especially the necessity of resolving speaker mentions to entity references.
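For readers unfamiliar with the metric, the F-score cited above is the harmonic mean of precision and recall. The sketch below is illustrative only: the function name `f_score`, the choice to score sets of token positions, and the example data are assumptions made here, not the evaluation protocol of the paper, which may score quotes at a different granularity.

```python
def f_score(gold, predicted):
    """Return (precision, recall, F1) for two collections of items.

    Items could be, for example, token positions marked as part of a quote.
    This is a generic definition, not the paper's exact evaluation setup.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical data: token positions labeled as belonging to a quote.
gold_tokens = {3, 4, 5, 6, 7}
predicted_tokens = {4, 5, 6, 7, 8}
precision, recall, f1 = f_score(gold_tokens, predicted_tokens)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

In this made-up example, four of the five predicted tokens are correct and four of the five gold tokens are found, so precision, recall, and F1 all come out at 0.80.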