Interesting cross-border news discovery using cross-lingual article linking and document similarity
Boshko Koloski, Elaine Zosa, Timen Stepišnik-Perdih, Blaž Škrlj, Tarmo Paju, Senja Pollak
Abstract
Team Name: team-8
Embeddia Tool: Cross-Lingual Document Retrieval (Zosa et al.)
Dataset: Estonian and Latvian news datasets

Contemporary news media face increasing amounts of available data that can be useful when prioritizing, selecting, and discovering new stories. In this work, we propose a methodology for retrieving interesting articles in a cross-border news discovery setting. More specifically, we explore how a set of seed documents in Estonian can be projected into the Latvian document space and serve as a basis for discovering novel Latvian news stories that would interest Estonian readers. The proposed methodology was evaluated by an Estonian journalist, who confirmed that in the best setting, half of the top 10 retrieved Latvian documents represent news that the Estonian media house could potentially pick up and present to Estonian readers.
- Anthology ID: 2021.hackashop-1.16
- Volume: Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation
- Month: April
- Year: 2021
- Address: Online
- Venue: Hackashop
- Publisher: Association for Computational Linguistics
- Pages: 116–120
- URL: https://aclanthology.org/2021.hackashop-1.16
- Cite (ACL): Boshko Koloski, Elaine Zosa, Timen Stepišnik-Perdih, Blaž Škrlj, Tarmo Paju, and Senja Pollak. 2021. Interesting cross-border news discovery using cross-lingual article linking and document similarity. In Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation, pages 116–120, Online. Association for Computational Linguistics.
- Cite (Informal): Interesting cross-border news discovery using cross-lingual article linking and document similarity (Koloski et al., Hackashop 2021)
- PDF: https://preview.aclanthology.org/ingestion-script-update/2021.hackashop-1.16.pdf
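The abstract describes projecting Estonian seed articles into the Latvian document space and ranking Latvian articles by document similarity. The sketch below is a hedged illustration of that retrieval step, not the authors' implementation. It assumes every article has already been encoded into a shared cross-lingual vector space by some multilingual encoder (not shown). It scores each Latvian candidate by its highest cosine similarity to any Estonian seed and returns the top 10, matching the cut-off used in the journalist evaluation. The max-over-seeds scoring rule, the toy three-dimensional vectors, and the names `rank_candidates`, `seeds`, and `candidates` are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(
    seed_vectors: List[Vector],
    candidates: Dict[str, Vector],
    k: int = 10,
) -> List[Tuple[str, float]]:
    """Rank Latvian candidates against Estonian seeds (hypothetical helper).

    Assumes all vectors live in one shared cross-lingual space. Each candidate's
    score is its best cosine similarity to any single seed (an illustrative
    choice); the k highest-scoring candidates are returned, best first.
    """
    if not seed_vectors:
        raise ValueError("at least one seed vector is required")
    scored = [
        (doc_id, max(cosine(seed, vec) for seed in seed_vectors))
        for doc_id, vec in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy usage: two "Estonian" seeds and three "Latvian" candidates (made-up vectors).
seeds = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
candidates = {
    "lv-1": [0.9, 0.1, 0.0],
    "lv-2": [0.0, 0.0, 1.0],
    "lv-3": [0.1, 0.8, 0.1],
}
top = rank_candidates(seeds, candidates, k=2)
print([doc_id for doc_id, _ in top])  # ['lv-1', 'lv-3']
```

Max-over-seeds favors candidates that closely match at least one seed story. Averaging the seeds into a single centroid is an equally plausible alternative that instead rewards candidates close to the seed set as a whole.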