Abstract
Given the importance of identifying and monitoring news stories within the continuous flow of news articles, this paper presents PromptStream, a novel method for unsupervised news story discovery. In order to identify coherent and comprehensive stories across the stream, it is crucial to create article representations that incorporate as much topic-related information from the articles as possible. PromptStream constructs these article embeddings using cloze-style prompting. These representations continually adjust to the evolving context of the news stream through self-supervised learning, employing a contrastive loss and a memory of the most confident article-story assignments from the most recent days. Extensive experiments with real news datasets highlight the notable performance of our model, establishing a new state of the art. Additionally, we delve into selected news stories to reveal how the model’s structuring of the article stream aligns with story progression.

- Anthology ID:
- 2024.lrec-main.1157
- Volume:
- Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
- Month:
- May
- Year:
- 2024
- Address:
- Torino, Italia
- Editors:
- Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
- Venues:
- LREC | COLING
- Publisher:
- ELRA and ICCL
- Pages:
- 13222–13232
- URL:
- https://aclanthology.org/2024.lrec-main.1157
- Cite (ACL):
- Arezoo Hatefi, Anton Eklund, and Mona Forsman. 2024. PromptStream: Self-Supervised News Story Discovery Using Topic-Aware Article Representations. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 13222–13232, Torino, Italia. ELRA and ICCL.
- Cite (Informal):
- PromptStream: Self-Supervised News Story Discovery Using Topic-Aware Article Representations (Hatefi et al., LREC-COLING 2024)
- PDF:
- https://preview.aclanthology.org/naacl24-info/2024.lrec-main.1157.pdf