Integrating Multiple NLP Technologies into an Open-source Platform for Multilingual Media Monitoring

Ulrich Germann, Renārs Liepins, Didzis Gosko, Guntis Barzdins


Abstract
The open-source SUMMA Platform is a highly scalable distributed architecture for monitoring a large number of media broadcasts in parallel, with a lag behind actual broadcast time of at most a few minutes. It assembles numerous state-of-the-art NLP technologies into a fully automated media ingestion pipeline that can record live broadcasts, detect and transcribe spoken content, translate from several languages (original text or transcribed speech) into English, recognize Named Entities, detect topics, cluster and summarize documents across language barriers, and extract and store factual claims in these news items. This paper describes the intended use cases and discusses the system design decisions that allowed us to integrate state-of-the-art NLP modules into an effective workflow with comparatively little effort.
Anthology ID:
W18-2508
Volume:
Proceedings of Workshop for NLP Open Source Software (NLP-OSS)
Month:
July
Year:
2018
Address:
Melbourne, Australia
Venue:
NLPOSS
Publisher:
Association for Computational Linguistics
Pages:
47–51
URL:
https://aclanthology.org/W18-2508
DOI:
10.18653/v1/W18-2508
Cite (ACL):
Ulrich Germann, Renārs Liepins, Didzis Gosko, and Guntis Barzdins. 2018. Integrating Multiple NLP Technologies into an Open-source Platform for Multilingual Media Monitoring. In Proceedings of Workshop for NLP Open Source Software (NLP-OSS), pages 47–51, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal):
Integrating Multiple NLP Technologies into an Open-source Platform for Multilingual Media Monitoring (Germann et al., NLPOSS 2018)
PDF:
https://preview.aclanthology.org/ingestion-script-update/W18-2508.pdf