Renārs Liepins

Also published as: Renars Liepins

2018

pdf abs
Integrating Multiple NLP Technologies into an Open-source Platform for Multilingual Media Monitoring
Ulrich Germann | Renārs Liepins | Didzis Gosko | Guntis Barzdins
Proceedings of Workshop for NLP Open Source Software (NLP-OSS)

The open-source SUMMA Platform is a highly scalable distributed architecture for monitoring a large number of media broadcasts in parallel, with a lag behind actual broadcast time of at most a few minutes. It assembles numerous state-of-the-art NLP technologies into a fully automated media ingestion pipeline that can record live broadcasts, detect and transcribe spoken content, translate from several languages (original text or transcribed speech) into English, recognize Named Entities, detect topics, cluster and summarize documents across language barriers, and extract and store factual claims in these news items. This paper describes the intended use cases and discusses the system design decisions that allowed us to integrate state-of-the-art NLP modules into an effective workflow with comparatively little effort.

pdf abs
news.bridge - Automated Transcription and Translation for News
Peggy van der Kreeft | Renars Liepins
Proceedings of the 21st Annual Conference of the European Association for Machine Translation

news.bridge provides a platform for multilingual video processing, including automated transcription and translation, subtitling, voice-over, and summarization, with post-editing facility of videos in a broad range of languages. The platform is currently in beta testing at Deutsche Welle for republishing of videos in other languages.

The open-source SUMMA Platform is a highly scalable distributed architecture for monitoring a large number of media broadcasts in parallel, with a lag behind actual broadcast time of at most a few minutes. The Platform offers a fully automated media ingestion pipeline capable of recording live broadcasts, detection and transcription of spoken content, translation of all text (original or transcribed) into English, recognition and linking of Named Entities, topic detection, clustering and cross-lingual multi-document summarization of related media items, and last but not least, extraction and storage of factual claims in these news items. Browser-based graphical user interfaces provide humans with aggregated information as well as structured access to individual news items stored in the Platform’s database. This paper describes the intended use cases and provides an overview over the system’s implementation.

2017

We present the first prototype of the SUMMA Platform: an integrated platform for multilingual media monitoring. The platform contains a rich suite of low-level and high-level natural language processing technologies: automatic speech recognition of broadcast media, machine translation, automated tagging and classification of named entities, semantic parsing to detect relationships between entities, and automatic construction / augmentation of factual knowledge bases. Implemented on the Docker platform, it can easily be deployed, customised, and scaled to large volumes of incoming media streams.

Renārs Liepins

2018

2017

Co-authors

Venues