Benjamin Milde

2022

Improved Open Source Automatic Subtitling for Lecture Videos
Robert Geislinger | Benjamin Milde | Chris Biemann
Proceedings of the 18th Conference on Natural Language Processing (KONVENS 2022)

2021

With the increasing number of user comments in diverse domains, including comments on online journalism and e-commerce websites, the manual content analysis of these comments becomes time-consuming and challenging. However, research has shown that user comments contain useful information for different domain experts, which is thus worth finding and utilizing. This paper introduces Forum 4.0, an open-source framework to semi-automatically analyze, aggregate, and visualize user comments based on labels defined by domain experts. We demonstrate the applicability of Forum 4.0 with comment analytics scenarios within the domains of online journalism and app stores. We outline the underlying container architecture, including the web-based user interface, the machine learning component, and the task manager for time-consuming tasks. Finally, we conduct machine learning experiments with simulated annotations and different sampling strategies on existing datasets from both domains to evaluate Forum 4.0’s performance. Forum 4.0 achieves promising classification results (ROC-AUC ≥ 0.9 with 100 annotated samples), utilizing transformer-based embeddings with a lightweight logistic regression model. We explain how Forum 4.0’s architecture is applicable to millions of user comments in real time, yet at feasible training and classification costs.

forumBERT: Topic Adaptation and Classification of Contextualized Forum Comments in German
Ayush Yadav | Benjamin Milde
Proceedings of the 17th Conference on Natural Language Processing (KONVENS 2021)

2016

Ambient Search: A Document Retrieval System for Speech Streams
Benjamin Milde | Jonas Wacker | Stefan Radomski | Max Mühlhäuser | Chris Biemann
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

We present Ambient Search, an open source system for displaying and retrieving relevant documents in real time for speech input. The system works ambiently, that is, it unobtrusively listens to speech streams in the background, identifies keywords and keyphrases for query construction and continuously serves relevant documents from its index. Query terms are ranked with Word2Vec and TF-IDF and are continuously updated to allow for ongoing querying of a document collection. The retrieved documents, in our case Wikipedia articles, are visualized in real time in a browser interface. Our evaluation shows that Ambient Search compares favorably to another implicit information retrieval system on speech streams. Furthermore, we extrinsically evaluate multiword keyphrase generation, showing a positive impact for manual transcriptions.

Demonstrating Ambient Search: Implicit Document Retrieval for Speech Streams
Benjamin Milde | Jonas Wacker | Stefan Radomski | Max Mühlhäuser | Chris Biemann
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

In this demonstration paper we describe Ambient Search, a system that displays and retrieves documents in real time based on speech input. The system operates continuously in ambient mode, i.e. it generates speech transcriptions and identifies main keywords and keyphrases, while also querying its index to display relevant documents without an explicit query. Without user intervention, the results are dynamically updated; users can choose to interact with the system at any time, employing a conversation protocol that is enriched with the ambient information gathered continuously. Our evaluation shows that Ambient Search outperforms another implicit speech-based information retrieval system. Ambient Search is available as open source software.

Co-authors

Marlo Haering 1

Jakob Smedegaard Andersen 1
