Tobias Falke


2021

pdf bib
Multilingual Paraphrase Generation For Bootstrapping New Features in Task-Oriented Dialog Systems
Subhadarshi Panda | Caglar Tirkaz | Tobias Falke | Patrick Lehnen
Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI

The lack of labeled training data for new features is a common problem in rapidly changing real-world dialog systems. As a solution, we propose a multilingual paraphrase generation model that can be used to generate novel utterances for a target feature and target language. The generated utterances can be used to augment existing training data to improve intent classification and slot labeling models. We evaluate the quality of generated utterances using intrinsic evaluation metrics and by conducting downstream evaluation experiments with English as the source language and nine different target languages. Our method shows promise across languages, even in a zero-shot setting where no seed data is available.

pdf bib
Feedback Attribution for Counterfactual Bandit Learning in Multi-Domain Spoken Language Understanding
Tobias Falke | Patrick Lehnen
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

With counterfactual bandit learning, models can be trained based on positive and negative feedback received for historical predictions, with no labeled data needed. Such feedback is often available in real-world dialog systems, however, the modularized architecture commonly used in large-scale systems prevents the direct application of such algorithms. In this paper, we study the feedback attribution problem that arises when using counterfactual bandit learning for multi-domain spoken language understanding. We introduce an experimental setup to simulate the problem on small-scale public datasets, propose attribution methods inspired by multi-agent reinforcement learning and evaluate them against multiple baselines. We find that while directly using overall feedback leads to disastrous performance, our proposed attribution methods can allow training competitive models from user feedback.

2020

pdf bib
Data-Efficient Paraphrase Generation to Bootstrap Intent Classification and Slot Labeling for New Features in Task-Oriented Dialog Systems
Shailza Jolly | Tobias Falke | Caglar Tirkaz | Daniil Sorokin
Proceedings of the 28th International Conference on Computational Linguistics: Industry Track

Recent progress through advanced neural models pushed the performance of task-oriented dialog systems to almost perfect accuracy on existing benchmark datasets for intent classification and slot labeling. However, in evolving real-world dialog systems, where new functionality is regularly added, a major additional challenge is the lack of annotated training data for such new functionality, as the necessary data collection efforts are laborious and time-consuming. A potential solution to reduce the effort is to augment initial seed data by paraphrasing existing utterances automatically. In this paper, we propose a new, data-efficient approach following this idea. Using an interpretation-to-text model for paraphrase generation, we are able to rely on existing dialog system training data, and, in combination with shuffling-based sampling techniques, we can obtain diverse and novel paraphrases from small amounts of seed data. In experiments on a public dataset and with a real-world dialog system, we observe improvements for both intent classification and slot labeling, demonstrating the usefulness of our approach.

pdf bib
Leveraging User Paraphrasing Behavior In Dialog Systems To Automatically Collect Annotations For Long-Tail Utterances
Tobias Falke | Markus Boese | Daniil Sorokin | Caglar Tirkaz | Patrick Lehnen
Proceedings of the 28th International Conference on Computational Linguistics: Industry Track

In large-scale commercial dialog systems, users express the same request in a wide variety of alternative ways with a long tail of less frequent alternatives. Handling the full range of this distribution is challenging, in particular when relying on manual annotations. However, the same users also provide useful implicit feedback as they often paraphrase an utterance if the dialog system failed to understand it. We propose MARUPA, a method to leverage this type of feedback by creating annotated training examples from it. MARUPA creates new data in a fully automatic way, without manual intervention or effort from annotators, and specifically for currently failing utterances. By re-training the dialog system on this new data, accuracy and coverage for long-tail utterances can be improved. In experiments, we study the effectiveness of this approach in a commercial dialog system across various domains and three languages.

2019

pdf bib
Ranking Generated Summaries by Correctness: An Interesting but Challenging Application for Natural Language Inference
Tobias Falke | Leonardo F. R. Ribeiro | Prasetya Ajie Utama | Ido Dagan | Iryna Gurevych
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

While recent progress on abstractive summarization has led to remarkably fluent summaries, factual errors in generated summaries still severely limit their use in practice. In this paper, we evaluate summaries produced by state-of-the-art models via crowdsourcing and show that such errors occur frequently, in particular with more abstractive models. We study whether textual entailment predictions can be used to detect such errors and if they can be reduced by reranking alternative predicted summaries. That leads to an interesting downstream application for entailment models. In our experiments, we find that out-of-the-box entailment models trained on NLI datasets do not yet offer the desired performance for the downstream task and we therefore release our annotations as additional test data for future extrinsic evaluations of NLI.

pdf bib
Fast Concept Mention Grouping for Concept Map-based Multi-Document Summarization
Tobias Falke | Iryna Gurevych
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Concept map-based multi-document summarization has recently been proposed as a variant of the traditional summarization task with graph-structured summaries. As shown by previous work, the grouping of coreferent concept mentions across documents is a crucial subtask of it. However, while the current state-of-the-art method suggested a new grouping method that was shown to improve the summary quality, its use of pairwise comparisons leads to polynomial runtime complexity that prohibits the application to large document collections. In this paper, we propose two alternative grouping techniques based on locality sensitive hashing, approximate nearest neighbor search and a fast clustering algorithm. They exhibit linear and log-linear runtime complexity, making them much more scalable. We report experimental results that confirm the improved runtime behavior while also showing that the quality of the summary concept maps remains comparable.

2017

pdf bib
Bringing Structure into Summaries: Crowdsourcing a Benchmark Corpus of Concept Maps
Tobias Falke | Iryna Gurevych
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Concept maps can be used to concisely represent important information and bring structure into large document collections. Therefore, we study a variant of multi-document summarization that produces summaries in the form of concept maps. However, suitable evaluation datasets for this task are currently missing. To close this gap, we present a newly created corpus of concept maps that summarize heterogeneous collections of web documents on educational topics. It was created using a novel crowdsourcing approach that allows us to efficiently determine important elements in large document collections. We release the corpus along with a baseline system and proposed evaluation protocol to enable further research on this variant of summarization.

pdf bib
GraphDocExplore: A Framework for the Experimental Comparison of Graph-based Document Exploration Techniques
Tobias Falke | Iryna Gurevych
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Graphs have long been proposed as a tool to browse and navigate in a collection of documents in order to support exploratory search. Many techniques to automatically extract different types of graphs, showing for example entities or concepts and different relationships between them, have been suggested. While experimental evidence that they are indeed helpful exists for some of them, it is largely unknown which type of graph is most helpful for a specific exploratory task. However, carrying out experimental comparisons with human subjects is challenging and time-consuming. Towards this end, we present the GraphDocExplore framework. It provides an intuitive web interface for graph-based document exploration that is optimized for experimental user studies. Through a generic graph interface, different methods to extract graphs from text can be plugged into the system. Hence, they can be compared at minimal implementation effort in an environment that ensures controlled comparisons. The system is publicly available under an open-source license.

pdf bib
Concept-Map-Based Multi-Document Summarization using Concept Coreference Resolution and Global Importance Optimization
Tobias Falke | Christian M. Meyer | Iryna Gurevych
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Concept-map-based multi-document summarization is a variant of traditional summarization that produces structured summaries in the form of concept maps. In this work, we propose a new model for the task that addresses several issues in previous methods. It learns to identify and merge coreferent concepts to reduce redundancy, determines their importance with a strong supervised model and finds an optimal summary concept map via integer linear programming. It is also computationally more efficient than previous methods, allowing us to summarize larger document sets. We evaluate the model on two datasets, finding that it outperforms several approaches from previous work.

pdf bib
Utilizing Automatic Predicate-Argument Analysis for Concept Map Mining
Tobias Falke | Iryna Gurevych
IWCS 2017 — 12th International Conference on Computational Semantics — Short papers

2016

pdf bib
Porting an Open Information Extraction System from English to German
Tobias Falke | Gabriel Stanovsky | Iryna Gurevych | Ido Dagan
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing