Ngoc Thang Vu


Conversational Tree Search: A New Hybrid Dialog Task
Dirk Väth | Lindsey Vanderlyn | Ngoc Thang Vu
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Conversational interfaces provide a flexible and easy way for users to seek information that may otherwise be difficult or inconvenient to obtain. However, existing interfaces generally fall into one of two categories: FAQs, where users must have a concrete question in order to retrieve a general answer, or dialogs, where users must follow a pre-defined path but may receive a personalized answer. In this paper, we introduce Conversational Tree Search (CTS) as a new task that bridges the gap between FAQ-style information retrieval and task-oriented dialog, allowing domain-experts to define dialog trees which can then be converted to an efficient dialog policy that learns only to ask the questions necessary to navigate a user to their goal.We collect a dataset for the travel reimbursement domain and demonstrate a baseline as well as a novel deep Reinforcement Learning architecture for this task. Our results show that the new architecture combines the positive aspects of both the FAQ and dialog system used in the baseline and achieves higher goal completion while skipping unnecessary questions.

Exploring Segmentation Approaches for Neural Machine Translation of Code-Switched Egyptian Arabic-English Text
Marwa Gaser | Manuel Mager | Injy Hamed | Nizar Habash | Slim Abdennadher | Ngoc Thang Vu
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Data sparsity is one of the main challenges posed by code-switching (CS), which is further exacerbated in the case of morphologically rich languages. For the task of machine translation (MT), morphological segmentation has proven successful in alleviating data sparsity in monolingual contexts; however, it has not been investigated for CS settings. In this paper, we study the effectiveness of different segmentation approaches on MT performance, covering morphology-based and frequency-based segmentation techniques. We experiment on MT from code-switched Arabic-English to English. We provide detailed analysis, examining a variety of conditions, such as data size and sentences with different degrees of CS. Empirical results show that morphology-aware segmenters perform the best in segmentation tasks but under-perform in MT. Nevertheless, we find that the choice of the segmentation setup to use for MT is highly dependent on the data size. For extreme low-resource scenarios, a combination of frequency and morphology-based segmentations is shown to perform the best. For more resourced settings, such a combination does not bring significant improvements over the use of frequency-based segmentation.


»textklang« – Towards a Multi-Modal Exploration Platform for German Poetry
Nadja Schauffler | Toni Bernhart | Andre Blessing | Gunilla Eschenbach | Markus Gärtner | Kerstin Jung | Anna Kinder | Julia Koch | Sandra Richter | Gabriel Viehhauser | Ngoc Thang Vu | Lorenz Wesemann | Jonas Kuhn
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We present the steps taken towards an exploration platform for a multi-modal corpus of German lyric poetry from the Romantic era developed in the project »textklang«. This interdisciplinary project develops a mixed-methods approach for the systematic investigation of the relationship between written text (here lyric poetry) and its potential and actual sonic realisation (in recitations, musical performances etc.). The multi-modal »textklang« platform will be designed to technically and analytically combine three modalities: the poetic text, the audio signal of a recorded recitation and, at a later stage, music scores of a musical setting of a poem. The methodological workflow will enable scholars to develop hypotheses about the relationship between textual form and sonic/prosodic realisation based on theoretical considerations, text interpretation and evidence from recorded recitations. The full workflow will support hypothesis testing either through systematic corpus analysis alone or with addtional contrastive perception experiments. For the experimental track, researchers will be enabled to manipulate prosodic parameters in (re-)synthesised variants of the original recordings. The focus of this paper is on the design of the base corpus and on tools for systematic exploration – placing special emphasis on our response to challenges stemming from multi-modality and the methodologically diverse interdisciplinary setup.

Bias Identification and Attribution in NLP Models With Regression and Effect Sizes
Erenay Dayanik | Ngoc Thang Vu | Sebastian Padó
Northern European Journal of Language Technology, Volume 8

In recent years, there has been an increasing awareness that many NLP systems incorporate biases of various types (e.g., regarding gender or race) which can have significant negative consequences. At the same time, the techniques used to statistically analyze such biases are still relatively simple. Typically, studies test for the presence of a significant difference between two levels of a single bias variable (e.g., male vs. female) without attention to potential confounders, and do not quantify the importance of the bias variable. This article proposes to analyze bias in the output of NLP systems using multivariate regression models. They provide a robust and more informative alternative which (a) generalizes to multiple bias variables, (b) can take covariates into account, (c) can be combined with measures of effect size to quantify the size of bias. Jointly, these effects contribute to a more robust statistical analysis of bias that can be used to diagnose system behavior and extract informative examples. We demonstrate the benefits of our method by analyzing a range of current NLP models on one regression and one classification tasks (emotion intensity prediction and coreference resolution, respectively).

ArzEn-ST: A Three-way Speech Translation Corpus for Code-Switched Egyptian Arabic-English
Injy Hamed | Nizar Habash | Slim Abdennadher | Ngoc Thang Vu
Proceedings of the The Seventh Arabic Natural Language Processing Workshop (WANLP)

We present our work on collecting ArzEn-ST, a code-switched Egyptian Arabic-English Speech Translation Corpus. This corpus is an extension of the ArzEn speech corpus, which was collected through informal interviews with bilingual speakers. In this work, we collect translations in both directions, monolingual Egyptian Arabic and monolingual English, forming a three-way speech translation corpus. We make the translation guidelines and corpus publicly available. We also report results for baseline systems for machine translation and speech translation tasks. We believe this is a valuable resource that can motivate and facilitate further research studying the code-switching phenomenon from a linguistic perspective and can be used to train and evaluate NLP systems.

Toward Implicit Reference in Dialog: A Survey of Methods and Data
Lindsey Vanderlyn | Talita Anthonio | Daniel Ortega | Michael Roth | Ngoc Thang Vu
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Communicating efficiently in natural language requires that we often leave information implicit, especially in spontaneous speech. This frequently results in phenomena of incompleteness, such as omitted references, that pose challenges for language processing. In this survey paper, we review the state of the art in research regarding the automatic processing of such implicit references in dialog scenarios, discuss weaknesses with respect to inconsistencies in task definitions and terminologies, and outline directions for future work. Among others, these include a unification of existing tasks, addressing data scarcity, and taking into account model and annotator uncertainties.

Low-Resource Multilingual and Zero-Shot Multispeaker TTS
Florian Lux | Julia Koch | Ngoc Thang Vu
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

While neural methods for text-to-speech (TTS) have shown great advances in modeling multiple speakers, even in zero-shot settings, the amount of data needed for those approaches is generally not feasible for the vast majority of the world’s over 6,000 spoken languages. In this work, we bring together the tasks of zero-shot voice cloning and multilingual low-resource TTS. Using the language agnostic meta learning (LAML) procedure and modifications to a TTS encoder, we show that it is possible for a system to learn speaking a new language using just 5 minutes of training data while retaining the ability to infer the voice of even unseen speakers in the newly learned language. We show the success of our proposed approach in terms of intelligibility, naturalness and similarity to target speaker using objective metrics as well as human studies and provide our code and trained models open source.


IMS’ Systems for the IWSLT 2021 Low-Resource Speech Translation Task
Pavel Denisov | Manuel Mager | Ngoc Thang Vu
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

This paper describes the submission to the IWSLT 2021 Low-Resource Speech Translation Shared Task by IMS team. We utilize state-of-the-art models combined with several data augmentation, multi-task and transfer learning approaches for the automatic speech recognition (ASR) and machine translation (MT) steps of our cascaded system. Moreover, we also explore the feasibility of a full end-to-end speech translation (ST) model in the case of very constrained amount of ground truth labeled data. Our best system achieves the best performance among all submitted systems for Congolese Swahili to English and French with BLEU scores 7.7 and 13.7 respectively, and the second best result for Coastal Swahili to English with BLEU score 14.9.

Meta Learning and Its Applications to Natural Language Processing
Hung-yi Lee | Ngoc Thang Vu | Shang-Wen Li
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Tutorial Abstracts

Deep learning based natural language processing (NLP) has become the mainstream of research in recent years and significantly outperforms conventional methods. However, deep learning models are notorious for being data and computation hungry. These downsides limit the application of such models from deployment to different domains, languages, countries, or styles, since collecting in-genre data and model training from scratch are costly. The long-tail nature of human language makes challenges even more significant. Meta-learning, or ‘Learning to Learn’, aims to learn better learning algorithms, including better parameter initialization, optimization strategy, network architecture, distance metrics, and beyond. Meta-learning has been shown to allow faster fine-tuning, converge to better performance, and achieve amazing results for few-shot learning in many applications. Meta-learning is one of the most important new techniques in machine learning in recent years. There is a related tutorial in ICML 2019 and a related course at Stanford, but most of the example applications given in these materials are about image processing. It is believed that meta-learning has great potential to be applied in NLP, and some works have been proposed with notable achievements in several relevant problems, e.g., relation extraction, machine translation, and dialogue generation and state tracking. However, it does not catch the same level of attention as in the image processing community. In the tutorial, we will first introduce Meta-learning approaches and the theory behind them, and then review the works of applying this technology to NLP problems. This tutorial intends to facilitate researchers in the NLP community to understand this new technology better and promote more research studies using this new technology.

Does External Knowledge Help Explainable Natural Language Inference? Automatic Evaluation vs. Human Ratings
Hendrik Schuff | Hsiu-Yu Yang | Heike Adel | Ngoc Thang Vu
Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

Natural language inference (NLI) requires models to learn and apply commonsense knowledge. These reasoning abilities are particularly important for explainable NLI systems that generate a natural language explanation in addition to their label prediction. The integration of external knowledge has been shown to improve NLI systems, here we investigate whether it can also improve their explanation capabilities. For this, we investigate different sources of external knowledge and evaluate the performance of our models on in-domain data as well as on special transfer datasets that are designed to assess fine-grained reasoning capabilities. We find that different sources of knowledge have a different effect on reasoning abilities, for example, implicit knowledge stored in language models can hinder reasoning on numbers and negations. Finally, we conduct the largest and most fine-grained explainable NLI crowdsourcing study to date. It reveals that even large differences in automatic performance scores do neither reflect in human ratings of label, explanation, commonsense nor grammar correctness.

Few-shot Learning for Slot Tagging with Attentive Relational Network
Cennet Oguz | Ngoc Thang Vu
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Metric-based learning is a well-known family of methods for few-shot learning, especially in computer vision. Recently, they have been used in many natural language processing applications but not for slot tagging. In this paper, we explore metric-based learning methods in the slot tagging task and propose a novel metric-based learning architecture - Attentive Relational Network. Our proposed method extends relation networks, making them more suitable for natural language processing applications in general, by leveraging pretrained contextual embeddings such as ELMO and BERT and by using attention mechanism. The results on SNIPS data show that our proposed method outperforms other state of the art metric-based learning methods.

pdf bib
“It’s our fault!”: Insights Into Users’ Understanding and Interaction With an Explanatory Collaborative Dialog System
Katharina Weitz | Lindsey Vanderlyn | Ngoc Thang Vu | Elisabeth André
Proceedings of the 25th Conference on Computational Natural Language Learning

Human-AI collaboration, a long standing goal in AI, refers to a partnership where a human and artificial intelligence work together towards a shared goal. Collaborative dialog allows human-AI teams to communicate and leverage strengths from both partners. To design collaborative dialog systems, it is important to understand what mental models users form about their AI-dialog partners, however, how users perceive these systems is not fully understood. In this study, we designed a novel, collaborative, communication-based puzzle game and explanatory dialog system. We created a public corpus from 117 conversations and post-surveys and used this to analyze what mental models users formed. Key takeaways include: Even when users were not engaged in the game, they perceived the AI-dialog partner as intelligent and likeable, implying they saw it as a partner separate from the game. This was further supported by users often overestimating the system’s abilities and projecting human-like attributes which led to miscommunications. We conclude that creating shared mental models between users and AI systems is important to achieving successful dialogs. We propose that our insights on mental models and miscommunication, the game, and our corpus provide useful tools for designing collaborative dialog systems.

“It seemed like an annoying woman”: On the Perception and Ethical Considerations of Affective Language in Text-Based Conversational Agents
Lindsey Vanderlyn | Gianna Weber | Michael Neumann | Dirk Väth | Sarina Meyer | Ngoc Thang Vu
Proceedings of the 25th Conference on Computational Natural Language Learning

Previous research has found that task-oriented conversational agents are perceived more positively by users when they provide information in an empathetic manner compared to a plain, emotionless information exchange. However, users’ perception and ethical considerations related to a dialog systems’ response language style have received comparatively little attention in the field of human-computer interaction. To bridge this gap, we explored these ethical implications through a scenario-based user study. 127 participants interacted with one of three variants of an affective, task-oriented conversational agent, each variant providing responses in a different language style. After the interaction, participants filled out a survey about their feelings during the experiment and their perception of various aspects of the chatbot. Based on statistical and qualitative analysis of the responses, we found language style played an important role in how human-like participants perceived a dialog agent as well as how likable. Language style also had a direct effect on how users perceived the use of personal pronouns ‘I’ and ‘You’ and how they projected gender onto the chatbot. Finally, we identify and discuss ethical implications. In particular we focus on what factors/stereotypes influenced participants’ impressions of gender, and what trade-offs a more human-like chatbot brings.

Findings of the AmericasNLP 2021 Shared Task on Open Machine Translation for Indigenous Languages of the Americas
Manuel Mager | Arturo Oncevay | Abteen Ebrahimi | John Ortega | Annette Rios | Angela Fan | Ximena Gutierrez-Vasques | Luis Chiruzzo | Gustavo Giménez-Lugo | Ricardo Ramos | Ivan Vladimir Meza Ruiz | Rolando Coto-Solano | Alexis Palmer | Elisabeth Mager-Hois | Vishrav Chaudhary | Graham Neubig | Ngoc Thang Vu | Katharina Kann
Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas

This paper presents the results of the 2021 Shared Task on Open Machine Translation for Indigenous Languages of the Americas. The shared task featured two independent tracks, and participants submitted machine translation systems for up to 10 indigenous languages. Overall, 8 teams participated with a total of 214 submissions. We provided training sets consisting of data collected from various sources, as well as manually translated sentences for the development and test sets. An official baseline trained on this data was also provided. Team submissions featured a variety of architectures, including both statistical and neural models, and for the majority of languages, many teams were able to considerably improve over the baseline. The best performing systems achieved 12.97 ChrF higher than baseline, when averaged across languages.

pdf bib
Proceedings of the 1st Workshop on Meta Learning and Its Applications to Natural Language Processing
Hung-Yi Lee | Mitra Mohtarami | Shang-Wen Li | Di Jin | Mandy Korpusik | Shuyan Dong | Ngoc Thang Vu | Dilek Hakkani-Tur
Proceedings of the 1st Workshop on Meta Learning and Its Applications to Natural Language Processing

Beyond Accuracy: A Consolidated Tool for Visual Question Answering Benchmarking
Dirk Väth | Pascal Tilli | Ngoc Thang Vu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

On the way towards general Visual Question Answering (VQA) systems that are able to answer arbitrary questions, the need arises for evaluation beyond single-metric leaderboards for specific datasets. To this end, we propose a browser-based benchmarking tool for researchers and challenge organizers, with an API for easy integration of new models and datasets to keep up with the fast-changing landscape of VQA. Our tool helps test generalization capabilities of models across multiple datasets, evaluating not just accuracy, but also performance in more realistic real-world scenarios such as robustness to input noise. Additionally, we include metrics that measure biases and uncertainty, to further explain model behavior. Interactive filtering facilitates discovery of problematic behavior, down to the data sample level. As proof of concept, we perform a case study on four models. We find that state-of-the-art VQA models are optimized for specific tasks or datasets, but fail to generalize even to other in-domain test sets, for example they can not recognize text in images. Our metrics allow us to quantify which image and question embeddings provide most robustness to a model. All code s publicly available.


Fine-tuning BERT for Low-Resource Natural Language Understanding via Active Learning
Daniel Grießhaber | Johannes Maucher | Ngoc Thang Vu
Proceedings of the 28th International Conference on Computational Linguistics

Recently, leveraging pre-trained Transformer based language models in down stream, task specific models has advanced state of the art results in natural language understanding tasks. However, only a little research has explored the suitability of this approach in low resource settings with less than 1,000 training data points. In this work, we explore fine-tuning methods of BERT - a pre-trained Transformer based language model - by utilizing pool-based active learning to speed up training while keeping the cost of labeling new data constant. Our experimental results on the GLUE data set show an advantage in model performance by maximizing the approximate knowledge gain of the model when querying from the pool of unlabeled data. Finally, we demonstrate and analyze the benefits of freezing layers of the language model during fine-tuning to reduce the number of trainable parameters, making it more suitable for low-resource settings.

IMSurReal Too: IMS in the Surface Realization Shared Task 2020
Xiang Yu | Simon Tannert | Ngoc Thang Vu | Jonas Kuhn
Proceedings of the Third Workshop on Multilingual Surface Realisation

We introduce the IMS contribution to the Surface Realization Shared Task 2020. The new system achieves substantial improvement over the state-of-the-art system from last year, mainly due to a better token representation and a better linearizer, as well as a simple ensembling approach. We also experiment with data augmentation, which brings some additional performance gain. The system is available at

Cairo Student Code-Switch (CSCS) Corpus: An Annotated Egyptian Arabic-English Corpus
Mohamed Balabel | Injy Hamed | Slim Abdennadher | Ngoc Thang Vu | Özlem Çetinoğlu
Proceedings of the Twelfth Language Resources and Evaluation Conference

Code-switching has become a prevalent phenomenon across many communities. It poses a challenge to NLP researchers, mainly due to the lack of available data needed for training and testing applications. In this paper, we introduce a new resource: a corpus of Egyptian- Arabic code-switch speech data that is fully tokenized, lemmatized and annotated for part-of-speech tags. Beside the corpus itself, we provide annotation guidelines to address the unique challenges of annotating code-switch data. Another challenge that we address is the fact that Egyptian Arabic orthography and grammar are not standardized.

ArzEn: A Speech Corpus for Code-switched Egyptian Arabic-English
Injy Hamed | Ngoc Thang Vu | Slim Abdennadher
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this paper, we present our ArzEn corpus, an Egyptian Arabic-English code-switching (CS) spontaneous speech corpus. The corpus is collected through informal interviews with 38 Egyptian bilingual university students and employees held in a soundproof room. A total of 12 hours are recorded, transcribed, validated and sentence segmented. The corpus is mainly designed to be used in Automatic Speech Recognition (ASR) systems, however, it also provides a useful resource for analyzing the CS phenomenon from linguistic, sociological, and psychological perspectives. In this paper, we first discuss the CS phenomenon in Egypt and the factors that gave rise to the current language. We then provide a detailed description on how the corpus was collected, giving an overview on the participants involved. We also present statistics on the CS involved in the corpus, as well as a summary to the effort exerted in the corpus development, in terms of number of hours required for transcription, validation, segmentation and speaker annotation. Finally, we discuss some factors contributing to the complexity of the corpus, as well as Arabic-English CS behaviour that could pose potential challenges to ASR systems.

A Two-stage Model for Slot Filling in Low-resource Settings: Domain-agnostic Non-slot Reduction and Pretrained Contextual Embeddings
Cennet Oguz | Ngoc Thang Vu
Proceedings of SustaiNLP: Workshop on Simple and Efficient Natural Language Processing

Learning-based slot filling - a key component of spoken language understanding systems - typically requires a large amount of in-domain hand-labeled data for training. In this paper, we propose a novel two-stage model architecture that can be trained with only a few in-domain hand-labeled examples. The first step is designed to remove non-slot tokens (i.e., O labeled tokens), as they introduce noise in the input of slot filling models. This step is domain-agnostic and therefore, can be trained by exploiting out-of-domain data. The second step identifies slot names only for slot tokens by using state-of-the-art pretrained contextual embeddings such as ELMO and BERT. We show that our approach outperforms other state-of-art systems on the SNIPS benchmark dataset.

Ensemble Self-Training for Low-Resource Languages: Grapheme-to-Phoneme Conversion and Morphological Inflection
Xiang Yu | Ngoc Thang Vu | Jonas Kuhn
Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

We present an iterative data augmentation framework, which trains and searches for an optimal ensemble and simultaneously annotates new training data in a self-training style. We apply this framework on two SIGMORPHON 2020 shared tasks: grapheme-to-phoneme conversion and morphological inflection. With very simple base models in the ensemble, we rank the first and the fourth in these two tasks. We show in the analysis that our system works especially well on low-resource languages.

F1 is Not Enough! Models and Evaluation Towards User-Centered Explainable Question Answering
Hendrik Schuff | Heike Adel | Ngoc Thang Vu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Explainable question answering systems predict an answer together with an explanation showing why the answer has been selected. The goal is to enable users to assess the correctness of the system and understand its reasoning process. However, we show that current models and evaluation settings have shortcomings regarding the coupling of answer and explanation which might cause serious issues in user experience. As a remedy, we propose a hierarchical model and a new regularization term to strengthen the answer-explanation coupling as well as two evaluation scores to quantify the coupling. We conduct experiments on the HOTPOTQA benchmark data set and perform a user study. The user study shows that our models increase the ability of the users to judge the correctness of the system and that scores like F1 are not enough to estimate the usefulness of a model in a practical setting with human users. Our scores are better aligned with user experience, making them promising candidates for model selection.

Fast and Accurate Non-Projective Dependency Tree Linearization
Xiang Yu | Simon Tannert | Ngoc Thang Vu | Jonas Kuhn
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We propose a graph-based method to tackle the dependency tree linearization task. We formulate the task as a Traveling Salesman Problem (TSP), and use a biaffine attention model to calculate the edge costs. We facilitate the decoding by solving the TSP for each subtree and combining the solution into a projective tree. We then design a transition system as post-processing, inspired by non-projective transition-based parsing, to obtain non-projective sentences. Our proposed method outperforms the state-of-the-art linearizer while being 10 times faster in training and decoding.

ADVISER: A Toolkit for Developing Multi-modal, Multi-domain and Socially-engaged Conversational Agents
Chia-Yu Li | Daniel Ortega | Dirk Väth | Florian Lux | Lindsey Vanderlyn | Maximilian Schmidt | Michael Neumann | Moritz Völkel | Pavel Denisov | Sabrina Jenne | Zorica Kacarevic | Ngoc Thang Vu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

We present ADVISER - an open-source, multi-domain dialog system toolkit that enables the development of multi-modal (incorporating speech, text and vision), socially-engaged (e.g. emotion recognition, engagement level prediction and backchanneling) conversational agents. The final Python-based implementation of our toolkit is flexible, easy to use, and easy to extend not only for technically experienced users, such as machine learning researchers, but also for less technically experienced users, such as linguists or cognitive scientists, thereby providing a flexible platform for collaborative research.

pdf bib
Interpreting Attention Models with Human Visual Attention in Machine Reading Comprehension
Ekta Sood | Simon Tannert | Diego Frassinelli | Andreas Bulling | Ngoc Thang Vu
Proceedings of the 24th Conference on Computational Natural Language Learning

While neural networks with attention mechanisms have achieved superior performance on many natural language processing tasks, it remains unclear to which extent learned attention resembles human visual attention. In this paper, we propose a new method that leverages eye-tracking data to investigate the relationship between human visual attention and neural attention in machine reading comprehension. To this end, we introduce a novel 23 participant eye tracking dataset - MQA-RC, in which participants read movie plots and answered pre-defined questions. We compare state of the art networks based on long short-term memory (LSTM), convolutional neural models (CNN) and XLNet Transformer architectures. We find that higher similarity to human attention and performance significantly correlates to the LSTM and CNN models. However, we show this relationship does not hold true for the XLNet models – despite the fact that the XLNet performs best on this challenging task. Our results suggest that different architectures seem to learn rather different neural attention strategies and similarity of neural to human attention does not guarantee best performance.


IMSurReal: IMS at the Surface Realization Shared Task 2019
Xiang Yu | Agnieszka Falenska | Marina Haid | Ngoc Thang Vu | Jonas Kuhn
Proceedings of the 2nd Workshop on Multilingual Surface Realisation (MSR 2019)

We introduce the IMS contribution to the Surface Realization Shared Task 2019. Our submission achieves the state-of-the-art performance without using any external resources. The system takes a pipeline approach consisting of five steps: linearization, completion, inflection, contraction, and detokenization. We compare the performance of our linearization algorithm with two external baselines and report results for each step in the pipeline. Furthermore, we perform detailed error analysis revealing correlation between word order freedom and difficulty of the linearization task.

ADVISER: A Dialog System Framework for Education & Research
Daniel Ortega | Dirk Väth | Gianna Weber | Lindsey Vanderlyn | Maximilian Schmidt | Moritz Völkel | Zorica Karacevic | Ngoc Thang Vu
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

In this paper, we present ADVISER - an open source dialog system framework for education and research purposes. This system supports multi-domain task-oriented conversations in two languages. It additionally provides a flexible architecture in which modules can be arbitrarily combined or exchanged - allowing for easy switching between rules-based and neural network based implementations. Furthermore, ADVISER offers a transparent, user-friendly framework designed for interdisciplinary collaboration: from a flexible back end, allowing easy integration of new features, to an intuitive graphical user interface supporting nontechnical users.

Learning the Dyck Language with Attention-based Seq2Seq Models
Xiang Yu | Ngoc Thang Vu | Jonas Kuhn
Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

The generalized Dyck language has been used to analyze the ability of Recurrent Neural Networks (RNNs) to learn context-free grammars (CFGs). Recent studies draw conflicting conclusions on their performance, especially regarding the generalizability of the models with respect to the depth of recursion. In this paper, we revisit several common models and experimental settings, discuss the potential problems of the tasks and analyses. Furthermore, we explore the use of attention mechanisms within the seq2seq framework to learn the Dyck language, which could compensate for the limited encoding ability of RNNs. Our findings reveal that attention mechanisms still cannot truly generalize over the recursion depth, although they perform much better than other models on the closing bracket tagging task. Moreover, this also suggests that this commonly used task is not sufficient to test a model’s understanding of CFGs.

To Combine or Not To Combine? A Rainbow Deep Reinforcement Learning Agent for Dialog Policies
Dirk Väth | Ngoc Thang Vu
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

In this paper, we explore state-of-the-art deep reinforcement learning methods for dialog policy training such as prioritized experience replay, double deep Q-Networks, dueling network architectures and distributional learning. Our main findings show that each individual method improves the rewards and the task success rate but combining these methods in a Rainbow agent, which performs best across tasks and environments, is a non-trivial task. We, therefore, provide insights about the influence of each method on the combination and how to combine them to form a Rainbow agent.

Head-First Linearization with Tree-Structured Representation
Xiang Yu | Agnieszka Falenska | Ngoc Thang Vu | Jonas Kuhn
Proceedings of the 12th International Conference on Natural Language Generation

We present a dependency tree linearization model with two novel components: (1) a tree-structured encoder based on bidirectional Tree-LSTM that propagates information first bottom-up then top-down, which allows each token to access information from the entire tree; and (2) a linguistically motivated head-first decoder that emphasizes the central role of the head and linearizes the subtree by incrementally attaching the dependents on both sides of the head. With the new encoder and decoder, we reach state-of-the-art performance on the Surface Realization Shared Task 2018 dataset, outperforming not only the shared tasks participants, but also previous state-of-the-art systems (Bohnet et al., 2011; Puduppully et al., 2016). Furthermore, we analyze the power of the tree-structured encoder with a probing task and show that it is able to recognize the topological relation between any pair of tokens in a tree.


Addressing Low-Resource Scenarios with Character-aware Embeddings
Sean Papay | Sebastian Padó | Ngoc Thang Vu
Proceedings of the Second Workshop on Subword/Character LEvel Models

Most modern approaches to computing word embeddings assume the availability of text corpora with billions of words. In this paper, we explore a setup where only corpora with millions of words are available, and many words in any new text are out of vocabulary. This setup is both of practical interests – modeling the situation for specific domains and low-resource languages – and of psycholinguistic interest, since it corresponds much more closely to the actual experiences and challenges of human language learning and use. We compare standard skip-gram word embeddings with character-based embeddings on word relatedness prediction. Skip-grams excel on large corpora, while character-based embeddings do well on small corpora generally and rare and complex words specifically. The models can be combined easily.

Approximate Dynamic Oracle for Dependency Parsing with Reinforcement Learning
Xiang Yu | Ngoc Thang Vu | Jonas Kuhn
Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)

We present a general approach with reinforcement learning (RL) to approximate dynamic oracles for transition systems where exact dynamic oracles are difficult to derive. We treat oracle parsing as a reinforcement learning problem, design the reward function inspired by the classical dynamic oracle, and use Deep Q-Learning (DQN) techniques to train the oracle with gold trees as features. The combination of a priori knowledge and data-driven methods enables an efficient dynamic oracle, which improves the parser performance over static oracles in several transition systems.

Sequence-to-Sequence Models for Data-to-Text Natural Language Generation: Word- vs. Character-based Processing and Output Diversity
Glorianna Jagfeld | Sabrina Jenne | Ngoc Thang Vu
Proceedings of the 11th International Conference on Natural Language Generation

We present a comparison of word-based and character-based sequence-to-sequence models for data-to-text natural language generation, which generate natural language descriptions for structured inputs. On the datasets of two recent generation challenges, our models achieve comparable or better automatic evaluation results than the best challenge submissions. Subsequent detailed statistical and human analyses shed light on the differences between the two input representations and the diversity of the generated texts. In a controlled experiment with synthetic training data generated from templates, we demonstrate the ability of neural models to learn novel combinations of the templates and thereby generalize beyond the linguistic structures they were trained on.

Introducing Two Vietnamese Datasets for Evaluating Semantic Models of (Dis-)Similarity and Relatedness
Kim Anh Nguyen | Sabine Schulte im Walde | Ngoc Thang Vu
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

We present two novel datasets for the low-resource language Vietnamese to assess models of semantic similarity: ViCon comprises pairs of synonyms and antonyms across word classes, thus offering data to distinguish between similarity and dissimilarity. ViSim-400 provides degrees of similarity across five semantic relations, as rated by human judges. The two datasets are verified through standard co-occurrence and neural network models, showing results comparable to the respective English datasets.

Comparing Attention-Based Convolutional and Recurrent Neural Networks: Success and Limitations in Machine Reading Comprehension
Matthias Blohm | Glorianna Jagfeld | Ekta Sood | Xiang Yu | Ngoc Thang Vu
Proceedings of the 22nd Conference on Computational Natural Language Learning

We propose a machine reading comprehension model based on the compare-aggregate framework with two-staged attention that achieves state-of-the-art results on the MovieQA question answering dataset. To investigate the limitations of our model as well as the behavioral difference between convolutional and recurrent neural networks, we generate adversarial examples to confuse the model and compare to human performance. Furthermore, we assess the generalizability of our model by analyzing its differences to human inference, drawing upon insights from cognitive science.


A General-Purpose Tagger with Convolutional Neural Networks
Xiang Yu | Agnieszka Falenska | Ngoc Thang Vu
Proceedings of the First Workshop on Subword and Character Level Models in NLP

We present a general-purpose tagger based on convolutional neural networks (CNN), used for both composing word vectors and encoding context information. The CNN tagger is robust across different tagging tasks: without task-specific tuning of hyper-parameters, it achieves state-of-the-art results in part-of-speech tagging, morphological tagging and supertagging. The CNN tagger is also robust against the out-of-vocabulary problem; it performs well on artificially unnormalized texts.

pdf bib
Encoding Word Confusion Networks with Recurrent Neural Networks for Dialog State Tracking
Glorianna Jagfeld | Ngoc Thang Vu
Proceedings of the Workshop on Speech-Centric Natural Language Processing

This paper presents our novel method to encode word confusion networks, which can represent a rich hypothesis space of automatic speech recognition systems, via recurrent neural networks. We demonstrate the utility of our approach for the task of dialog state tracking in spoken dialog systems that relies on automatic speech recognition output. Encoding confusion networks outperforms encoding the best hypothesis of the automatic speech recognition in a neural system for dialog state tracking on the well-known second Dialog State Tracking Challenge dataset.

Enriching ASR Lattices with POS Tags for Dependency Parsing
Moritz Stiefel | Ngoc Thang Vu
Proceedings of the Workshop on Speech-Centric Natural Language Processing

Parsing speech requires a richer representation than 1-best or n-best hypotheses, e.g. lattices. Moreover, previous work shows that part-of-speech (POS) tags are a valuable resource for parsing. In this paper, we therefore explore a joint modeling approach of automatic speech recognition (ASR) and POS tagging to enrich ASR word lattices. To that end, we manipulate the ASR process from the pronouncing dictionary onward to use word-POS pairs instead of words. We evaluate ASR, POS tagging and dependency parsing (DP) performance demonstrating a successful lattice-based integration of ASR and POS tagging.

Improving coreference resolution with automatically predicted prosodic information
Ina Roesiger | Sabrina Stehwien | Arndt Riester | Ngoc Thang Vu
Proceedings of the Workshop on Speech-Centric Natural Language Processing

Adding manually annotated prosodic information, specifically pitch accents and phrasing, to the typical text-based feature set for coreference resolution has previously been shown to have a positive effect on German data. Practical applications on spoken language, however, would rely on automatically predicted prosodic information. In this paper we predict pitch accents (and phrase boundaries) using a convolutional neural network (CNN) model from acoustic features extracted from the speech signal. After an assessment of the quality of these automatic prosodic annotations, we show that they also significantly improve coreference resolution.

Neural-based Context Representation Learning for Dialog Act Classification
Daniel Ortega | Ngoc Thang Vu
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

We explore context representation learning methods in neural-based models for dialog act classification. We propose and compare extensively different methods which combine recurrent neural network architectures and attention mechanisms (AMs) at different context levels. Our experimental results on two benchmark datasets show consistent improvements compared to the models without contextual information and reveal that the most suitable AM in the architecture depends on the nature of the dataset.

Distinguishing Antonyms and Synonyms in a Pattern-based Neural Network
Kim Anh Nguyen | Sabine Schulte im Walde | Ngoc Thang Vu
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Distinguishing between antonyms and synonyms is a key task to achieve high performance in NLP systems. While they are notoriously difficult to distinguish by distributional co-occurrence models, pattern-based methods have proven effective to differentiate between the relations. In this paper, we present a novel neural network model AntSynNET that exploits lexico-syntactic patterns from syntactic parse trees. In addition to the lexical and syntactic information, we successfully integrate the distance between the related words along the syntactic path as a new pattern feature. The results from classification experiments show that AntSynNET improves the performance over prior pattern-based methods.

Character Composition Model with Convolutional Neural Networks for Dependency Parsing on Morphologically Rich Languages
Xiang Yu | Ngoc Thang Vu
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We present a transition-based dependency parser that uses a convolutional neural network to compose word representations from characters. The character composition model shows great improvement over the word-lookup model, especially for parsing agglutinative languages. These improvements are even better than using pre-trained word embeddings from extra data. On the SPMRL data sets, our system outperforms the previous best greedy parser (Ballesteros et. al, 2015) by a margin of 3% on average.

Hierarchical Embeddings for Hypernymy Detection and Directionality
Kim Anh Nguyen | Maximilian Köper | Sabine Schulte im Walde | Ngoc Thang Vu
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We present a novel neural model HyperVec to learn hierarchical embeddings for hypernymy detection and directionality. While previous embeddings have shown limitations on prototypical hypernyms, HyperVec represents an unsupervised measure where embeddings are learned in a specific order and capture the hypernym–hyponym distributional hierarchy. Moreover, our model is able to generalize over unseen hypernymy pairs, when using only small sets of training data, and by mapping to other languages. Results on benchmark datasets show that HyperVec outperforms both state-of-the-art unsupervised measures and embedding models on hypernymy detection and directionality, and on predicting graded lexical entailment.


Combining Recurrent and Convolutional Neural Networks for Relation Classification
Ngoc Thang Vu | Heike Adel | Pankaj Gupta | Hinrich Schütze
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Towards a text analysis system for political debates
Dieu-Thu Le | Ngoc Thang Vu | Andre Blessing
Proceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities

pdf bib
Challenges of Computational Processing of Code-Switching
Özlem Çetinoğlu | Sarah Schulz | Ngoc Thang Vu
Proceedings of the Second Workshop on Computational Approaches to Code Switching

Neural-based Noise Filtering from Word Embeddings
Kim Anh Nguyen | Sabine Schulte im Walde | Ngoc Thang Vu
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Word embeddings have been demonstrated to benefit NLP tasks impressively. Yet, there is room for improvements in the vector representations, because current word embeddings typically contain unnecessary information, i.e., noise. We propose two novel models to improve word embeddings by unsupervised learning, in order to yield word denoising embeddings. The word denoising embeddings are obtained by strengthening salient information and weakening noise in the original word embeddings, based on a deep feed-forward neural network filter. Results from benchmark tasks show that the filtered word denoising embeddings outperform the original word embeddings.

Integrating Distributional Lexical Contrast into Word Embeddings for Antonym-Synonym Distinction
Kim Anh Nguyen | Sabine Schulte im Walde | Ngoc Thang Vu
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)


A Linguistically Informed Convolutional Neural Network
Sebastian Ebert | Ngoc Thang Vu | Hinrich Schütze
Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

CIS-positive: A Combination of Convolutional Neural Networks and Support Vector Machines for Sentiment Analysis in Twitter
Sebastian Ebert | Ngoc Thang Vu | Hinrich Schütze
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)


Exploration of the Impact of Maximum Entropy in Recurrent Neural Network Language Models for Code-Switching Speech
Ngoc Thang Vu | Tanja Schultz
Proceedings of the First Workshop on Computational Approaches to Code Switching


Combination of Recurrent Neural Networks and Factored Language Models for Code-Switching Language Modeling
Heike Adel | Ngoc Thang Vu | Tanja Schultz
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)


Speech recognition for machine translation in Quaero
Lori Lamel | Sandrine Courcinous | Julien Despres | Jean-Luc Gauvain | Yvan Josse | Kevin Kilgour | Florian Kraft | Viet-Bac Le | Hermann Ney | Markus Nußbaum-Thom | Ilya Oparin | Tim Schlippe | Ralf Schlüter | Tanja Schultz | Thiago Fraga da Silva | Sebastian Stüker | Martin Sundermeyer | Bianca Vieru | Ngoc Thang Vu | Alexander Waibel | Cécile Woehrling
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes the speech-to-text systems used to provide automatic transcriptions used in the Quaero 2010 evaluation of Machine Translation from speech. Quaero ( is a large research and industrial innovation program focusing on technologies for automatic analysis and classification of multimedia and multilingual documents. The ASR transcript is the result of a Rover combination of systems from three teams ( KIT, RWTH, LIMSI+VR) for the French and German languages. The casesensitive word error rates (WER) of the combined systems were respectively 20.8% and 18.1% on the 2010 evaluation data, relative WER reductions of 14.6% and 17.4% respectively over the best component system.