Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations

Sebastian Padó, Ruihong Huang (Editors)

Anthology ID:
Hong Kong, China
Association for Computational Linguistics
Bib Export formats:

pdf bib
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations
Sebastian Padó | Ruihong Huang

pdf bib
ABSApp: A Portable Weakly-Supervised Aspect-Based Sentiment Extraction System
Oren Pereg | Daniel Korat | Moshe Wasserblat | Jonathan Mamou | Ido Dagan

We present ABSApp, a portable system for weakly-supervised aspect-based sentiment ex- traction. The system is interpretable and user friendly and does not require labeled training data, hence can be rapidly and cost-effectively used across different domains in applied setups. The system flow includes three stages: First, it generates domain-specific aspect and opinion lexicons based on an unlabeled dataset; second, it enables the user to view and edit those lexicons (weak supervision); and finally, it enables the user to select an unlabeled target dataset from the same domain, classify it, and generate an aspect-based sentiment report. ABSApp has been successfully used in a number of real-life use cases, among them movie review analysis and convention impact analysis.

pdf bib
AllenNLP Interpret: A Framework for Explaining Predictions of NLP Models
Eric Wallace | Jens Tuyls | Junlin Wang | Sanjay Subramanian | Matt Gardner | Sameer Singh

Neural NLP models are increasingly accurate but are imperfect and opaque—they break in counterintuitive ways and leave end users puzzled at their behavior. Model interpretation methods ameliorate this opacity by providing explanations for specific model predictions. Unfortunately, existing interpretation codebases make it difficult to apply these methods to new models and tasks, which hinders adoption for practitioners and burdens interpretability researchers. We introduce AllenNLP Interpret, a flexible framework for interpreting NLP models. The toolkit provides interpretation primitives (e.g., input gradients) for any AllenNLP model and task, a suite of built-in interpretation methods, and a library of front-end visualization components. We demonstrate the toolkit’s flexibility and utility by implementing live demos for five interpretation methods (e.g., saliency maps and adversarial attacks) on a variety of models and tasks (e.g., masked language modeling using BERT and reading comprehension using BiDAF). These demos, alongside our code and tutorials, are available at

ALTER: Auxiliary Text Rewriting Tool for Natural Language Generation
Qiongkai Xu | Chenchen Xu | Lizhen Qu

In this paper, we describe ALTER, an auxiliary text rewriting tool that facilitates the rewriting process for natural language generation tasks, such as paraphrasing, text simplification, fairness-aware text rewriting, and text style transfer. Our tool is characterized by two features, i) recording of word-level revision histories and ii) flexible auxiliary edit support and feedback to annotators. The text rewriting assist and traceable rewriting history are potentially beneficial to the future research of natural language generation.

Applying BERT to Document Retrieval with Birch
Zeynep Akkalyoncu Yilmaz | Shengjin Wang | Wei Yang | Haotian Zhang | Jimmy Lin

We present Birch, a system that applies BERT to document retrieval via integration with the open-source Anserini information retrieval toolkit to demonstrate end-to-end search over large document collections. Birch implements simple ranking models that achieve state-of-the-art effectiveness on standard TREC newswire and social media test collections. This demonstration focuses on technical challenges in the integration of NLP and IR capabilities, along with the design rationale behind our approach to tightly-coupled integration between Python (to support neural networks) and the Java Virtual Machine (to support document retrieval using the open-source Lucene search library). We demonstrate integration of Birch with an existing search interface as well as interactive notebooks that highlight its capabilities in an easy-to-understand manner.

Automatic Taxonomy Induction and Expansion
Nicolas Rodolfo Fauceglia | Alfio Gliozzo | Sarthak Dash | Md. Faisal Mahbub Chowdhury | Nandana Mihindukulasooriya

The Knowledge Graph Induction Service (KGIS) is an end-to-end knowledge induction system. One of its main capabilities is to automatically induce taxonomies from input documents using a hybrid approach that takes advantage of linguistic patterns, semantic web and neural networks. KGIS allows the user to semi-automatically curate and expand the induced taxonomy through a component called Smart SpreadSheet by exploiting distributional semantics. In this paper, we describe these taxonomy induction and expansion features of KGIS. A screencast video demonstrating the system is available in .

CFO: A Framework for Building Production NLP Systems
Rishav Chakravarti | Cezar Pendus | Andrzej Sakrajda | Anthony Ferritto | Lin Pan | Michael Glass | Vittorio Castelli | J. William Murdock | Radu Florian | Salim Roukos | Avi Sil

This paper introduces a novel orchestration framework, called CFO (Computation Flow Orchestrator), for building, experimenting with, and deploying interactive NLP (Natural Language Processing) and IR (Information Retrieval) systems to production environments. We then demonstrate a question answering system built using this framework which incorporates state-of-the-art BERT based MRC (Machine Reading Com- prehension) with IR components to enable end-to-end answer retrieval. Results from the demo system are shown to be high quality in both academic and industry domain specific settings. Finally, we discuss best practices when (pre-)training BERT based MRC models for production systems. Screencast links: - Short video (< 3 min): http: // - Supplementary long video (< 13 min):

Chameleon: A Language Model Adaptation Toolkit for Automatic Speech Recognition of Conversational Speech
Yuanfeng Song | Di Jiang | Weiwei Zhao | Qian Xu | Raymond Chi-Wing Wong | Qiang Yang

Language model is a vital component in modern automatic speech recognition (ASR) systems. Since “one-size-fits-all” language model works suboptimally for conversational speeches, language model adaptation (LMA) is considered as a promising solution for solving this problem. In order to compare the state-of-the-art LMA techniques and systematically demonstrate their effect in conversational speech recognition, we develop a novel toolkit named Chameleon, which includes the state-of-the-art cache-based and topic-based LMA techniques. This demonstration does not only vividly visualize underlying working mechanisms of a variety of the state-of-the-art LMA models but also provide an interface for the user to customize the hyperparameters of them. With this demonstration, the audience can experience the effect of LMA in an interactive and real-time fashion. We wish this demonstration would inspire more research on better language model techniques for ASR.

Controlling Sequence-to-Sequence Models - A Demonstration on Neural-based Acrostic Generator
Liang-Hsin Shen | Pei-Lun Tai | Chao-Chung Wu | Shou-De Lin

An acrostic is a form of writing that the first token of each line (or other recurring features in the text) forms a meaningful sequence. In this paper we present a generalized acrostic generation system that can hide certain message in a flexible pattern specified by the users. Different from previous works that focus on rule-based solutions, here we adopt a neural- based sequence-to-sequence model to achieve this goal. Besides acrostic, users are also allowed to specify the rhyme and length of the output sequences. Based on our knowledge, this is the first neural-based natural language generation system that demonstrates the capability of performing micro-level control over output sentences.

EASSE: Easier Automatic Sentence Simplification Evaluation
Fernando Alva-Manchego | Louis Martin | Carolina Scarton | Lucia Specia

We introduce EASSE, a Python package aiming to facilitate and standardise automatic evaluation and comparison of Sentence Simplification (SS) systems. EASSE provides a single access point to a broad range of evaluation resources: standard automatic metrics for assessing SS outputs (e.g. SARI), word-level accuracy scores for certain simplification transformations, reference-independent quality estimation features (e.g. compression ratio), and standard test data for SS evaluation (e.g. TurkCorpus). Finally, EASSE generates easy-to-visualise reports on the various metrics and features above and on how a particular SS output fares against reference simplifications. Through experiments, we show that these functionalities allow for better comparison and understanding of the performance of SS systems.

EGG: a toolkit for research on Emergence of lanGuage in Games
Eugene Kharitonov | Rahma Chaabouni | Diane Bouchacourt | Marco Baroni

There is renewed interest in simulating language emergence among deep neural agents that communicate to jointly solve a task, spurred by the practical aim to develop language-enabled interactive AIs, as well as by theoretical questions about the evolution of human language. However, optimizing deep architectures connected by a discrete communication channel (such as that in which language emerges) is technically challenging. We introduce EGG, a toolkit that greatly simplifies the implementation of emergent-language communication games. EGG’s modular design provides a set of building blocks that the user can combine to create new games, easily navigating the optimization and architecture space. We hope that the tool will lower the technical barrier, and encourage researchers from various backgrounds to do original work in this exciting area.

Entity resolution for noisy ASR transcripts
Arushi Raghuvanshi | Vijay Ramakrishnan | Varsha Embar | Lucien Carroll | Karthik Raghunathan

Large vocabulary domain-agnostic Automatic Speech Recognition (ASR) systems often mistranscribe domain-specific words and phrases. Since these generic ASR systems are the first component of most voice assistants in production, building Natural Language Understanding (NLU) systems that are robust to these errors can be a challenging task. In this paper, we focus on handling ASR errors in named entities, specifically person names, for a voice-based collaboration assistant. We demonstrate an effective method for resolving person names that are mistranscribed by black-box ASR systems, using character and phoneme-based information retrieval techniques and contextual information, which improves accuracy by 40.8% on our production system. We provide a live interactive demo to further illustrate the nuances of this problem and the effectiveness of our solution.

EUSP: An Easy-to-Use Semantic Parsing PlatForm
Bo An | Chen Bo | Xianpei Han | Le Sun

Semantic parsing aims to map natural language utterances into structured meaning representations. We present a modular platform, EUSP (Easy-to-Use Semantic Parsing PlatForm), that facilitates developers to build semantic parser from scratch. Instead of requiring a large amount of training data or complex grammar knowledge, in our platform developers can build grammar-based semantic parser or neural-based semantic parser through configure files which specify the modules and components that compose semantic parsing system. A high quality grammar-based semantic parsing system only requires domain lexicons rather than costly training data for a semantic parser. Furthermore, we provide a browser-based method to generate the semantic parsing system to minimize the difficulty of development. Experimental results show that the neural-based semantic parser system achieves competitive performance on semantic parsing task, and grammar-based semantic parsers significantly improve the performance of a business search engine.

FAMULUS: Interactive Annotation and Feedback Generation for Teaching Diagnostic Reasoning
Jonas Pfeiffer | Christian M. Meyer | Claudia Schulz | Jan Kiesewetter | Jan Zottmann | Michael Sailer | Elisabeth Bauer | Frank Fischer | Martin R. Fischer | Iryna Gurevych

Our proposed system FAMULUS helps students learn to diagnose based on automatic feedback in virtual patient simulations, and it supports instructors in labeling training data. Diagnosing is an exceptionally difficult skill to obtain but vital for many different professions (e.g., medical doctors, teachers). Previous case simulation systems are limited to multiple-choice questions and thus cannot give constructive individualized feedback on a student’s diagnostic reasoning process. Given initially only limited data, we leverage a (replaceable) NLP model to both support experts in their further data annotation with automatic suggestions, and we provide automatic feedback for students. We argue that because the central model consistently improves, our interactive approach encourages both students and instructors to recurrently use the tool, and thus accelerate the speed of data creation and annotation. We show results from two user studies on diagnostic reasoning in medicine and teacher education and outline how our system can be extended to further use cases.

Gunrock: A Social Bot for Complex and Engaging Long Conversations
Dian Yu | Michelle Cohn | Yi Mang Yang | Chun Yen Chen | Weiming Wen | Jiaping Zhang | Mingyang Zhou | Kevin Jesse | Austin Chau | Antara Bhowmick | Shreenath Iyer | Giritheja Sreenivasulu | Sam Davidson | Ashwin Bhandare | Zhou Yu

Gunrock is the winner of the 2018 Amazon Alexa Prize, as evaluated by coherence and engagement from both real users and Amazon-selected expert conversationalists. We focus on understanding complex sentences and having in-depth conversations in open domains. In this paper, we introduce some innovative system designs and related validation analysis. Overall, we found that users produce longer sentences to Gunrock, which are directly related to users’ engagement (e.g., ratings, number of turns). Additionally, users’ backstory queries about Gunrock are positively correlated to user satisfaction. Finally, we found dialog flows that interleave facts and personal opinions and stories lead to better user satisfaction.

HARE: a Flexible Highlighting Annotator for Ranking and Exploration
Denis Newman-Griffis | Eric Fosler-Lussier

Exploration and analysis of potential data sources is a significant challenge in the application of NLP techniques to novel information domains. We describe HARE, a system for highlighting relevant information in document collections to support ranking and triage, which provides tools for post-processing and qualitative analysis for model development and tuning. We apply HARE to the use case of narrative descriptions of mobility information in clinical data, and demonstrate its utility in comparing candidate embedding features. We provide a web-based interface for annotation visualization and document ranking, with a modular backend to support interoperability with existing annotation tools. Our system is available online at

Honkling: In-Browser Personalization for Ubiquitous Keyword Spotting
Jaejun Lee | Raphael Tang | Jimmy Lin

Used for simple commands recognition on devices from smart speakers to mobile phones, keyword spotting systems are everywhere. Ubiquitous as well are web applications, which have grown in popularity and complexity over the last decade. However, despite their obvious advantages in natural language interaction, voice-enabled web applications are still few and far between. We attempt to bridge this gap with Honkling, a novel, JavaScript-based keyword spotting system. Purely client-side and cross-device compatible, Honkling can be deployed directly on user devices. Our in-browser implementation enables seamless personalization, which can greatly improve model quality; in the presence of underrepresented, non-American user accents, we can achieve up to an absolute 10% increase in accuracy in the personalized model with only a few examples.

IFlyLegal: A Chinese Legal System for Consultation, Law Searching, and Document Analysis
Ziyue Wang | Baoxin Wang | Xingyi Duan | Dayong Wu | Shijin Wang | Guoping Hu | Ting Liu

Legal Tech is developed to help people with legal services and solve legal problems via machines. To achieve this, one of the key requirements for machines is to utilize legal knowledge and comprehend legal context. This can be fulfilled by natural language processing (NLP) techniques, for instance, text representation, text categorization, question answering (QA) and natural language inference, etc. To this end, we introduce a freely available Chinese Legal Tech system (IFlyLegal) that benefits from multiple NLP tasks. It is an integrated system that performs legal consulting, multi-way law searching, and legal document analysis by exploiting techniques such as deep contextual representations and various attention mechanisms. To our knowledge, IFlyLegal is the first Chinese legal system that employs up-to-date NLP techniques and caters for needs of different user groups, such as lawyers, judges, procurators, and clients. Since Jan, 2019, we have gathered 2,349 users and 28,238 page views (till June, 23, 2019).

INMT: Interactive Neural Machine Translation Prediction
Sebastin Santy | Sandipan Dandapat | Monojit Choudhury | Kalika Bali

In this paper, we demonstrate an Interactive Machine Translation interface, that assists human translators with on-the-fly hints and suggestions. This makes the end-to-end translation process faster, more efficient and creates high-quality translations. We augment the OpenNMT backend with a mechanism to accept the user input and generate conditioned translations.

Joey NMT: A Minimalist NMT Toolkit for Novices
Julia Kreutzer | Jasmijn Bastings | Stefan Riezler

We present Joey NMT, a minimalist neural machine translation toolkit based on PyTorch that is specifically designed for novices. Joey NMT provides many popular NMT features in a small and simple code base, so that novices can easily and quickly learn to use it and adapt it to their needs. Despite its focus on simplicity, Joey NMT supports classic architectures (RNNs, transformers), fast beam search, weight tying, and more, and achieves performance comparable to more complex toolkits on standard benchmarks. We evaluate the accessibility of our toolkit in a user study where novices with general knowledge about Pytorch and NMT and experts work through a self-contained Joey NMT tutorial, showing that novices perform almost as well as experts in a subsequent code quiz. Joey NMT is available at

Journalist-in-the-Loop: Continuous Learning as a Service for Rumour Analysis
Twin Karmakharm | Nikolaos Aletras | Kalina Bontcheva

Automatically identifying rumours in social media and assessing their veracity is an important task with downstream applications in journalism. A significant challenge is how to keep rumour analysis tools up-to-date as new information becomes available for particular rumours that spread in a social network. This paper presents a novel open-source web-based rumour analysis tool that can continuous learn from journalists. The system features a rumour annotation service that allows journalists to easily provide feedback for a given social media post through a web-based interface. The feedback allows the system to improve an underlying state-of-the-art neural network-based rumour classification model. The system can be easily integrated as a service into existing tools and platforms used by journalists using a REST API.

LIDA: Lightweight Interactive Dialogue Annotator
Edward Collins | Nikolai Rozanov | Bingbing Zhang

Dialogue systems have the potential to change how people interact with machines but are highly dependent on the quality of the data used to train them.It is therefore important to develop good dialogue annotation tools which can improve the speed and quality of dialogue data annotation. With this in mind, we introduce LIDA, an annotation tool designed specifically for conversation data. As far as we know, LIDA is the first dialogue annotation system that handles the entire dialogue annotation pipeline from raw text, as may be the output of transcription services, to structured conversation data. Furthermore it supports the integration of arbitrary machine learning mod-els as annotation recommenders and also has a dedicated interface to resolve inter-annotator disagreements such as after crowdsourcing an-notations for a dataset. LIDA is fully open source, documented and publicly available.[] –> Screen Cast:

LINSPECTOR WEB: A Multilingual Probing Suite for Word Representations
Max Eichler | Gözde Gül Şahin | Iryna Gurevych

We present LINSPECTOR WEB , an open source multilingual inspector to analyze word representations. Our system provides researchers working in low-resource settings with an easily accessible web based probing tool to gain quick insights into their word embeddings especially outside of the English language. To do this we employ 16 simple linguistic probing tasks such as gender, case marking, and tense for a diverse set of 28 languages. We support probing of static word embeddings along with pretrained AllenNLP models that are commonly used for NLP downstream tasks such as named entity recognition, natural language inference and dependency parsing. The results are visualized in a polar chart and also provided as a table. LINSPECTOR WEB is available as an offline tool or at

MAssistant: A Personal Knowledge Assistant for MOOC Learners
Lan Jiang | Shuhan Hu | Mingyu Huang | Zhichun Wang | Jinjian Yang | Xiaoju Ye | Wei Zheng

Massive Open Online Courses (MOOCs) have developed rapidly and attracted large number of learners. In this work, we present MAssistant system, a personal knowledge assistant for MOOC learners. MAssistant helps users to trace the concepts they have learned in MOOCs, and to build their own concept graphs. There are three key components in MAssistant: (i) a large-scale concept graph built from open data sources, which contains concepts in various domains and relations among them; (ii) a browser extension which interacts with learners when they are watching video lectures, and presents important concepts to them; (iii) a web application allowing users to explore their personal concept graphs, which are built based on their learning activities on MOOCs. MAssistant will facilitate the knowledge management task for MOOC learners, and make the learning on MOOCs easier.

MedCATTrainer: A Biomedical Free Text Annotation Interface with Active Learning and Research Use Case Specific Customisation
Thomas Searle | Zeljko Kraljevic | Rebecca Bendayan | Daniel Bean | Richard Dobson

An interface for building, improving and customising a given Named Entity Recognition and Linking (NER+L) model for biomedical domain text, and the efficient collation of accurate research use case specific training data and subsequent model training. Screencast demo available here:

Memory Grounded Conversational Reasoning
Seungwhan Moon | Pararth Shah | Rajen Subba | Anuj Kumar

We demonstrate a conversational system which engages the user through a multi-modal, multi-turn dialog over the user’s memories. The system can perform QA over memories by responding to user queries to recall specific attributes and associated media (e.g. photos) of past episodic memories. The system can also make proactive suggestions to surface related events or facts from past memories to make conversations more engaging and natural. To implement such a system, we collect a new corpus of memory grounded conversations, which comprises human-to-human role-playing dialogs given synthetic memory graphs with simulated attributes. Our proof-of-concept system operates on these synthetic memory graphs, however it can be trained and applied to real-world user memory data (e.g. photo albums, etc.) We present the architecture of the proposed conversational system, and example queries that the system supports.

Multilingual, Multi-scale and Multi-layer Visualization of Intermediate Representations
Carlos Escolano | Marta R. Costa-jussà | Elora Lacroux | Pere-Pau Vázquez

The main alternatives nowadays to deal with sequences are Recurrent Neural Networks (RNN) architectures and the Transformer. In this context, Both RNN’s and Transformer have been used as an encoder-decoder architecture with multiple layers in each module. Far beyond this, these architectures are the basis for the contextual word embeddings which are revolutionizing most natural language downstream applications. However, intermediate representations in either the RNN or Transformer architectures can be difficult to interpret. To make these layer representations more accessible and meaningful, we introduce a web-based tool that visualizes them both at the sentence and token level. We present three use cases. The first analyses gender issues in contextual word embeddings. The second and third are showing multilingual intermediate representations for sentences and tokens and the evolution of these intermediate representations along with the multiple layers of the decoder and in the context of multilingual machine translation.

MY-AKKHARA: A Romanization-based Burmese (Myanmar) Input Method
Chenchen Ding | Masao Utiyama | Eiichiro Sumita

MY-AKKHARA is a method used to input Burmese texts encoded in the Unicode standard, based on commonly accepted Latin transcription. By using this method, arbitrary Burmese strings can be accurately inputted with 26 lowercase Latin letters. Meanwhile, the 26 uppercase Latin letters are designed as shortcuts of lowercase letter sequences. The frequency of Burmese characters is considered in MY-AKKHARA to realize an efficient keystroke distribution on a QWERTY keyboard. Given that the Unicode standard has not been extensively used in digitization of Burmese, we hope that MY-AKKHARA can contribute to the widespread use of Unicode in Myanmar and can provide a platform for smart input methods for Burmese in the future. An implementation of MY-AKKHARA running in Windows is released at

NeuronBlocks: Building Your NLP DNN Models Like Playing Lego
Ming Gong | Linjun Shou | Wutao Lin | Zhijie Sang | Quanjia Yan | Ze Yang | Feixiang Cheng | Daxin Jiang

Deep Neural Networks (DNN) have been widely employed in industry to address various Natural Language Processing (NLP) tasks. However, many engineers find it a big overhead when they have to choose from multiple frameworks, compare different types of models, and understand various optimization mechanisms. An NLP toolkit for DNN models with both generality and flexibility can greatly improve the productivity of engineers by saving their learning cost and guiding them to find optimal solutions to their tasks. In this paper, we introduce NeuronBlocks, a toolkit encapsulating a suite of neural network modules as building blocks to construct various DNN models with complex architecture. This toolkit empowers engineers to build, train, and test various NLP models through simple configuration of JSON files. The experiments on several NLP datasets such as GLUE, WikiQA and CoNLL-2003 demonstrate the effectiveness of NeuronBlocks. Code: Demo:

OpenNRE: An Open and Extensible Toolkit for Neural Relation Extraction
Xu Han | Tianyu Gao | Yuan Yao | Deming Ye | Zhiyuan Liu | Maosong Sun

OpenNRE is an open-source and extensible toolkit that provides a unified framework to implement neural models for relation extraction (RE). Specifically, by implementing typical RE methods, OpenNRE not only allows developers to train custom models to extract structured relational facts from the plain text but also supports quick model validation for researchers. Besides, OpenNRE provides various functional RE modules based on both TensorFlow and PyTorch to maintain sufficient modularity and extensibility, making it becomes easy to incorporate new models into the framework. Besides the toolkit, we also release an online system to meet real-time extraction without any training and deploying. Meanwhile, the online system can extract facts in various scenarios as well as aligning the extracted facts to Wikidata, which may benefit various downstream knowledge-driven applications (e.g., information retrieval and question answering). More details of the toolkit and online system can be obtained from

ParaQG: A System for Generating Questions and Answers from Paragraphs
Vishwajeet Kumar | Sivaanandh Muneeswaran | Ganesh Ramakrishnan | Yuan-Fang Li

Generating syntactically and semantically valid and relevant questions from paragraphs is useful with many applications. Manual generation is a labour-intensive task, as it requires the reading, parsing and understanding of long passages of text. A number of question generation models based on sequence-to-sequence techniques have recently been proposed. Most of them generate questions from sentences only, and none of them is publicly available as an easy-to-use service. In this paper, we demonstrate ParaQG, a Web-based system for generating questions from sentences and paragraphs. ParaQG incorporates a number of novel functionalities to make the question generation process user-friendly. It provides an interactive interface for a user to select answers with visual insights on generation of questions. It also employs various faceted views to group similar questions as well as filtering techniques to eliminate unanswerable questions.

PolyResponse: A Rank-based Approach to Task-Oriented Dialogue with Application in Restaurant Search and Booking
Matthew Henderson | Ivan Vulić | Iñigo Casanueva | Paweł Budzianowski | Daniela Gerz | Sam Coope | Georgios Spithourakis | Tsung-Hsien Wen | Nikola Mrkšić | Pei-Hao Su

We present PolyResponse, a conversational search engine that supports task-oriented dialogue. It is a retrieval-based approach that bypasses the complex multi-component design of traditional task-oriented dialogue systems and the use of explicit semantics in the form of task-specific ontologies. The PolyResponse engine is trained on hundreds of millions of examples extracted from real conversations: it learns what responses are appropriate in different conversational contexts. It then ranks a large index of text and visual responses according to their similarity to the given context, and narrows down the list of relevant entities during the multi-turn conversation. We introduce a restaurant search and booking system powered by the PolyResponse engine, currently available in 8 different languages.

PyOpenDial: A Python-based Domain-Independent Toolkit for Developing Spoken Dialogue Systems with Probabilistic Rules
Youngsoo Jang | Jongmin Lee | Jaeyoung Park | Kyeng-Hun Lee | Pierre Lison | Kee-Eung Kim

We present PyOpenDial, a Python-based domain-independent, open-source toolkit for spoken dialogue systems. Recent advances in core components of dialogue systems, such as speech recognition, language understanding, dialogue management, and language generation, harness deep learning to achieve state-of-the-art performance. The original OpenDial, implemented in Java, provides a plugin architecture to integrate external modules, but lacks Python bindings, making it difficult to interface with popular deep learning frameworks such as Tensorflow or PyTorch. To this end, we re-implemented OpenDial in Python and extended the toolkit with a number of novel functionalities for neural dialogue state tracking and action planning. We describe the overall architecture and its extensions, and illustrate their use on an example where the system response model is implemented with a recurrent neural network.

Redcoat: A Collaborative Annotation Tool for Hierarchical Entity Typing
Michael Stewart | Wei Liu | Rachel Cardell-Oliver

We introduce Redcoat, a web-based annotation tool that supports collaborative hierarchical entity typing. As an annotation tool, Redcoat also facilitates knowledge elicitation by allowing the creation and continuous refinement of concept hierarchies during annotation. It aims to minimise not only annotation time but the time it takes for project creators to set up and distribute projects to annotators. Projects created using the web-based interface can be rapidly distributed to a list of email addresses. Redcoat handles the propagation of documents amongst annotators and automatically scales the annotation workload depending on the number of active annotators. In this paper we discuss these key features and outline Redcoat’s system architecture. We also highlight Redcoat’s unique benefits over existing annotation tools via a qualitative comparison.

SEAGLE: A Platform for Comparative Evaluation of Semantic Encoders for Information Retrieval
Fabian David Schmidt | Markus Dietsche | Simone Paolo Ponzetto | Goran Glavaš

We introduce Seagle, a platform for comparative evaluation of semantic text encoding models on information retrieval (IR) tasks. Seagle implements (1) word embedding aggregators, which represent texts as algebraic aggregations of pretrained word embeddings and (2) pretrained semantic encoders, and allows for their comparative evaluation on arbitrary (monolingual and cross-lingual) IR collections. We benchmark Seagle’s models on monolingual document retrieval and cross-lingual sentence retrieval. Seagle functionality can be exploited via an easy-to-use web interface and its modular backend (micro-service architecture) can easily be extended with additional semantic search models.

A Stylometry Toolkit for Latin Literature
Thomas J. Bolt | Jeffrey H. Flynt | Pramit Chaudhuri | Joseph P. Dexter

Computational stylometry has become an increasingly important aspect of literary criticism, but many humanists lack the technical expertise or language-specific NLP resources required to exploit computational methods. We demonstrate a stylometry toolkit for analysis of Latin literary texts, which is freely available at Our toolkit generates data for a diverse range of literary features and has an intuitive point-and-click interface. The features included have proven effective for multiple literary studies and are calculated using custom heuristics without the need for syntactic parsing. As such, the toolkit models one approach to the user-friendly generation of stylometric data, which could be extended to other premodern and non-English languages underserved by standard NLP resources.

A Summarization System for Scientific Documents
Shai Erera | Michal Shmueli-Scheuer | Guy Feigenblat | Ora Peled Nakash | Odellia Boni | Haggai Roitman | Doron Cohen | Bar Weiner | Yosi Mass | Or Rivlin | Guy Lev | Achiya Jerbi | Jonathan Herzig | Yufang Hou | Charles Jochim | Martin Gleize | Francesca Bonin | Francesca Bonin | David Konopnicki

We present a novel system providing summaries for Computer Science publications. Through a qualitative user study, we identified the most valuable scenarios for discovery, exploration and understanding of scientific documents. Based on these findings, we built a system that retrieves and summarizes scientific documents for a given information need, either in form of a free-text query or by choosing categorized values such as scientific tasks, datasets and more. Our system ingested 270,000 papers, and its summarization module aims to generate concise yet detailed summaries. We validated our approach with human experts.

A System for Diacritizing Four Varieties of Arabic
Hamdy Mubarak | Ahmed Abdelali | Kareem Darwish | Mohamed Eldesouki | Younes Samih | Hassan Sajjad

Short vowels, aka diacritics, are more often omitted when writing different varieties of Arabic including Modern Standard Arabic (MSA), Classical Arabic (CA), and Dialectal Arabic (DA). However, diacritics are required to properly pronounce words, which makes diacritic restoration (a.k.a. diacritization) essential for language learning and text-to-speech applications. In this paper, we present a system for diacritizing MSA, CA, and two varieties of DA, namely Moroccan and Tunisian. The system uses a character level sequence-to-sequence deep learning model that requires no feature engineering and beats all previous SOTA systems for all the Arabic varieties that we test on.

Tanbih: Get To Know What You Are Reading
Yifan Zhang | Giovanni Da San Martino | Alberto Barrón-Cedeño | Salvatore Romeo | Jisun An | Haewoon Kwak | Todor Staykovski | Israa Jaradat | Georgi Karadzhov | Ramy Baly | Kareem Darwish | James Glass | Preslav Nakov

We introduce Tanbih, a news aggregator with intelligent analysis tools to help readers understanding what’s behind a news story. Our system displays news grouped into events and generates media profiles that show the general factuality of reporting, the degree of propagandistic content, hyper-partisanship, leading political ideology, general frame of reporting, and stance with respect to various claims and topics of a news outlet. In addition, we automatically analyse each article to detect whether it is propagandistic and to determine its stance with respect to a number of controversial topics.

TEASPN: Framework and Protocol for Integrated Writing Assistance Environments
Masato Hagiwara | Takumi Ito | Tatsuki Kuribayashi | Jun Suzuki | Kentaro Inui

Language technologies play a key role in assisting people with their writing. Although there has been steady progress in e.g., grammatical error correction (GEC), human writers are yet to benefit from this progress due to the high development cost of integrating with writing software. We propose TEASPN, a protocol and an open-source framework for achieving integrated writing assistance environments. The protocol standardizes the way writing software communicates with servers that implement such technologies, allowing developers and researchers to integrate the latest developments in natural language processing (NLP) with low cost. As a result, users can enjoy the integrated experience in their favorite writing software. The results from experiments with human participants show that users use a wide range of technologies and rate their writing experience favorably, allowing them to write more fluent text.

TellMeWhy: Learning to Explain Corrective Feedback for Second Language Learners
Yi-Huei Lai | Jason Chang

We present a writing prototype feedback system, TellMeWhy, to provide explanations of errors in submitted essays. In our approach, the sentence with corrections is analyzed to identify error types and problem words, aimed at customizing explanations based on the context of the error. The method involves learning the relation of errors and problem words, generating common feedback patterns, and extracting grammar patterns, collocations and example sentences. At run-time, a sentence with corrections is classified, and the problem word and template are identified to provide detailed explanations. Preliminary evaluation shows that the method has potential to improve existing commercial writing services.

UER: An Open-Source Toolkit for Pre-training Models
Zhe Zhao | Hui Chen | Jinbin Zhang | Xin Zhao | Tao Liu | Wei Lu | Xi Chen | Haotang Deng | Qi Ju | Xiaoyong Du

Existing works, including ELMO and BERT, have revealed the importance of pre-training for NLP tasks. While there does not exist a single pre-training model that works best in all cases, it is of necessity to develop a framework that is able to deploy various pre-training models efficiently. For this purpose, we propose an assemble-on-demand pre-training toolkit, namely Universal Encoder Representations (UER). UER is loosely coupled, and encapsulated with rich modules. By assembling modules on demand, users can either reproduce a state-of-the-art pre-training model or develop a pre-training model that remains unexplored. With UER, we have built a model zoo, which contains pre-trained models based on different corpora, encoders, and targets (objectives). With proper pre-trained models, we could achieve new state-of-the-art results on a range of downstream datasets.

Visualizing Trends of Key Roles in News Articles
Chen Xia | Haoxiang Zhang | Jacob Moghtader | Allen Wu | Kai-Wei Chang

There are tons of news generated every day reflecting the change of key roles such as people, organizations and political parties. Analyzing the trend of these key roles can help understand the information flow in a more effective way. In this paper, we present a demonstration system that visualizes the news trend of key roles based on natural language processing techniques. Specifically, we apply semantic role labelling to understand relationships between key roles in the news. We also train a dynamic word embedding model to align representations of words in different time periods to measure how the similarities between a key role and news topics change over time. Note: The github link to our demo jupyter notebook and screencast video is

VizSeq: a visual analysis toolkit for text generation tasks
Changhan Wang | Anirudh Jain | Danlu Chen | Jiatao Gu

Automatic evaluation of text generation tasks (e.g. machine translation, text summarization, image captioning and video description) usually relies heavily on task-specific metrics, such as BLEU and ROUGE. They, however, are abstract numbers and are not perfectly aligned with human assessment. This suggests inspecting detailed examples as a complement to identify system error patterns. In this paper, we present VizSeq, a visual analysis toolkit for instance-level and corpus-level system evaluation on a wide variety of text generation tasks. It supports multimodal sources and multiple text references, providing visualization in Jupyter notebook or a web app interface. It can be used locally or deployed onto public servers for centralized data hosting and benchmarking. It covers most common n-gram based metrics accelerated with multiprocessing, and also provides latest embedding-based metrics such as BERTScore.

What’s Wrong with Hebrew NLP? And How to Make it Right
Reut Tsarfaty | Shoval Sadde | Stav Klein | Amit Seker

For languages with simple morphology such as English, automatic annotation pipelines such as spaCy or Stanford’s CoreNLP successfully serve projects in academia and the industry. For many morphologically-rich languages (MRLs), similar pipelines show sub-optimal performance that limits their applicability for text analysis in research and the industry. The sub-optimal performance is mainly due to errors in early morphological disambiguation decisions, that cannot be recovered later on in the pipeline, yielding incoherent annotations on the whole. This paper describes the design and use of the ONLP suite, a joint morpho-syntactic infrastructure for processing Modern Hebrew texts. The joint inference over morphology and syntax substantially limits error propagation, and leads to high accuracy. ONLP provides rich and expressive annotations which already serve diverse academic and commercial needs. Its accompanying demo further serves educational activities, introducing Hebrew NLP intricacies to researchers and non-researchers alike.