Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations

Dongyan Zhao (Editor)

Santa Fe, New Mexico
Association for Computational Linguistics

Abbreviation Expander - a Web-based System for Easy Reading of Technical Documents
Manuel R. Ciosici | Ira Assent

Abbreviations and acronyms are a part of textual communication in most domains. However, abbreviations are not necessarily defined in documents that employ them. Understanding all abbreviations used in a given document often requires extensive knowledge of the target domain and the ability to disambiguate based on context. This creates considerable entry barriers to newcomers and difficulties in automated document processing. Existing abbreviation expansion systems or tools require substantial technical knowledge for setup or make strong assumptions which limit their use in practice. Here, we present Abbreviation Expander, a system that builds on state-of-the-art methods for identification of abbreviations, acronyms and their definitions and a novel disambiguator for abbreviation expansion in an easily accessible web-based solution.

The INCEpTION Platform: Machine-Assisted and Knowledge-Oriented Interactive Annotation
Jan-Christoph Klie | Michael Bugert | Beto Boullosa | Richard Eckart de Castilho | Iryna Gurevych

We introduce INCEpTION, a new annotation platform for tasks including interactive and semantic annotation (e.g., concept linking, fact linking, knowledge base population, semantic frame annotation). These tasks are very time consuming and demanding for annotators, especially when knowledge bases are used. We address these issues by developing an annotation platform that incorporates machine learning capabilities which actively assist and guide annotators. The platform is both generic and modular. It targets a range of research domains in need of semantic annotation, such as digital humanities, bioinformatics, or linguistics. INCEpTION is publicly available as open-source software.

JeSemE: Interleaving Semantics and Emotions in a Web Service for the Exploration of Language Change Phenomena
Johannes Hellrich | Sven Buechel | Udo Hahn

We here introduce a substantially extended version of JeSemE, an interactive website for visually exploring computationally derived time-variant information on word meanings and lexical emotions assembled from five large diachronic text corpora. JeSemE is designed for scholars in the (digital) humanities as an alternative to consulting manually compiled, printed dictionaries for such information (if available at all). This tool uniquely combines state-of-the-art distributional semantics with a nuanced model of human emotions, two information streams we deem beneficial for a data-driven interpretation of texts in the humanities.

T-Know: a Knowledge Graph-based Question Answering and Information Retrieval System for Traditional Chinese Medicine
Ziqing Liu | Enwei Peng | Shixing Yan | Guozheng Li | Tianyong Hao

T-Know is a knowledge service system based on a constructed knowledge graph of Traditional Chinese Medicine (TCM). Using authorized and anonymized clinical records, medicine clinical guidelines, teaching materials, classic medical books, academic publications, etc., as data resources, the system extracts triples from free text to build a TCM knowledge graph using natural language processing methods that we developed. On the basis of the knowledge graph, a deep learning algorithm is implemented for single-round question understanding and multiple-round dialogue. In addition, the TCM knowledge graph is also used to support human-computer interactive knowledge retrieval by normalizing search keywords to medical terminology.

A Korean Knowledge Extraction System for Enriching a KBox
Sangha Nam | Eun-kyung Kim | Jiho Kim | Yoosung Jung | Kijong Han | Key-Sun Choi

The increased demand for structured knowledge has created considerable interest in knowledge extraction from natural language sentences. This study presents a new Korean knowledge extraction system and web interface for enriching a KBox knowledge base that expands based on the Korean DBpedia. The aim is to create an endpoint where knowledge can be extracted and added to KBox anytime and anywhere.

Real-time Scholarly Retweeting Prediction System
Zhunchen Luo | Xiao Liu

Twitter has become one of the most important channels for spreading the latest scholarly information because of how quickly information propagates on it. Predicting whether a scholarly tweet will be retweeted is a key task in understanding message propagation within large user communities. Hence, we present a real-time scholarly retweeting prediction system that retrieves scholarly tweets which will be retweeted. First, we filter scholarly tweets from a tracked tweet stream. Then, we extract Tweet Scholar Blocks indicating metadata of papers. Finally, we combine scholarly features with the Tweet Scholar Blocks to predict whether a scholarly tweet will be retweeted. Our system outperforms chosen baseline systems. Additionally, our system has the potential to predict scientific impact in real time.

Document Representation Learning for Patient History Visualization
Halid Ziya Yerebakan | Yoshihisa Shinagawa | Parmeet Bhatia | Yiqiang Zhan

We tackle the problem of generating a diagrammatic summary of a set of documents, each of which pertains to loosely related topics. In particular, we aim at visualizing the medical histories of patients. In medicine, choosing relevant reports from a patient’s past exams for comparison provides valuable information for precise treatment planning. Manually finding the relevant reports for comparison studies in a large database is time-consuming, which could result in some critical information being overlooked. This task can be automated by defining similarity among documents, which is nontrivial since these documents are often stored in an unstructured text format. To facilitate this, we have used a representation learning algorithm that creates a semantic representation space for documents in which clinically related documents lie close to each other. We have utilized referral information to weakly supervise an LSTM network to learn this semantic space. The abstract representations within this semantic space are not only useful for visualizing disease progressions corresponding to the relevant report groups of a patient, but are also beneficial for analyzing diseases at the population level. The key tool proposed here is clustering of documents based on a document similarity metric that is learned from corpora.

HiDE: a Tool for Unrestricted Literature Based Discovery
Judita Preiss | Mark Stevenson

As the quantity of publications increases daily, researchers are forced to narrow their attention to their own specialism and are therefore less likely to make new connections with other areas. Literature based discovery (LBD) supports the identification of such connections. A number of LBD tools are available; however, they often suffer from limitations such as constraining possible searches or not producing results in real time. We introduce HiDE (Hidden Discovery Explorer), an online knowledge browsing tool which allows fast access to hidden knowledge generated from all abstracts in Medline. HiDE is fast enough to allow users to explore the full range of hidden connections generated by an LBD system. The tool employs a novel combination of two approaches to LBD: a graph-based approach which allows hidden knowledge to be generated on a large scale and an inference algorithm to identify the most promising (most likely to be non-trivial) information. Available at

Active DOP: A constituency treebank annotation tool with online learning
Andreas van Cranenburgh

We present a language-independent treebank annotation tool supporting rich annotations with discontinuous constituents and function tags. Candidate analyses are generated by an exemplar-based parsing model that immediately learns from each new annotated sentence during annotation. This makes it suitable for situations in which only a limited seed treebank is available, or a radically different domain is being annotated. The tool offers the possibility to experiment with and evaluate active learning methods to speed up annotation in a naturalistic setting, i.e., measuring actual annotation costs and tracking specific user interactions. The code is made available under the GNU GPL license at

CRST: a Claim Retrieval System in Twitter
Wenjia Ma | WenHan Chao | Zhunchen Luo | Xin Jiang

For controversial topics, collecting argumentation-containing tweets, which tend to be more convincing, will help researchers analyze public opinion. The claim, in turn, is the heart of argumentation. Hence, we present CRST, the first real-time claim retrieval system, which retrieves tweets containing claims for a given topic from Twitter. We propose a claim-oriented ranking module which can be divided into an offline topic-independent learning-to-rank model and an online topic-dependent lexicon model. Our system outperforms previous claim retrieval and argument mining systems. Moreover, the claim-oriented ranking module can be easily adapted to new topics without any manual process or external information, which makes our system practical.

Utilizing Graph Measure to Deduce Omitted Entities in Paragraphs
Eun-kyung Kim | Kijong Han | Jiho Kim | Key-Sun Choi

This demo deals with the problem of capturing omitted arguments in relation extraction given a proper knowledge base for entities of interest. This paper introduces the concept of a salient entity and uses this information to deduce omitted entities in the paragraph, which improves the quality of relation extraction. The main idea for computing salient entities is to construct a graph on the given information (by identifying the entities but without parsing it), rank it with standard graph measures and embed it in the context of the sentences.
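
The graph-ranking idea can be illustrated with a minimal sketch. It is not the authors' implementation: the input format (one list of already-identified entities per sentence), the choice of degree centrality as the "standard graph measure", and the function names `rank_entities` and `fill_omitted_subject` are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def rank_entities(sentences):
    """Rank entities by degree centrality in a sentence co-occurrence graph.

    `sentences` is a list of entity lists, one per sentence (hypothetical
    input format; entity identification is assumed to happen upstream).
    """
    neighbours = defaultdict(set)
    for entities in sentences:
        for entity in entities:
            neighbours[entity]  # touch the key so isolated entities are kept
        for a, b in combinations(sorted(set(entities)), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    # Degree = number of distinct co-occurring entities; ties broken by name.
    return sorted(neighbours, key=lambda e: (-len(neighbours[e]), e))


def fill_omitted_subject(sentences):
    """Return the top-ranked (most salient) entity as a candidate argument."""
    ranking = rank_entities(sentences)
    return ranking[0] if ranking else None


paragraph = [
    ["Marie Curie", "Warsaw"],
    ["Marie Curie", "Pierre Curie"],
    ["Marie Curie", "Nobel Prize"],
    ["Nobel Prize", "Physics"],
    ["Sorbonne"],  # subject omitted, e.g. "(She) taught at the Sorbonne."
]
print(fill_omitted_subject(paragraph))
```

In this toy paragraph the entity with the most distinct neighbours is proposed as the omitted argument of the last sentence; a real system would also have to check the candidate against the knowledge base and the sentence context.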

Transparent, Efficient, and Robust Word Embedding Access with WOMBAT
Mark-Christoph Müller | Michael Strube

We present WOMBAT, a Python tool which supports NLP practitioners in accessing word embeddings from code. WOMBAT addresses common research problems, including unified access, scaling, and robust and reproducible preprocessing. Code that uses WOMBAT for accessing word embeddings is not only cleaner, more readable, and easier to reuse, but also much more efficient than code using standard in-memory methods: a Python script using WOMBAT for evaluating seven large word embedding collections (8.7M embedding vectors in total) on a simple SemEval sentence similarity task involving 250 raw sentence pairs completes in under ten seconds end-to-end on a standard notebook computer.

SetExpander: End-to-end Term Set Expansion Based on Multi-Context Term Embeddings
Jonathan Mamou | Oren Pereg | Moshe Wasserblat | Ido Dagan | Yoav Goldberg | Alon Eirew | Yael Green | Shira Guskin | Peter Izsak | Daniel Korat

We present SetExpander, a corpus-based system for expanding a seed set of terms into a more complete set of terms that belong to the same semantic class. SetExpander implements an iterative end-to-end workflow for term set expansion. It enables users to easily select a seed set of terms, expand it, view the expanded set, validate it, re-expand the validated set and store it, thus simplifying the extraction of domain-specific fine-grained semantic classes. SetExpander has been used for solving real-life use cases including integration in an automated recruitment system and an issues and defects resolution system. A video demo of SetExpander is available at .
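
As a rough illustration of the expand, validate and re-expand loop, the sketch below ranks candidate terms by cosine similarity to the centroid of the seed terms. This is a generic embedding-based approach, not SetExpander's multi-context method; the function `expand` and the two-dimensional toy vectors are invented for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand(seed, embeddings, k=2):
    """Return the k non-seed terms closest to the centroid of the seed terms."""
    dims = len(next(iter(embeddings.values())))
    centroid = [sum(embeddings[t][i] for t in seed) / len(seed) for i in range(dims)]
    candidates = [t for t in embeddings if t not in seed]
    candidates.sort(key=lambda t: cosine(embeddings[t], centroid), reverse=True)
    return candidates[:k]


# Toy vectors, invented for illustration only.
embeddings = {
    "python": [0.9, 0.1],
    "java": [0.8, 0.2],
    "rust": [0.85, 0.15],
    "go": [0.7, 0.3],
    "banana": [0.1, 0.9],
}
seed = {"python", "java"}
first_pass = expand(seed, embeddings, k=1)      # expand the seed set
validated = seed | set(first_pass)              # user validates the suggestion
second_pass = expand(validated, embeddings, k=1)  # re-expand the validated set
print(first_pass, second_pass)
```

Each round feeds the user-validated terms back in as the new seed set, which is the iterative part of the workflow described above.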

Detecting Heavy Rain Disaster from Social and Physical Sensor
Tomoya Iwakura | Seiji Okajima | Nobuyuki Igata | Kunihiro Takeda | Yuzuru Yamakage | Naoshi Morita

We present a system that assists in detecting heavy rain disasters and is in real-world use in Japan. Our system selects tweets about heavy rain disasters with a document classifier. Then, the locations mentioned in the selected tweets are estimated by a location estimator. Finally, by combining the selected tweets with the amount of rainfall reported by physical sensors and with a statistical analysis, our system provides users with visualized results for detecting heavy rain disasters.

Simulating Language Evolution: a Tool for Historical Linguistics
Alina Maria Ciobanu | Liviu P. Dinu

Language change across space and time is one of the main concerns in historical linguistics. In this paper, we develop a language evolution simulator: a web-based tool for word form production to assist in historical linguistics, in studying the evolution of the languages. Given a word in a source language, the system automatically predicts how the word evolves in a target language. The method that we propose is language-agnostic and does not use any external knowledge, except for the training word pairs.

A Unified RvNN Framework for End-to-End Chinese Discourse Parsing
Lin Chuan-An | Hen-Hsen Huang | Zi-Yuan Chen | Hsin-Hsi Chen

This paper demonstrates an end-to-end Chinese discourse parser. We propose a unified framework based on a recursive neural network (RvNN) to jointly model the subtasks of elementary discourse unit (EDU) segmentation, tree structure construction, center labeling, and sense labeling. Experimental results show that our parser achieves state-of-the-art performance on the Chinese Discourse Treebank (CDTB) dataset. We release the source code with a pre-trained model for the NLP community. To the best of our knowledge, this is the first open source toolkit for Chinese discourse parsing. The standalone toolkit can be integrated into subsequent applications without the need for external resources such as a syntactic parser.

A Web-based Framework for Collecting and Assessing Highlighted Sentences in a Document
Sasha Spala | Franck Dernoncourt | Walter Chang | Carl Dockhorn

Automatically highlighting a text aims at identifying key portions that are the most important to a reader. In this paper, we present a web-based framework designed to efficiently and scalably crowdsource two independent but related tasks: collecting highlight annotations, and comparing the performance of automated highlighting systems. The first task is necessary to understand human preferences and train supervised automated highlighting systems. The second task yields a more accurate and fine-grained evaluation than existing automated performance metrics.

Cool English: a Grammatical Error Correction System Based on Large Learner Corpora
Yu-Chun Lo | Jhih-Jie Chen | Chingyu Yang | Jason Chang

This paper presents a grammatical error correction (GEC) system that provides corrective feedback for essays. We apply the sequence-to-sequence model, which is frequently used in machine translation and text summarization, to this GEC task. The model is trained on the EF-Cambridge Open Language Database (EFCAMDAT), a large learner corpus annotated with grammatical errors and corrections. Evaluation shows that our system achieves competitive performance on a number of publicly available test sets.

Appraise Evaluation Framework for Machine Translation
Christian Federmann

We present Appraise, an open-source framework for crowd-based annotation tasks, notably for evaluation of machine translation output. This is the software used to run the yearly evaluation campaigns for shared tasks at the WMT Conference on Machine Translation. It has also been used at IWSLT 2017 and, recently, to measure human parity for machine translation for Chinese to English news text. The demo will present the full end-to-end lifecycle of an Appraise evaluation campaign, from task creation to annotation and interpretation of results.

KIT Lecture Translator: Multilingual Speech Translation with One-Shot Learning
Florian Dessloch | Thanh-Le Ha | Markus Müller | Jan Niehues | Thai-Son Nguyen | Ngoc-Quan Pham | Elizabeth Salesky | Matthias Sperber | Sebastian Stüker | Thomas Zenkel | Alexander Waibel

In today’s globalized world we have the ability to communicate with people across the world. However, in many situations the language barrier still presents a major issue. For example, many foreign students coming to KIT to study are initially unable to follow a lecture in German. Therefore, we offer an automatic simultaneous interpretation service for students. To fulfill this task, we have developed a low-latency translation system that is adapted to lectures and covers several language pairs. While the switch from traditional Statistical Machine Translation to Neural Machine Translation (NMT) significantly improved performance, integrating NMT into the speech translation framework required several adjustments. We have addressed the run-time constraints and different types of input. Furthermore, we utilized one-shot learning to easily add new topic-specific terms to the system. Besides better performance, NMT also enabled us to increase the number of covered languages through multilingual NMT. Combining these techniques, we are able to provide an adapted speech translation system for several European languages.

Graphene: a Context-Preserving Open Information Extraction System
Matthias Cetto | Christina Niklaus | André Freitas | Siegfried Handschuh

We introduce Graphene, an Open IE system whose goal is to generate accurate, meaningful and complete propositions that may facilitate a variety of downstream semantic applications. For this purpose, we transform syntactically complex input sentences into clean, compact structures in the form of core facts and accompanying contexts, while identifying the rhetorical relations that hold between them in order to maintain their semantic relationship. In that way, we preserve the context of the relational tuples extracted from a source sentence, generating a novel lightweight semantic representation for Open IE that enhances the expressiveness of the extracted propositions.

LanguageNet: Learning to Find Sense Relevant Example Sentences
Shang-Chien Cheng | Jhih-Jie Chen | Chingyu Yang | Jason Chang

In this paper, we present a system, LanguageNet, which can help second language learners to search for different meanings and usages of a word. We disambiguate word senses based on the pairs of an English word and its corresponding Chinese translations in a parallel corpus, UM-Corpus. The process involved performing word alignment, learning vector space representations of words and training a classifier to distinguish words into groups of senses. LanguageNet directly shows the definition of a sense, bilingual synonyms and sense relevant examples.

Automatic Curation and Visualization of Crime Related Information from Incrementally Crawled Multi-source News Reports
Tirthankar Dasgupta | Lipika Dey | Rupsa Saha | Abir Naskar

In this paper, we demonstrate a system for the automatic extraction and curation of crime-related information from multi-source, digitally published news articles collected over a period of five years. We use a deep convolutional recurrent neural network model to analyze crime articles and extract different crime-related entities and events. The proposed methods are not restricted to detecting known crimes only but contribute actively towards maintaining an updated crime ontology. We have experimented with a collection of 5000 crime-reporting news articles spanning time and multiple sources. The end product of our experiments is a crime register that contains details of crimes committed across geographies and time. This register can be further utilized for analytical and reporting purposes.

Lingke: a Fine-grained Multi-turn Chatbot for Customer Service
Pengfei Zhu | Zhuosheng Zhang | Jiangtong Li | Yafang Huang | Hai Zhao

Traditional chatbots usually need a large amount of human dialogue data, especially when using supervised machine learning methods. Though they can easily deal with single-turn question answering, their performance on multi-turn conversations is usually unsatisfactory. In this paper, we present Lingke, an information retrieval augmented chatbot which is able to answer questions based on a given product introduction document and deal with multi-turn conversations. We introduce a fine-grained processing pipeline to distill responses from unstructured documents, and attentive sequential context-response matching for multi-turn conversations.

Writing Mentor: Self-Regulated Writing Feedback for Struggling Writers
Nitin Madnani | Jill Burstein | Norbert Elliot | Beata Beigman Klebanov | Diane Napolitano | Slava Andreyev | Maxwell Schwartz

Writing Mentor is a free Google Docs add-on designed to provide feedback to struggling writers and help them improve their writing in a self-paced and self-regulated fashion. Writing Mentor uses natural language processing (NLP) methods and resources to generate feedback in terms of features that research into post-secondary struggling writers has classified as developmental (Burstein et al., 2016b). These features span many writing sub-constructs (use of sources, claims, and evidence; topic development; coherence; and knowledge of English conventions). Preliminary analysis indicates that users have a largely positive impression of Writing Mentor in terms of usability and potential impact on their writing.

NLATool: an Application for Enhanced Deep Text Understanding
Markus Gärtner | Sven Mayer | Valentin Schwind | Eric Hämmerle | Emine Turcan | Florin Rheinwald | Gustav Murawski | Lars Lischke | Jonas Kuhn

Today, we see an ever-growing number of tools supporting text annotation. Each of these tools is optimized for specific use cases such as named entity recognition. However, we also see large, growing knowledge bases such as Wikipedia or the Google Knowledge Graph. In this paper, we introduce NLATool, a web application developed using a human-centered design process. The application combines support for text annotation with enrichment of the text using additional information from a number of sources, directly within the application. The tool helps users efficiently recognize named entities and annotate text, and automatically provides them with additional information while they solve deep text understanding tasks.

Sensala: a Dynamic Semantics System for Natural Language Processing
Daniyar Itegulov | Ekaterina Lebedeva | Bruno Woltzenlogel Paleo

Here we describe Sensala, an open source framework for the semantic interpretation of natural language that provides the logical meaning of a given text. The framework’s theory is based on a lambda calculus with exception handling and uses contexts, continuations, events and dependent types to handle a wide range of complex linguistic phenomena, such as donkey anaphora, verb phrase anaphora, propositional anaphora, presuppositions and implicatures.

On-Device Neural Language Model Based Word Prediction
Seunghak Yu | Nilesh Kulkarni | Haejun Lee | Jihie Kim

Recent developments in deep learning with application to language modeling have led to success in text processing, summarization and machine translation tasks. However, deploying huge language models on mobile devices, for example in on-device keyboards, makes computation a bottleneck because of the devices' limited computational capacity. In this work, we propose an on-device neural language model based word prediction method that optimizes run-time memory and also provides a real-time prediction environment. Our model is 7.40 MB in size and has an average prediction time of 6.47 ms. Our proposed model outperforms the existing methods for word prediction in terms of keystroke savings and word prediction rate and has been successfully commercialized.
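
For readers unfamiliar with the keystroke savings metric, the sketch below shows one common way to compute it: the percentage of character keystrokes avoided when the user accepts a suggested word with a single tap. The exact definition used in the paper may differ; the toy frequency table, the prefix-based `suggest` function and the choice to ignore spaces are assumptions made for illustration, and a neural language model would replace `suggest` in practice.

```python
# Toy word frequencies standing in for a language model (illustrative only).
FREQ = {"the": 5, "cat": 4, "there": 3, "then": 2, "they": 1}


def suggest(prefix, k=3):
    """Return the k most frequent known words starting with `prefix`."""
    matches = [w for w in FREQ if w.startswith(prefix)]
    return sorted(matches, key=lambda w: -FREQ[w])[:k]


def keys_needed(word):
    """Keys to enter `word`: letters typed plus one tap on a suggestion."""
    for typed in range(len(word)):
        if word in suggest(word[:typed]):
            return typed + 1  # letters typed so far + one selection tap
    return len(word)          # never suggested in time: type it in full


def keystroke_savings(words):
    """Percentage of keystrokes saved relative to typing every character."""
    total = sum(len(w) for w in words)
    pressed = sum(keys_needed(w) for w in words)
    return 100.0 * (total - pressed) / total


print(keystroke_savings(["the", "they", "cat"]))
```

Here "the" and "cat" are each accepted with a single tap, while "they" never reaches the top three suggestions and is typed in full, so 4 of 10 keystrokes are saved.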

WARP-Text: a Web-Based Tool for Annotating Relationships between Pairs of Texts
Venelin Kovatchev | M. Antònia Martí | Maria Salamó

We present WARP-Text, an open-source web-based tool for annotating relationships between pairs of texts. WARP-Text supports multi-layer annotation and custom definitions of inter-textual and intra-textual relationships. Annotation can be performed at different granularity levels (such as sentences, phrases, or tokens). WARP-Text has an intuitive user-friendly interface both for project managers and annotators. WARP-Text fills a gap in the currently available NLP toolbox, as open-source alternatives for annotation of pairs of text are not readily available. WARP-Text has already been used in several annotation tasks and can be of interest to the researchers working in the areas of Paraphrasing, Entailment, Simplification, and Summarization, among others.

A Chinese Writing Correction System for Learning Chinese as a Foreign Language
Yow-Ting Shiue | Hen-Hsen Huang | Hsin-Hsi Chen

We present a Chinese writing correction system for learning Chinese as a foreign language. The system takes a wrong input sentence and generates several correction suggestions. It also retrieves example Chinese sentences with English translations, helping users understand the correct usages of certain grammar patterns. This is the first available Chinese writing error correction system based on the neural machine translation framework. We discuss several design choices and show empirical results to support our decisions.

LTV: Labeled Topic Vector
Daniel Baumartz | Tolga Uslu | Alexander Mehler

In this paper we present LTV, a website and API that generates labeled topic classifications based on the Dewey Decimal Classification (DDC), an international standard for topic classification in libraries. We introduce nnDDC, a largely language-independent neural network-based classifier for DDC, which we optimized using a wide range of linguistic features to achieve an F-score of 87.4%. To show that our approach is language-independent, we evaluate nnDDC using up to 40 different languages. We derive a topic model based on nnDDC, which generates probability distributions over semantic units for any input on sense-, word- and text-level. Unlike related approaches, however, these probabilities are estimated by means of nnDDC so that each dimension of the resulting vector representation is uniquely labeled by a DDC class. In this way, we introduce a neural network-based Classifier-Induced Semantic Space (nnCISS).

Interpretable Rationale Augmented Charge Prediction System
Xin Jiang | Hai Ye | Zhunchen Luo | WenHan Chao | Wenjia Ma

This paper proposes a neural system to address the essential interpretability problem in text classification, especially in the charge prediction task. First, we use a deep reinforcement learning method to extract rationales, that is, short, readable and decisive snippets from the input text. Then a rationale-augmented classification model is proposed to improve prediction accuracy. Naturally, the extracted rationales serve as an introspective explanation for the prediction result of the model, enhancing the transparency of the model. Experimental results demonstrate that our system is able to extract readable rationales that are highly consistent with manual annotation and is comparable with the attention model in prediction accuracy.

A Cross-lingual Messenger with Keyword Searchable Phrases for the Travel Domain
Shehroze Khan | Jihyun Kim | Tarik Zulfikarpasic | Peter Chen | Nizar Habash

We present Qutr (Query Translator), a smart cross-lingual communication application for the travel domain. Qutr is a real-time messaging app that automatically translates conversations while supporting keyword-to-sentence matching. Qutr relies on querying a database that holds commonly used pre-translated travel-domain phrases and phrase templates in different languages with the use of keywords. The query matching supports paraphrases, incomplete keywords and some input spelling errors. The application addresses common cross-lingual communication issues such as translation accuracy, speed, privacy, and personalization.
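
A minimal sketch of keyword-to-phrase lookup that tolerates incomplete keywords and small spelling errors is shown below. It is not Qutr's implementation: the phrase table, the function names `normalize` and `lookup`, and the matching strategy (unambiguous prefix match, then fuzzy match with the standard library's `difflib.get_close_matches`) are assumptions made for illustration.

```python
import difflib

# Hypothetical phrase table: keyword tuple -> (English, Spanish) phrase pair.
PHRASES = {
    ("where", "station"): ("Where is the train station?",
                           "¿Dónde está la estación de tren?"),
    ("how", "much", "cost"): ("How much does this cost?", "¿Cuánto cuesta esto?"),
    ("need", "doctor"): ("I need a doctor.", "Necesito un médico."),
}
VOCAB = sorted({kw for keys in PHRASES for kw in keys})


def normalize(token):
    """Map a typed token to a known keyword, or None if nothing is close."""
    token = token.lower()
    prefixed = [kw for kw in VOCAB if kw.startswith(token)]
    if len(prefixed) == 1:  # unambiguous incomplete keyword, e.g. "stat"
        return prefixed[0]
    close = difflib.get_close_matches(token, VOCAB, n=1, cutoff=0.75)
    return close[0] if close else None  # tolerate small spelling errors


def lookup(query):
    """Return the phrase pair whose keywords best overlap the query, or None."""
    typed = {normalize(t) for t in query.split()} - {None}
    best = max(PHRASES, key=lambda keys: len(typed & set(keys)) / len(keys))
    return PHRASES[best] if typed & set(best) else None


print(lookup("wher stat"))
print(lookup("docter"))
```

Scoring by the fraction of a phrase's keywords that were matched lets a partial query still retrieve a phrase, while a query with no recognizable keyword returns nothing rather than an arbitrary phrase.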

Towards Automated Extraction of Business Constraints from Unstructured Regulatory Text
Rahul Nair | Killian Levacher | Martin Stephenson

Large organizations spend considerable resources reviewing regulations and ensuring that their business processes are compliant with the law. To make compliance workflows more efficient and responsive, we present a system for machine-driven annotation of legal documents. A set of natural language processing pipelines is designed to address some key questions in this domain: (a) is this (new) regulation relevant for me? (b) what set of requirements does this law impose? and (c) what is the regulatory intent of a law? The system is currently undergoing user trials within our organization.

A Flexible and Easy-to-use Semantic Role Labeling Framework for Different Languages
Quynh Ngoc Thi Do | Artuur Leeuwenberg | Geert Heyman | Marie-Francine Moens

This paper presents a flexible and open source framework for deep semantic role labeling. We aim at facilitating easy exploration of model structures for multiple languages with different characteristics. It provides flexibility in its model construction in terms of word representation, sequence representation, output modeling, and inference styles and comes with clear output visualization. The framework is available under the Apache 2.0 license.