Proceedings of the 28th International Conference on Computational Linguistics: System Demonstrations

Michal Ptaszynski, Bartosz Ziolko (Editors)

Barcelona, Spain (Online)
International Committee on Computational Linguistics (ICCL)

Ve’rdd. Narrowing the Gap between Paper Dictionaries, Low-Resource NLP and Community Involvement
Khalid Alnajjar | Mika Hämäläinen | Jack Rueter | Niko Partanen

We present an open-source online dictionary editing system, Ve′rdd, that offers a chance to re-evaluate and edit grassroots dictionaries that have been exposed to multiple amateur editors. The idea is to incorporate community activities into a state-of-the-art finite-state language description of a seriously endangered minority language, Skolt Sami. The main problem is getting the community to take part in work above the pencil-and-paper level. At times, it seems that native speakers and dictionary-oriented contributors lack the technical understanding needed to use the infrastructures that could make their work more meaningful in the future, i.e. by allowing multiple reuse of all of their input. Therefore, our system integrates with the existing tools and infrastructures for Uralic languages, masking the technical complexities behind a user-friendly UI.

MaintNet: A Collaborative Open-Source Library for Predictive Maintenance Language Resources
Farhad Akhbardeh | Travis Desell | Marcos Zampieri

Maintenance record logbooks are an emerging text type in NLP. A large part of each logbook typically consists of free text with many domain-specific technical terms, abbreviations, and non-standard spelling and grammar. This poses difficulties for NLP pipelines trained on standard corpora. Analyzing and annotating such documents is of particular importance in the development of predictive maintenance systems, which aim to improve operational efficiency, reduce costs, prevent accidents, and save lives. In order to facilitate and encourage research in this area, we have developed MaintNet, a collaborative open-source library of technical and domain-specific language resources. MaintNet provides novel logbook data from the aviation, automotive, and facility maintenance domains along with tools to aid in their (pre-)processing and clustering. Furthermore, it provides a way to encourage discussion on and sharing of new datasets and tools for logbook data analysis.

DART: A Lightweight Quality-Suggestive Data-to-Text Annotation Tool
Ernie Chang | Jeriah Caplinger | Alex Marin | Xiaoyu Shen | Vera Demberg

We present a lightweight annotation tool, the Data AnnotatoR Tool (DART), for the general task of labeling structured data with textual descriptions. The tool is implemented as an interactive application that reduces human effort in annotating large quantities of structured data, e.g. in the format of a table or tree structure. By using a backend sequence-to-sequence model, our system iteratively analyzes the annotated labels in order to better sample unlabeled data. In a simulation experiment on annotating large quantities of structured data, DART has been shown to reduce the total number of annotations needed through active learning and automatic suggestion of relevant labels.

Demo Application for the AutoGOAL Framework
Suilan Estevez-Velarde | Alejandro Piad-Morffis | Yoan Gutiérrez | Andres Montoyo | Rafael Muñoz-Guillena | Yudivián Almeida Cruz

This paper introduces a web demo that showcases the main characteristics of the AutoGOAL framework. AutoGOAL is a framework in Python for automatically finding the best way to solve a given task. It has been designed mainly for automatic machine learning (AutoML), but it can be used in any scenario where several possible strategies are available to solve a given computational task. In contrast with alternative frameworks, AutoGOAL can be applied seamlessly to Natural Language Processing as well as structured classification problems. This paper presents an overview of the framework’s design and experimental evaluation in several machine learning problems, including two recent NLP challenges. The accompanying software demo is available online, and full source code is provided under the MIT open-source license.

Fast Word Predictor for On-Device Application
Huy Tien Nguyen | Khoi Tuan Nguyen | Anh Tuan Nguyen | Thanh Lac Thi Tran

Trained on large text corpora, deep neural networks achieve promising results in the next word prediction task. However, deploying these huge models on devices requires meeting the constraints of low latency and a small binary size. To address these challenges, we propose a fast word predictor that performs efficiently on mobile devices. Compared with a standard neural network which has a similar word prediction rate, the proposed model achieves a 60% reduction in memory size and 100X faster inference on a mid-range mobile device. The method is developed as a feature for a chat application which serves more than 100 million users.

Semantic search with domain-specific word-embedding and production monitoring in Fintech
Mojtaba Farmanbar | Nikki Van Ommeren | Boyang Zhao

We present an end-to-end information retrieval system with domain-specific custom language models for accurate search term expansion. The text mining pipeline tackles several challenges faced in an industry setting, including multilingual, jargon-rich unstructured text and privacy compliance. Combined with a novel statistical approach for word embedding evaluations, the models can be monitored in a production setting. Our approach is used in the real world in risk management in the financial sector and has wide applicability to other domains.
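
As an illustration of search term expansion with word embeddings, the sketch below ranks vocabulary words by cosine similarity to a query term and returns the closest ones. It is a minimal sketch under stated assumptions, not the system described in the abstract: the toy vectors, the vocabulary, and the names `EMBEDDINGS`, `cosine`, and `expand_query` are all hypothetical.

```python
import math

# Hypothetical toy embeddings for illustration only; a real system would
# load vectors trained on domain-specific text.
EMBEDDINGS = {
    "loan": [0.9, 0.1, 0.0],
    "credit": [0.8, 0.2, 0.1],
    "mortgage": [0.85, 0.05, 0.2],
    "weather": [0.0, 0.9, 0.4],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(term, embeddings, top_k=2):
    """Return the top_k vocabulary words most similar to `term`."""
    if term not in embeddings:
        return []  # out-of-vocabulary terms are not expanded
    query_vec = embeddings[term]
    scored = [
        (other, cosine(query_vec, vec))
        for other, vec in embeddings.items()
        if other != term
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scored[:top_k]]


expanded = expand_query("loan", EMBEDDINGS)
```

With these toy vectors, a search for "loan" would also be matched against "credit" and "mortgage", while the unrelated "weather" is left out.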

CogniVal in Action: An Interface for Customizable Cognitive Word Embedding Evaluation
Nora Hollenstein | Adrian van der Lek | Ce Zhang

We demonstrate the functionalities of the new user interface for CogniVal. CogniVal is a framework for the cognitive evaluation of English word embeddings, which evaluates the quality of the embeddings based on how well they predict human lexical representations from cognitive language processing signals from various sources. In this paper, we present an easy-to-use command line interface for CogniVal with multiple improvements over the original work, including the possibility to evaluate custom embeddings against custom cognitive data sources.

A Multilingual Reading Comprehension System for more than 100 Languages
Anthony Ferritto | Sara Rosenthal | Mihaela Bornea | Kazi Hasan | Rishav Chakravarti | Salim Roukos | Radu Florian | Avi Sil

This paper presents M-GAAMA, a Multilingual Question Answering architecture and demo system. This is the first multilingual machine reading comprehension (MRC) demo which is able to answer questions in over 100 languages. M-GAAMA answers questions from a given passage in the same or a different language. It incorporates several existing multilingual models, such as M-BERT and XLM-R, that can be used interchangeably in the demo. The M-GAAMA demo also improves language accessibility by incorporating the IBM Watson machine translation widget, which lets users see an answer in their desired language. We also show how M-GAAMA can be used in downstream tasks by incorporating it into an END-TO-END-QA system using CFO (Chakravarti et al., 2019). We experiment with our system architecture on the Multi-Lingual Question Answering (MLQA) and the COVID-19 CORD (Wang et al., 2020; Tang et al., 2020) datasets to provide insights into the performance of the system.

XplaiNLI: Explainable Natural Language Inference through Visual Analytics
Aikaterini-Lida Kalouli | Rita Sevastjanova | Valeria de Paiva | Richard Crouch | Mennatallah El-Assady

Advances in Natural Language Inference (NLI) have helped us understand what state-of-the-art models really learn and what their generalization power is. Recent research has revealed some heuristics and biases of these models. However, to date, there is no systematic effort to capitalize on those insights through a system that uses these to explain the NLI decisions. To this end, we propose XplaiNLI, an eXplainable, interactive, visualization interface that computes NLI with different methods and provides explanations for the decisions made by the different approaches.

Discussion Tracker: Supporting Teacher Learning about Students’ Collaborative Argumentation in High School Classrooms
Luca Lugini | Christopher Olshefski | Ravneet Singh | Diane Litman | Amanda Godley

Teaching collaborative argumentation is an advanced skill that many K-12 teachers struggle to develop. To address this, we have developed Discussion Tracker, a classroom discussion analytics system based on novel algorithms for classifying argument moves, specificity, and collaboration. Results from a classroom deployment indicate that teachers found the analytics useful, and that the underlying classifiers perform with moderate to substantial agreement with humans.
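
Classifier-human agreement of this kind is often reported with Cohen's kappa, for which "moderate" and "substantial" are conventional interpretation bands. The abstract does not name the statistic, so the following is only a sketch under that assumption; the function name `cohen_kappa` and the example labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Observed agreement: fraction of items labeled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical argument-move labels from a human coder and a classifier.
human = ["claim", "claim", "evidence", "evidence"]
model = ["claim", "claim", "evidence", "claim"]
kappa = cohen_kappa(human, model)
```

Unlike raw accuracy, kappa discounts the agreement that two annotators would reach by chance given how often each uses each label.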

An Online Readability Leveled Arabic Thesaurus
Zhengyang Jiang | Nizar Habash | Muhamed Al Khalil

This demo paper introduces the online Readability Leveled Arabic Thesaurus interface. For a given user input word, this interface provides the word’s possible lemmas, roots, English glosses, related Arabic words and phrases, and readability on a five-level readability scale. This interface builds on and connects multiple existing Arabic resources and processing tools. This one-of-a-kind system enables Arabic speakers and learners to benefit from advances in Arabic computational linguistics technologies. Feedback from users of the system will help the developers to identify lexical coverage gaps and errors. A live link to the demo is available at:

TrainX – Named Entity Linking with Active Sampling and Bi-Encoders
Tom Oberhauser | Tim Bischoff | Karl Brendel | Maluna Menke | Tobias Klatt | Amy Siu | Felix Alexander Gers | Alexander Löser

We demonstrate TrainX, a system for Named Entity Linking for medical experts. It combines state-of-the-art entity recognition and linking architectures, such as Flair and fine-tuned Bi-Encoders based on BERT, with an easy-to-use interface for healthcare professionals. We support medical experts in annotating training data by using active sampling strategies to forward informative samples to the annotator. We demonstrate that our model is capable of linking against large knowledge bases, such as UMLS (3.6 million entities), and supporting zero-shot cases, where the linker has never seen the entity before. Those zero-shot capabilities help to mitigate the problem of rare and expensive training data that is a common issue in the medical domain.
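
One common active sampling strategy is uncertainty sampling, which forwards to the annotator the samples whose predicted label distribution has the highest entropy. The abstract does not say which strategies TrainX uses, so the sketch below is an assumption for illustration; `entropy`, `select_most_uncertain`, and the sample data are hypothetical names and values.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(predictions, batch_size=1):
    """Return the ids of the `batch_size` samples with the highest entropy.

    `predictions` maps a sample id to the model's predicted probabilities
    over candidate entities for that sample.
    """
    ranked = sorted(
        predictions,
        key=lambda sample_id: entropy(predictions[sample_id]),
        reverse=True,
    )
    return ranked[:batch_size]


# Hypothetical model outputs for three unlabeled mentions.
predictions = {
    "s1": [0.98, 0.01, 0.01],  # confident prediction
    "s2": [0.40, 0.35, 0.25],  # highly uncertain
    "s3": [0.70, 0.20, 0.10],  # moderately uncertain
}
to_annotate = select_most_uncertain(predictions, batch_size=2)
```

Samples on which the model is already confident are skipped, so expert annotation time goes to the cases most likely to change the model.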

BullStop: A Mobile App for Cyberbullying Prevention
Semiu Salawu | Yulan He | Jo Lumsden

Social media has become the new playground for bullies. Young people are now regularly exposed to a wide range of abuse online. In response to the increasing prevalence of cyberbullying, online social networks have increased efforts to clamp down on online abuse. Unfortunately, the nature, complexity and sheer volume of cyberbullying mean that many incidents go undetected. BullStop is a mobile app for detecting and preventing cyberbullying and online abuse on social media platforms. It uses deep learning models to identify instances of cyberbullying and can automatically initiate actions such as deleting offensive messages and blocking bullies on behalf of the user. Our system not only achieves impressive prediction results but also demonstrates excellent potential for use in real-world scenarios and is freely available on the Google Play Store.

Annobot: Platform for Annotating and Creating Datasets through Conversation with a Chatbot
Rafał Poświata | Michał Perełkiewicz

In this paper, we introduce Annobot: a platform for annotating and creating datasets through conversation with a chatbot. This natural form of interaction has allowed us to create a more accessible and flexible interface, especially for mobile devices. Our solution has a wide range of applications such as data labelling for binary, multi-class/label classification tasks, preparing data for regression problems, or creating sets for issues such as machine translation, question answering or text summarization. Additional features include pre-annotation, active sampling, online learning and real-time inter-annotator agreement. The system is integrated with the popular messaging platform Facebook Messenger. A usability experiment showed the advantages of the proposed platform compared to other labelling tools. The source code of Annobot is available under the GNU LGPL license at

Arabic Curriculum Analysis
Hamdy Mubarak | Shimaa Amer | Ahmed Abdelali | Kareem Darwish

Developing a platform that analyzes the content of curricula can help identify their shortcomings and whether they are tailored to specific desired outcomes. In this paper, we present a system to analyze Arabic curricula and provide insights into their content. It allows users to explore word presence and the surface forms used, as well as contrasting statistics between the different countries from which the curricula were selected. It also provides a facility to grade a text with reference to a given grade level and gives users feedback about the complexity or difficulty of the words used in a text.

Epistolary Education in 21st Century: A System to Support Composition of E-mails by Students to Superiors in Japanese
Kenji Ryu | Michal Ptaszynski

E-mail is a communication tool widely used by people of all ages on the Internet today, often in business and formal situations, especially in Japan. Moreover, Japanese E-mail communication has a set of specific rules taught using specialized guidebooks. E-mail literacy education for many Japanese students is typically provided in a traditional, yet inefficient lecture-based way. We propose a system to support Japanese students in writing E-mails to superiors (teachers, job hunting representatives, etc.). We first investigate the importance of formal E-mails in Japan and what is needed to successfully write a formal E-mail. Next, we develop the system in accordance with those rules. Finally, we evaluate the system in two ways. The results, although based on a small number of samples, were generally positive, and clearly indicated additional ways to improve the system.