Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

Danilo Croce, Luca Soldaini (Editors)

Dubrovnik, Croatia
Association for Computational Linguistics

Addressing Issues of Cross-Linguality in Open-Retrieval Question Answering Systems For Emergent Domains
Alon Albalak | Sharon Levy | William Yang Wang

Open-retrieval question answering systems are generally trained and tested on large datasets in well-established domains. However, low-resource settings such as new and emerging domains would especially benefit from reliable question answering systems. Furthermore, multilingual and cross-lingual resources in emergent domains are scarce, leading to few or no such systems. In this paper, we demonstrate a cross-lingual open-retrieval question answering system for the emergent domain of COVID-19. Our system adopts a corpus of scientific articles to ensure that retrieved documents are reliable. To address the scarcity of cross-lingual training data in emergent domains, we present a method utilizing automatic translation, alignment, and filtering to produce English-to-all datasets. We show that a deep semantic retriever greatly benefits from training on our English-to-all data and significantly outperforms a BM25 baseline in the cross-lingual setting. We illustrate the capabilities of our system with examples and release all code necessary to train and deploy such a system.
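For readers unfamiliar with the BM25 baseline mentioned above, the following is a minimal sketch of Okapi BM25 scoring in plain Python. It is not the authors' code: the function name `bm25_scores`, the whitespace-style pre-tokenized input, the default parameters `k1=1.5` and `b=0.75`, and the non-negative IDF variant are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against a tokenized `query` (Okapi BM25).

    Hypothetical helper for illustration; k1 and b are common defaults, not
    values taken from the paper.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue  # purely lexical: unseen query terms contribute nothing
            # Non-negative IDF variant (an assumption; several variants exist).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = freq + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [["covid", "vaccine", "trial"], ["vaccine", "efficacy"], ["protein", "structure"]]
print(bm25_scores(["vaccine"], docs))  # shorter matching document scores highest
print(bm25_scores(["vacuna"], docs))   # Spanish query term: no lexical overlap
```

Because BM25 only rewards exact term overlap, a Spanish query such as `["vacuna"]` scores 0.0 against every English document, which illustrates why a lexical baseline struggles in the cross-lingual setting where a trained semantic retriever can still match meaning.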

CodeAnno: Extending WebAnno with Hierarchical Document Level Annotation and Automation
Florian Schneider | Seid Muhie Yimam | Fynn Petersen-Frey | Gerret von Nordheim | Katharina Kleinen-von Königslöw | Chris Biemann

WebAnno is one of the most popular annotation tools that supports generic annotation types and distributed annotation with multiple user roles. However, WebAnno focuses on annotating span-level mentions and relations among them, making document-level annotation complicated. The annotation and analysis of social science materials usually involves creating codes to categorize a given document. These codes, collected in codebooks, are typically hierarchical, which enables coders to label a document either with a general category or with more fine-grained subcategories. CodeAnno is forked from WebAnno and designed to solve the coding problems faced by many social science researchers, with the following main functionalities: 1) creation of hierarchical codebooks, with functionality to move and sort categories in the hierarchy; 2) an interactive UI for codebook annotation; 3) import and export of annotations in CSV format, making CodeAnno compatible with existing annotations conducted using spreadsheet applications; 4) integration of an external automation component to facilitate coding using machine learning; and 5) project templating that allows duplicating a project structure without copying the actual documents. We present different use cases to demonstrate the capabilities of CodeAnno. A short demonstration video of the system is available here:

NLP Workbench: Efficient and Extensible Integration of State-of-the-art Text Mining Tools
Peiran Yao | Matej Kosmajac | Abeer Waheed | Kostyantyn Guzhva | Natalie Hervieux | Denilson Barbosa

NLP Workbench is a web-based platform for text mining that allows non-expert users to obtain semantic understanding of large-scale corpora using state-of-the-art text mining models. The platform is built upon the latest pre-trained models and open-source systems from academia that provide semantic analysis functionalities, including but not limited to entity linking, sentiment analysis, semantic parsing, and relation extraction. Its extensible design enables researchers and developers to smoothly replace an existing model or integrate a new one. To improve efficiency, we employ a microservice architecture that facilitates allocation of acceleration hardware and parallelization of computation. This paper presents the architecture of NLP Workbench and discusses the challenges we faced in designing it. We also discuss diverse use cases of NLP Workbench and the benefits of using it over other approaches. The platform is under active development, with its source code released under the MIT license. A website and a short video demonstrating our platform are also available.

jTLEX: a Java Library for TimeLine EXtraction
Mustafa Ocal | Akul Singh | Jared Hummer | Antonela Radas | Mark Finlayson

jTLEX is a programming library that provides a Java implementation of the TimeLine EXtraction algorithm (TLEX; Finlayson et al., 2021), along with utilities for programmatic manipulation of TimeML graphs. Timelines are useful for a number of natural language understanding tasks, such as question answering, cross-document event coreference, and summarization and visualization. jTLEX provides functionality for (1) parsing TimeML annotations into Java objects, (2) constructing TimeML graphs from scratch, (3) partitioning TimeML graphs into temporally connected subgraphs, (4) transforming temporally connected subgraphs into point algebra (PA) graphs, (5) extracting the exact timeline of TimeML graphs, (6) detecting inconsistent subgraphs, and (7) calculating indeterminate sections of the timeline. The library has been tested on the entire TimeBank corpus and comes with a suite of unit tests. We release the software as open source with a free license for non-commercial use.
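Step (3) above, partitioning a TimeML graph into temporally connected subgraphs, amounts to finding connected components when temporal links are treated as undirected edges. The sketch below illustrates that idea with union-find in Python; it is not jTLEX's Java API. The function name `connected_subgraphs` and the `(source, relation, target)` triple format for TLINKs are hypothetical choices made for this example.

```python
from collections import defaultdict


def connected_subgraphs(nodes, tlinks):
    """Partition events/timexes into temporally connected subgraphs.

    Hypothetical helper for illustration. `tlinks` are (source, relation, target)
    triples; relation labels are ignored because connectivity only depends on
    which nodes are linked. Every endpoint must appear in `nodes`.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the components of the two endpoints of every temporal link.
    for src, _rel, tgt in tlinks:
        parent[find(src)] = find(tgt)

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    # Deterministic order: sort components by their sorted member lists.
    return sorted(groups.values(), key=lambda g: sorted(g))


links = [("e1", "BEFORE", "e2"), ("e3", "INCLUDES", "t1")]
print(connected_subgraphs(["e1", "e2", "e3", "t1"], links))
```

Each resulting subgraph can then be processed independently, which is what makes a later step such as conversion to a point algebra graph tractable per component.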

CovRelex-SE: Adding Semantic Information for Relation Search via Sequence Embedding
Truong Do | Chau Nguyen | Vu Tran | Ken Satoh | Yuji Matsumoto | Minh Nguyen

In recent years, COVID-19 has impacted all aspects of human life. As a result, numerous publications relating to this disease have been issued. Due to the massive volume of publications, some retrieval systems have been developed to provide researchers with useful information. In these systems, lexical searching methods are widely used, which raises many issues related to acronyms, synonyms, and rare keywords. In this paper, we present a hybrid relation retrieval system, CovRelex-SE, based on embeddings to provide high-quality search results. Our system can be accessed through the following URL:

ITMT: Interactive Topic Model Trainer
Lorena Calvo Bartolomé | José Antonio Espinosa Melchor | Jerónimo Arenas-García

Topic modeling is a commonly used technique for analyzing unstructured data in various fields, but achieving accurate results and useful models can be challenging, especially for domain experts who lack the knowledge needed to optimize the parameters required by this natural language processing technique. From this perspective, we introduce the Interactive Topic Model Trainer (ITMT), developed within the EU-funded project IntelComp. ITMT is a user-in-the-loop topic modeling tool with a graphical user interface that supports training and curating models from different state-of-the-art topic extraction libraries, including some recent neural-based methods, and is oriented toward use by domain experts. This paper reviews ITMT's functionalities and key implementation aspects, including a comparison with other tools for topic modeling analysis.

FISH: A Financial Interactive System for Signal Highlighting
Ta-Wei Huang | Jia-Huei Ju | Yu-Shiang Huang | Cheng-Wei Lin | Yi-Shyuan Chiang | Chuan-Ju Wang

In this system demonstration, we seek to streamline the process of reviewing financial statements and provide insightful information for practitioners. We develop FISH, an interactive system that extracts and highlights crucial textual signals from financial statements efficiently and precisely. To achieve our goal, we integrate pre-trained BERT representations and a fine-tuned BERT highlighting model with a newly proposed two-stage classify-then-highlight pipeline. We also conduct a human evaluation, showing that FISH can provide accurate financial signals. FISH overcomes the limitations of existing research and, more importantly, benefits both academics and practitioners in finance, as they can leverage state-of-the-art contextualized language models with their newly gained insights. The system is available online at, and a short introductory video is at

Yu Sheng: Human-in-Loop Classical Chinese Poetry Generation System
Jingkun Ma | Runzhe Zhan | Derek F. Wong

The development of poetry generation systems mainly focuses on enhancing the capacity of the generation model. However, the demands of customization and polishing are generally ignored, which greatly limits the scope of application. In this work, we present Yu Sheng, a web-based poetry generation system featuring a human-in-loop generation framework that provides various customization options for users with different backgrounds to engage in the process of poetry composition. To this end, we propose two methods and train models that can perform constrained generation and fine-grained polishing. The automatic and human evaluation results show that our system has a strong ability to generate and polish poetry compared to other vanilla models. Our system is publicly accessible at:

PANACEA: An Automated Misinformation Detection System on COVID-19
Runcong Zhao | Miguel Arana-Catania | Lixing Zhu | Elena Kochkina | Lin Gui | Arkaitz Zubiaga | Rob Procter | Maria Liakata | Yulan He

In this demo, we introduce a web-based misinformation detection system PANACEA on COVID-19 related claims, which has two modules, fact-checking and rumour detection. Our fact-checking module, which is supported by novel natural language inference methods with a self-attention network, outperforms state-of-the-art approaches. It is also able to give automated veracity assessment and ranked supporting evidence with the stance towards the claim to be checked. In addition, PANACEA adapts the bi-directional graph convolutional networks model, which is able to detect rumours based on comment networks of related tweets, instead of relying on the knowledge base. This rumour detection module assists by warning the users in the early stages when a knowledge base may not be available.

NxPlain: A Web-based Tool for Discovery of Latent Concepts
Fahim Dalvi | Nadir Durrani | Hassan Sajjad | Tamim Jaban | Mus’ab Husaini | Ummar Abbas

The proliferation of deep neural networks in various domains has seen an increased need for the interpretability of these models, especially in scenarios where fairness and trust are as important as model performance. A lot of independent work is being carried out to: i) analyze what linguistic and non-linguistic knowledge is learned within these models, and ii) highlight the salient parts of the input. We present NxPlain, a web-app that provides an explanation of a model’s prediction using latent concepts. NxPlain discovers latent concepts learned in a deep NLP model, provides an interpretation of the knowledge learned in the model, and explains its predictions based on the used concepts. The application allows users to browse through the latent concepts in an intuitive order, letting them efficiently scan through the most salient concepts with a global corpus-level view and a local sentence-level view. Our tool is useful for debugging, unraveling model bias, and for highlighting spurious correlations in a model. A hosted demo is available here:

Small-Text: Active Learning for Text Classification in Python
Christopher Schröder | Lydia Müller | Andreas Niekler | Martin Potthast

We introduce small-text, an easy-to-use active learning library, which offers pool-based active learning for single- and multi-label text classification in Python. It features numerous pre-implemented state-of-the-art query strategies, including some that leverage the GPU. Standardized interfaces allow the combination of a variety of classifiers, query strategies, and stopping criteria, facilitating a quick mix and match and enabling rapid development of both active learning experiments and applications. With the objective of making various classifiers and query strategies accessible for active learning, small-text integrates several well-known machine learning libraries, namely scikit-learn, PyTorch, and Hugging Face transformers. The latter integrations are optionally installable extensions, so GPUs can be used but are not required. Using this new library, we investigate the performance of the recently published SetFit training paradigm, which we compare to vanilla transformer fine-tuning, finding that it matches the latter in classification accuracy while outperforming it in area under the curve. The library is available under the MIT License at, in version 1.3.0 at the time of writing.
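To make the notion of a query strategy in pool-based active learning concrete, here is a minimal sketch of least-confidence sampling, a common uncertainty-based baseline. It does not use small-text's API: the function `least_confidence_query` and its input format (a list of per-class probability lists for the whole pool plus a set of already-labeled indices) are hypothetical. In a full pool-based loop, one would train a classifier on the labeled set, call a strategy like this on the unlabeled pool, send the selected instances to annotators, and repeat until a stopping criterion fires.

```python
def least_confidence_query(probas, labeled, k):
    """Return indices of the k unlabeled pool instances the model is least sure about.

    Hypothetical helper for illustration.
    probas: list of per-class probability lists, one per pool instance.
    labeled: set of pool indices that already have labels.
    """
    candidates = [i for i in range(len(probas)) if i not in labeled]
    # Least confidence: 1 - max class probability; higher means more uncertain.
    # Python's sort is stable, so ties keep their original pool order.
    candidates.sort(key=lambda i: 1.0 - max(probas[i]), reverse=True)
    return candidates[:k]


pool_probas = [[0.9, 0.1], [0.55, 0.45], [0.6, 0.4], [0.5, 0.5]]
print(least_confidence_query(pool_probas, labeled={3}, k=2))
```

Here instance 3 would be the most uncertain, but it is already labeled, so the strategy selects instances 1 and 2 next. Swapping in a different scoring function (for example, prediction entropy) changes the strategy without touching the surrounding loop, which is the kind of mix and match the abstract describes.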

kogito: A Commonsense Knowledge Inference Toolkit
Mete Ismayilzada | Antoine Bosselut

In this paper, we present kogito, an open-source tool for generating commonsense inferences about situations described in text. kogito provides an intuitive and extensible interface to interact with natural language generation models that can be used for hypothesizing commonsense knowledge inference from a textual input. In particular, kogito offers several features for targeted, multi-granularity knowledge generation. These include a standardized API for training and evaluating knowledge models, and generating and filtering inferences from them. We also include helper functions for converting natural language texts into a format ingestible by knowledge models — intermediate pipeline stages such as knowledge head extraction from text, heuristic and model-based knowledge head-relation matching, and an ability to define and use custom knowledge relations. We make the code for kogito available at along with thorough documentation at

Text-Blueprint: An Interactive Platform for Plan-based Conditional Generation
Fantine Huot | Joshua Maynez | Shashi Narayan | Reinald Kim Amplayo | Kuzman Ganchev | Annie Priyadarshini Louis | Anders Sandholm | Dipanjan Das | Mirella Lapata

While conditional generation models can now generate natural language well enough to create fluent text, it is still difficult to control the generation process, leading to irrelevant, repetitive, and hallucinated content. Recent work shows that planning can be a useful intermediate step to render conditional generation less opaque and more grounded. We present a web browser-based demonstration for query-focused summarization that uses a sequence of question-answer pairs as a blueprint plan for guiding text generation (i.e., what to say and in what order). We illustrate how users may interact with the generated text and associated plan visualizations, e.g., by editing and modifying the plan in order to improve or control the generated output. A short video demonstrating our system is available at

ALAMBIC : Active Learning Automation Methods to Battle Inefficient Curation
Charlotte Nachtegael | Jacopo De Stefani | Tom Lenaerts

In this paper, we present ALAMBIC, an open-source, dockerized, web-based platform for annotating text data through active learning for classification tasks. Active learning is known to reduce the need for labelling, a time-consuming task, by selecting the most informative instances among the unlabelled ones, reaching optimal accuracy faster than random labelling would. ALAMBIC integrates all the steps from data import to customization of the (active) learning process and annotation of the data, with indications of the progress of the trained model, which can be downloaded and used in downstream tasks. Its architecture also allows easy integration of other types of models, features, and active learning strategies. The code is available on and a video demonstration is available on

SPINDLE: Spinning Raw Text into Lambda Terms with Graph Attention
Konstantinos Kogkalidis | Michael Moortgat | Richard Moot

This paper describes SPINDLE, an open source Python module, providing an efficient and accurate parser for written Dutch that transforms raw text input to programs for meaning composition expressed as λ terms. The parser integrates a number of breakthrough advances made in recent years. Its output consists of hi-res derivations of a multimodal type-logical grammar, capturing two orthogonal axes of syntax, namely deep function-argument structures and dependency relations. These are produced by three interdependent systems: a static type-checker asserting the well-formedness of grammatical analyses, a state-of-the-art, structurally-aware supertagger based on heterogeneous graph convolutions, and a massively parallel proof search component based on Sinkhorn iterations. Packed in the software are also handy utilities and extras for proof visualization and inference, intended to facilitate end-user utilization.
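The proof search component mentioned above relies on Sinkhorn iterations. As a generic illustration of that technique (not SPINDLE's implementation), the sketch below turns a square matrix of matching scores into an approximately doubly-stochastic matrix by alternately normalizing rows and columns of its exponentiated entries; the result can be read as a soft assignment between two sets of items. The function name `sinkhorn`, the iteration count, and the plain-list representation are assumptions for this example; practical implementations typically operate on batched tensors in log space for numerical stability.

```python
import math


def sinkhorn(scores, n_iters=20):
    """Approximate a doubly-stochastic matrix from a square score matrix.

    Hypothetical helper for illustration: exponentiates the scores, then
    alternately normalizes rows and columns so each sums to (about) 1.
    """
    m = [[math.exp(s) for s in row] for row in scores]
    for _ in range(n_iters):
        # Row normalization: each row sums to 1.
        row_sums = [sum(row) for row in m]
        m = [[v / row_sums[i] for v in row] for i, row in enumerate(m)]
        # Column normalization: each column sums to 1.
        col_sums = [sum(col) for col in zip(*m)]
        m = [[v / col_sums[j] for j, v in enumerate(row)] for row in m]
    return m


scores = [[3.0, 1.0, 0.2], [0.5, 2.0, 1.0], [1.0, 0.1, 2.5]]
for row in sinkhorn(scores, n_iters=50):
    print([round(v, 3) for v in row])
```

Because every step is a simple differentiable normalization over the whole matrix, all candidate matchings are scored in parallel, which is why this kind of procedure lends itself to massively parallel hardware.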

Linguistic Constructs Represent the Domain Model in Intelligent Language Tutoring
Anisia Katinskaia | Jue Hou | Anh-Duc Vu | Roman Yangarber

This paper presents the development of the AI-based language-learning platform, Revita. It is an intelligent online tutor, developed to support learners of multiple languages, from lower-intermediate toward advanced levels. It has been in pilot use with hundreds of students at several universities, whose feedback and needs shape the development. One of the main emerging features of Revita is the system of linguistic constructs to represent the domain knowledge. The system of constructs is developed in collaboration with experts in language pedagogy. Constructs define the types of exercises, the content of the feedback, and enable detailed modeling and evaluation of learner progress.

GATE Teamware 2: An open-source tool for collaborative document classification annotation
David Wilby | Twin Karmakharm | Ian Roberts | Xingyi Song | Kalina Bontcheva

We present GATE Teamware 2: an open-source web-based platform for managing teams of annotators working on document classification tasks. GATE Teamware 2 is an entirely re-engineered successor to GATE Teamware, using contemporary web frameworks. The software allows the management of teams of multiple annotators, project managers, and administrators (including the management of annotators) across multiple projects. Projects can be configured to control and monitor the annotation statistics and have a highly flexible JSON-configurable annotation display which can include arbitrary HTML. Optionally, documents can be uploaded with pre-existing annotations, and documents are served to annotators in a random order by default to reduce bias. Crucially, annotators can be trained on applying the annotation guidelines correctly and then screened for quality assurance purposes, prior to being cleared for independent annotation. GATE Teamware 2 can be self-deployed, including in container orchestration environments, or provided as private, hosted cloud instances. GATE Teamware 2 is open-source software and can be downloaded from. A demonstration video of the system has also been made available at

GameQA: Gamified Mobile App Platform for Building Multiple-Domain Question-Answering Datasets
Njall Skarphedinsson | Breki Gudmundsson | Steinar Smari | Marta Kristin Larusdottir | Hafsteinn Einarsson | Abuzar Khan | Eric Nyberg | Hrafn Loftsson

The methods used to create many of the well-known Question-Answering (QA) datasets are hard to replicate for low-resource languages. A commonality amongst these methods is hiring annotators to source answers from the internet by querying a single answer source, such as Wikipedia. Applying these methods for low-resource languages can be problematic since there is no single large answer source for these languages. Consequently, this can result in a high ratio of unanswered questions, since the amount of information in any single source is limited. To address this problem, we developed a novel crowd-sourcing platform to gather multiple-domain QA data for low-resource languages. Our platform, which consists of a mobile app and a web API, gamifies the data collection process. We successfully released the app for Icelandic (a low-resource language with about 350,000 native speakers) to build a dataset which rivals large QA datasets for high-resource languages both in terms of size and ratio of answered questions. We have made the platform open source with instructions on how to localize and deploy it to gather data for other low-resource languages.

Towards Speech to Speech Machine Translation focusing on Indian Languages
Vandan Mujadia | Dipti Sharma

We introduce an SSMT (Speech to Speech Machine Translation, aka Speech to Speech Video Translation) pipeline, as a web application for translating videos from one language to another by cascading multiple language modules. Our speech translation system combines highly accurate speech-to-text (ASR) for Indian English, pre-processing modules to bridge ASR-MT gaps such as spoken disfluency and punctuation, robust machine translation (MT) systems for multiple language pairs, an SRT module for translated text, a text-to-speech (TTS) module, and a module to render the translated synthesized audio on the original video. It is a user-friendly, flexible, and easily accessible system. We aim to provide a complete configurable speech translation experience to users and researchers with this system. It also supports human intervention, where users can edit the outputs of different modules and the edited output can then be used for subsequent processing to improve overall output quality. By adopting a human-in-the-loop approach, we aim to configure technology so that it assists humans and reduces the human effort involved in speech translation between English and Indian languages. To the best of our knowledge, this is the first fully integrated system for English to Indian languages (Hindi, Telugu, Gujarati, Marathi and Punjabi) video translation. Our evaluation shows that the developed pipeline with human intervention achieves a MOS score of 3.5+ for English to Hindi. A short video demonstrating our system is available at

TextWorldExpress: Simulating Text Games at One Million Steps Per Second
Peter Jansen | Marc-Alexandre Côté

Text-based games offer a challenging test bed to evaluate virtual agents at language understanding, multi-step problem-solving, and common-sense reasoning. However, speed is a major limitation of current text-based games, capping at 300 steps per second, mainly due to the use of legacy tooling. In this work, we present TextWorldExpress, a high-performance simulator that includes implementations of three common text game benchmarks and increases simulation throughput by approximately three orders of magnitude, reaching over one million steps per second on common desktop hardware. This significantly reduces experiment runtime, enabling billion-step-scale experiments in about one day.

TermoUD - a language-independent terminology extraction tool
Malgorzata Marciniak | Piotr Rychlik | Agnieszka Mykowiecka

The paper addresses TermoUD, a language-independent terminology extraction tool. Its previous version, TermoPL (Marciniak et al., 2016; Rychlik et al., 2022), uses a language-dependent shallow grammar to select candidate terms. The goal behind the development of TermoUD is to make the procedure as universal as possible while taking care of the linguistic correctness of selected phrases. The tool is suitable for languages for which a Universal Dependencies (UD) parser exists. We describe a method of candidate term extraction based on UD POS tags and UD relations. The candidate ranking is performed by the C-value metric (with context counting adapted to the UD formalism), which does not need any additional language resources. The performance of the tool has been tested on texts in English, French, Dutch, and Slovenian. The results are evaluated on the manually annotated datasets ACTER, RD-TEC 2.0, GENIA, and RSDO5, and compared to those obtained by other tools.
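The C-value ranking mentioned above can be illustrated with its standard formulation (Frantzi et al.): for a candidate term a of length |a| words with frequency f(a), C-value(a) = log2|a| · f(a) if a is not nested in any longer candidate; otherwise the mean frequency of the longer candidates containing a is subtracted from f(a) before weighting. The Python sketch below implements that standard version only. TermoUD adapts the context counting to the UD formalism, so its exact computation may differ; the function name `c_value` and the dictionary-of-tuples input are hypothetical.

```python
import math


def c_value(term, freq):
    """Standard C-value for `term`, a tuple of words.

    Hypothetical helper for illustration. `freq` maps every candidate term
    (tuple of words) to its corpus frequency. Nesting is tested as a
    contiguous word subsequence; TermoUD's UD-based variant may differ.
    """
    def contains(longer, shorter):
        n = len(shorter)
        return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))

    # Longer candidate terms in which `term` is nested.
    containers = [t for t in freq if len(t) > len(term) and contains(t, term)]
    weight = math.log2(len(term))
    if not containers:
        return weight * freq[term]
    # Discount occurrences explained by the longer terms (mean over containers).
    nested_mean = sum(freq[t] for t in containers) / len(containers)
    return weight * (freq[term] - nested_mean)


freq = {
    ("soft", "contact", "lens"): 5,
    ("contact", "lens"): 12,
    ("hard", "contact", "lens"): 3,
}
print(c_value(("contact", "lens"), freq))          # log2(2) * (12 - (5 + 3) / 2)
print(c_value(("soft", "contact", "lens"), freq))  # log2(3) * 5, not nested
```

Note that this standard formula assigns 0 to every single-word term because log2(1) = 0, which is why some implementations use a modified length weight.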

INCOGNITUS: A Toolbox for Automated Clinical Notes Anonymization
Bruno Ribeiro | Vitor Rolla | Ricardo Santos

Automated text anonymization is a classical problem in Natural Language Processing (NLP). The topic has evolved immensely throughout the years, with the first list-search and rule-based solutions evolving into statistical modeling approaches and later into advanced systems that rely on powerful state-of-the-art language models. Even so, these solutions fail to be widely implemented in the most privacy-demanding areas of activity, such as healthcare; none of them is perfect, and most cannot guarantee rigorous anonymization. This paper presents INCOGNITUS, a flexible platform for the automated anonymization of clinical notes that offers the possibility of applying different techniques. The available tools include an underexplored yet promising method that guarantees 100% recall by replacing each word with a semantically identical one. In addition, the presented framework incorporates a performance evaluation module to compute a novel metric for information loss assessment in real time.

CoTEVer: Chain of Thought Prompting Annotation Toolkit for Explanation Verification
Seungone Kim | Se June Joo | Yul Jang | Hyungjoo Chae | Jinyoung Yeo

Chain-of-thought (CoT) prompting enables large language models (LLMs) to solve complex reasoning tasks by generating an explanation before the final prediction. Despite its promising ability, a critical downside of CoT prompting is that the performance is greatly affected by the factuality of the generated explanation. To improve the correctness of the explanations, fine-tuning language models with explanation data is needed. However, there exist only a few datasets that can be used for such approaches, and no data collection tool for building them. Thus, we introduce CoTEVer, a toolkit for annotating the factual correctness of generated explanations and collecting revision data of wrong explanations. Furthermore, we suggest several use cases where the data collected with CoTEVer can be utilized for enhancing the faithfulness of explanations. Our toolkit is publicly available at

OLEA: Tool and Infrastructure for Offensive Language Error Analysis in English
Marie Grace | Jay Seabrum | Dananjay Srinivas | Alexis Palmer

State-of-the-art models for identifying offensive language often fail to generalize over more nuanced or implicit cases of offensive and hateful language. Understanding model performance on complex cases is key for building robust models that are effective in real-world settings. To help researchers efficiently evaluate their models, we introduce OLEA, a diagnostic, open-source, extensible Python library that provides easy-to-use tools for error analysis in the context of detecting offensive language in English. OLEA packages analyses and datasets proposed by prior scholarship, empowering researchers to build effective, explainable and generalizable offensive language classifiers.

TULAP - An Accessible and Sustainable Platform for Turkish Natural Language Processing Resources
Susan Uskudarli | Muhammet Şen | Furkan Akkurt | Merve Gürbüz | Onur Gungor | Arzucan Özgür | Tunga Güngör

Access to natural language processing resources is essential for their continuous improvement. This can be especially challenging in educational institutions, where the software development effort required to package and release research outcomes may be overwhelming and under-recognized. Access to well-prepared and reliable research outcomes is important both for their developers and for the greater research community. This paper presents an approach to address this concern with two main goals: (1) to create an open-source, easily deployable platform where resources can be easily shared and explored, and (2) to use this platform to publish open-source Turkish NLP resources (datasets and tools) created by a research lab. The Turkish Natural Language Processing (TULAP) platform was designed and developed as an easy-to-use platform to share dataset and tool resources, and it supports interactive tool demos. Numerous open-access Turkish NLP resources have been shared on TULAP. All tools are containerized to support portability for custom use. This paper describes the design, implementation, and deployment of TULAP with use cases (available at). A short video demonstrating our system is available at

ALANNO: An Active Learning Annotation System for Mortals
Josip Jukić | Fran Jelenić | Miroslav Bićanić | Jan Šnajder

Supervised machine learning has become the cornerstone of today’s data-driven society, increasing the need for labeled data. However, the process of acquiring labels is often expensive and tedious. One possible remedy is to use active learning (AL) – a special family of machine learning algorithms designed to reduce labeling costs. Although AL has been successful in practice, a number of practical challenges hinder its effectiveness and are often overlooked in existing AL annotation tools. To address these challenges, we developed ALANNO, an open-source annotation system for NLP tasks equipped with features to make AL effective in real-world annotation projects. ALANNO facilitates annotation management in a multi-annotator setup and supports a variety of AL methods and underlying models, which are easily configurable and extensible.

Automatically Summarizing Evidence from Clinical Trials: A Prototype Highlighting Current Challenges
Sanjana Ramprasad | Jered McInerney | Iain Marshall | Byron Wallace

In this work, we present TrialsSummarizer, a system that aims to automatically summarize evidence presented in the set of randomized controlled trials most relevant to a given query. Building on prior work, the system retrieves trial publications matching a query specifying a combination of condition, intervention(s), and outcome(s), and ranks these according to sample size and estimated study quality. The top-k such studies are passed through a neural multi-document summarization system, yielding a synopsis of these trials. We consider two architectures: a standard sequence-to-sequence model based on BART, and a multi-headed architecture intended to provide greater transparency and controllability to end-users. Both models produce fluent and relevant summaries of evidence retrieved for queries, but their tendency to introduce unsupported statements renders them inappropriate for use in this domain at present. The proposed architecture may help users verify outputs by allowing them to trace generated tokens back to inputs. The demonstration video can be found at; the prototype, source code, and model weights are available at:

Corpus Annotation Graph Builder (CAG): An Architectural Framework to Create and Annotate a Multi-source Graph
Roxanne El Baff | Tobias Hecking | Andreas Hamm | Jasper W. Korte | Sabine Bartsch

Graphs are a natural representation of complex data as their structure allows users to discover (often implicit) relations among the nodes intuitively. Applications build graphs in an ad-hoc fashion, usually tailored to specific use cases, limiting their reusability. To account for this, we present the Corpus Annotation Graph (CAG) architectural framework based on a create-and-annotate pattern that enables users to build uniformly structured graphs from diverse data sources and extend them with automatically extracted annotations (e.g., named entities, topics). The resulting graphs can be used for further analyses across multiple downstream tasks (e.g., node classification). Code and resources are publicly available on GitHub, and downloadable via PyPI with the command pip install cag.

ferret: a Framework for Benchmarking Explainers on Transformers
Giuseppe Attanasio | Eliana Pastor | Chiara Di Bonaventura | Debora Nozza

As Transformers are increasingly relied upon to solve complex NLP problems, there is an increased need for their decisions to be humanly interpretable. While several explainable AI (XAI) techniques for interpreting the outputs of transformer-based models have been proposed, there is still a lack of easy access to using and comparing them. We introduce ferret, a Python library to simplify the use and comparison of XAI methods on transformer-based classifiers. With ferret, users can visualize and compare explanations of transformer-based models' outputs using state-of-the-art XAI methods on any free text or existing XAI corpora. Moreover, users can also evaluate ad-hoc XAI metrics to select the most faithful and plausible explanations. To align with the recently consolidated process of sharing and using transformer-based models from Hugging Face, ferret interfaces directly with its Python library. In this paper, we showcase ferret to benchmark XAI methods used on transformers for sentiment analysis and hate speech detection. We show how specific methods provide consistently better explanations and are preferable in the context of transformer models.

Learn With Martian: A Tool For Creating Assignments That Can Write And Re-Write Themselves
Shriyash Upadhyay | Chris Callison-Burch | Etan Ginsberg

In this paper, we propose Learn, a unified, easy-to-use tool for applying question generation and selection in classrooms. The tool lets instructors and TAs create assignments that can write and rewrite themselves. Given existing course materials, such as a reference textbook, Learn can generate questions, select the highest-quality questions, show the questions to students, adapt question difficulty to student knowledge, and generate new questions based on how effectively old questions help students learn. Because the components handling each sub-task are modular and composable, instructors can use only the parts of the tool their course needs, which enables integration into a large number of courses with varied teaching styles. We also report on the adoption of the tool in classes at the University of Pennsylvania with over 1000 students. Learn is publicly released at
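One common way to adapt question difficulty to student knowledge is an Elo-style rating scheme. In this scheme, students and questions share a single scale, and both ratings move after every answer. The sketch below assumes that scheme purely for illustration, because the abstract does not say how Learn adapts difficulty. The question bank, the ratings, and the K-factor of 32 are all invented.

```python
from dataclasses import dataclass


@dataclass
class Question:
    text: str
    difficulty: float  # Elo-style rating: higher means harder


def expected_correct(ability, difficulty):
    """Elo logistic estimate of the probability of a correct answer."""
    return 1.0 / (1.0 + 10 ** ((difficulty - ability) / 400.0))


def update(ability, question, correct, k=32.0):
    """Return the new student ability; adjusts the question's difficulty in place.

    A correct answer raises the student's rating and lowers the question's,
    scaled by how surprising the outcome was.
    """
    surprise = (1.0 if correct else 0.0) - expected_correct(ability, question.difficulty)
    question.difficulty -= k * surprise
    return ability + k * surprise


def pick_next(ability, questions, asked):
    """Pick the unasked question whose difficulty is closest to the student's ability."""
    candidates = [q for q in questions if q.text not in asked]
    return min(candidates, key=lambda q: abs(q.difficulty - ability))


bank = [
    Question("Define a stack.", 1000.0),
    Question("Show that dynamic-array push is amortized O(1).", 1400.0),
    Question("Analyze the amortized cost of splay-tree access.", 1800.0),
]

ability, asked = 1100.0, set()
first = pick_next(ability, bank, asked)  # the 1000-rated question is closest to 1100
asked.add(first.text)
ability = update(ability, first, correct=True)
second = pick_next(ability, bank, asked)
```

Choosing the question closest in rating targets roughly a 50% expected success rate, since expected_correct returns 0.5 when the two ratings are equal. A real deployment would likely tune this target and the K-factor.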

EVALIGN: Visual Evaluation of Translation Alignment Models
Tariq Yousef | Gerhard Heyer | Stefan Jänicke

This paper presents EvAlign, a visual analytics framework for the quantitative and qualitative evaluation of automatic translation alignment models. EvAlign offers various visualization views that enable developers to visualize their models' predictions and compare their performance with baseline and state-of-the-art models. Through different search and filter functions, researchers and practitioners can also inspect frequent alignment errors and their positions. EvAlign hosts nine gold-standard datasets and the predictions of multiple alignment models. The tool is extensible, and adding further datasets and models is straightforward. EvAlign can be deployed and used locally and is available on GitHub.
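Quantitative evaluation of word alignments is commonly reported as precision, recall, and alignment error rate (AER). These are computed against gold sure links S and possible links P, where S is a subset of P. The sketch below implements these standard formulas together with a helper that lists error links by position. Whether EvAlign reports exactly these metrics is an assumption, and the toy links are invented for illustration.

```python
def alignment_scores(predicted, sure, possible):
    """Precision, recall, and alignment error rate (AER) for word alignments.

    Alignments are sets of (source index, target index) links. Gold data
    distinguishes sure links S from possible links P, with S a subset of P.
    """
    possible = possible | sure  # enforce S ⊆ P even if the input omits sure links
    a_and_s = len(predicted & sure)
    a_and_p = len(predicted & possible)
    precision = a_and_p / len(predicted) if predicted else 0.0
    recall = a_and_s / len(sure) if sure else 0.0
    denom = len(predicted) + len(sure)
    aer = 1.0 - (a_and_s + a_and_p) / denom if denom else 0.0
    return precision, recall, aer


def alignment_errors(predicted, sure, possible):
    """Links worth inspecting: predicted but not even possible, and sure but missed."""
    possible = possible | sure
    return sorted(predicted - possible), sorted(sure - predicted)


sure = {(0, 0), (1, 2)}
possible = {(2, 1)}  # possible-only links
predicted = {(0, 0), (2, 1), (3, 3)}

precision, recall, aer = alignment_scores(predicted, sure, possible)
false_links, missed_links = alignment_errors(predicted, sure, possible)
```

For this toy input, precision is 2/3 and recall is 1/2. AER is 1 - (1 + 2) / (3 + 2) = 0.4. The two error lists are the kind of per-position information that a search-and-filter view could surface.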

ALLECS: A Lightweight Language Error Correction System
Muhammad Reza Qorib | Geonsik Moon | Hwee Tou Ng

In this paper, we present ALLECS, a lightweight web application to serve grammatical error correction (GEC) systems so that they can be easily used by the general public. We design ALLECS to be accessible to as many users as possible, including users who have a slow Internet connection and who use mobile phones as their main devices to connect to the Internet. ALLECS provides three state-of-the-art base GEC systems using two approaches (sequence-to-sequence generation and sequence tagging), as well as two state-of-the-art GEC system combination methods using two approaches (edit-based and text-based). ALLECS can be accessed at
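Edit-based system combination can be pictured with a deliberately simple majority-vote sketch. Each base system's output is represented as a set of token-level edits, and only edits proposed by at least a minimum number of systems are applied to the source. This voting scheme is an assumption for illustration, not the combination method used in ALLECS. The three "systems" and their edits are invented.

```python
from collections import Counter


def combine_edits(system_edits, min_votes):
    """Majority-vote combination over edits of the form (start, end, replacement).

    Each edit replaces source tokens [start, end) with the tuple `replacement`.
    Edits proposed by at least `min_votes` systems are kept; when two kept
    edits overlap, the one with more votes wins (ties broken by position).
    """
    votes = Counter(edit for edits in system_edits for edit in set(edits))
    kept = []
    for edit, count in sorted(votes.items(), key=lambda item: (-item[1], item[0])):
        if count < min_votes:
            continue
        start, end, _ = edit
        if all(end <= s or start >= e for s, e, _ in kept):
            kept.append(edit)
    return sorted(kept)


def apply_edits(tokens, edits):
    """Apply non-overlapping edits right to left so earlier offsets stay valid."""
    out = list(tokens)
    for start, end, replacement in sorted(edits, reverse=True):
        out[start:end] = list(replacement)
    return out


source = "She go to school yesterday".split()
systems = [
    [(1, 2, ("went",))],                    # hypothetical system A
    [(1, 2, ("went",)), (3, 3, ("the",))],  # hypothetical system B
    [(1, 2, ("goes",))],                    # hypothetical system C, disagreeing
]

majority = " ".join(apply_edits(source, combine_edits(systems, min_votes=2)))
permissive = " ".join(apply_edits(source, combine_edits(systems, min_votes=1)))
```

Raising min_votes trades recall for precision. With two votes, only the agreed-upon correction "went" survives. With one vote, system B's single insertion of "the" is also applied, while the conflicting "goes" is dropped because it overlaps a better-supported edit.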

DAVE: Differential Diagnostic Analysis Automation and Visualization from Clinical Notes
Hadi Hamoud | Fadi Zaraket | Chadi Abou Chakra | Mira Dankar

The Differential Analysis Visualizer for Electronic Medical Records (DAVE) is a tool that uses natural language processing and machine learning to visualize diagnostic algorithms in real time, supporting medical professionals in their clinical decision-making process.