Aoife Cahill

2023

pdf abs
A New Task and Dataset on Detecting Attacks on Human Rights Defenders
Shihao Ran | Di Lu | Aoife Cahill | Joel Tetreault | Alejandro Jaimes
Findings of the Association for Computational Linguistics: ACL 2023

The ability to conduct retrospective analyses of attacks on human rights defenders over time and by location is important for humanitarian organizations to better understand historical or ongoing human rights violations and thus better manage the global impact of such events. We hypothesize that NLP can support such efforts by quickly processing large collections of news articles to detect and summarize the characteristics of attacks on human rights defenders. To that end, we propose a new dataset for detecting Attacks on Human Rights Defenders (HRDsAttack) consisting of crowdsourced annotations on 500 online news articles. The annotations include fine-grained information about the type and location of the attacks, as well as information about the victim(s). We demonstrate the usefulness of the dataset by using it to train and evaluate baseline models on several sub-tasks to predict the annotated characteristics.

2021

pdf abs
Supporting Spanish Writers using Automated Feedback
Aoife Cahill | James Bruno | James Ramey | Gilmar Ayala Meneses | Ian Blood | Florencia Tolentino | Tamar Lavee | Slava Andreyev
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations

We present a tool that provides automated feedback to students studying Spanish writing. The feedback is given for four categories: topic development, coherence, writing conventions, and essay organization. The tool is made freely available via a Google Docs add-on. A small user study with third-level students in Mexico shows that students found the tool generally helpful and that most of them plan to continue using it as they work to improve their writing skills.

2020

pdf bib abs
Using PRMSE to evaluate automated scoring systems in the presence of label noise
Anastassia Loukina | Nitin Madnani | Aoife Cahill | Lili Yao | Matthew S. Johnson | Brian Riordan | Daniel F. McCaffrey
Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications

The effect of noisy labels on the performance of NLP systems has been studied extensively for system training. In this paper, we focus on the effect that noisy labels have on system evaluation. Using automated scoring as an example, we demonstrate that the quality of human ratings used for system evaluation have a substantial impact on traditional performance metrics, making it impossible to compare system evaluations on labels with different quality. We propose that a new metric, PRMSE, developed within the educational measurement community, can help address this issue, and provide practical guidelines on using PRMSE.

pdf abs
Context-based Automated Scoring of Complex Mathematical Responses
Aoife Cahill | James H Fife | Brian Riordan | Avijit Vajpayee | Dmytro Galochkin
Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications

The tasks of automatically scoring either textual or algebraic responses to mathematical questions have both been well-studied, albeit separately. In this paper we propose a method for automatically scoring responses that contain both text and algebraic expressions. Our method not only achieves high agreement with human raters, but also links explicitly to the scoring rubric – essentially providing explainable models and a way to potentially provide feedback to students in the future.

pdf abs
Don’t take “nswvtnvakgxpm” for an answer –The surprising vulnerability of automatic content scoring systems to adversarial input
Yuning Ding | Brian Riordan | Andrea Horbach | Aoife Cahill | Torsten Zesch
Proceedings of the 28th International Conference on Computational Linguistics

Automatic content scoring systems are widely used on short answer tasks to save human effort. However, the use of these systems can invite cheating strategies, such as students writing irrelevant answers in the hopes of gaining at least partial credit. We generate adversarial answers for benchmark content scoring datasets based on different methods of increasing sophistication and show that even simple methods lead to a surprising decrease in content scoring performance. As an extreme example, up to 60% of adversarial answers generated from random shuffling of words in real answers are accepted by a state-of-the-art scoring system. In addition to analyzing the vulnerabilities of content scoring systems, we examine countermeasures such as adversarial training and show that these measures improve system robustness against adversarial answers considerably but do not suffice to completely solve the problem.

2018

pdf abs
Atypical Inputs in Educational Applications
Su-Youn Yoon | Aoife Cahill | Anastassia Loukina | Klaus Zechner | Brian Riordan | Nitin Madnani
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)

In large-scale educational assessments, the use of automated scoring has recently become quite common. While the majority of student responses can be processed and scored without difficulty, there are a small number of responses that have atypical characteristics that make it difficult for an automated scoring system to assign a correct score. We describe a pipeline that detects and processes these kinds of responses at run-time. We present the most frequent kinds of what are called non-scorable responses along with effective filtering models based on various NLP and speech processing technologies. We give an overview of two operational automated scoring systems —one for essay scoring and one for speech scoring— and describe the filtering models they use. Finally, we present an evaluation and analysis of filtering models used for spoken responses in an assessment of language proficiency.

pdf abs
Automated Scoring: Beyond Natural Language Processing
Nitin Madnani | Aoife Cahill
Proceedings of the 27th International Conference on Computational Linguistics

In this position paper, we argue that building operational automated scoring systems is a task that has disciplinary complexity above and beyond standard competitive shared tasks which usually involve applying the latest machine learning techniques to publicly available data in order to obtain the best accuracy. Automated scoring systems warrant significant cross-discipline collaboration of which natural language processing and machine learning are just two of many important components. Such systems have multiple stakeholders with different but valid perspectives that can often times be at odds with each other. Our position is that it is essential for us as NLP researchers to understand and incorporate these perspectives in our research and work towards a mutually satisfactory solution in order to build automated scoring systems that are accurate, fair, unbiased, and useful.

2017

pdf abs
Building Better Open-Source Tools to Support Fairness in Automated Scoring
Nitin Madnani | Anastassia Loukina | Alina von Davier | Jill Burstein | Aoife Cahill
Proceedings of the First ACL Workshop on Ethics in Natural Language Processing

Automated scoring of written and spoken responses is an NLP application that can significantly impact lives especially when deployed as part of high-stakes tests such as the GRE® and the TOEFL®. Ethical considerations require that automated scoring algorithms treat all test-takers fairly. The educational measurement community has done significant research on fairness in assessments and automated scoring systems must incorporate their recommendations. The best way to do that is by making available automated, non-proprietary tools to NLP researchers that directly incorporate these recommendations and generate the analyses needed to help identify and resolve biases in their scoring systems. In this paper, we attempt to provide such a solution.

pdf abs
Speech- and Text-driven Features for Automated Scoring of English Speaking Tasks
Anastassia Loukina | Nitin Madnani | Aoife Cahill
Proceedings of the Workshop on Speech-Centric Natural Language Processing

We consider the automatic scoring of a task for which both the content of the response as well its spoken fluency are important. We combine features from a text-only content scoring system originally designed for written responses with several categories of acoustic features. Although adding any single category of acoustic features to the text-only system on its own does not significantly improve performance, adding all acoustic features together does yield a small but significant improvement. These results are consistent for responses to open-ended questions and to questions focused on some given source material.

Native Language Identification (NLI) is the task of automatically identifying the native language (L1) of an individual based on their language production in a learned language. It is typically framed as a classification task where the set of L1s is known a priori. Two previous shared tasks on NLI have been organized where the aim was to identify the L1 of learners of English based on essays (2013) and spoken responses (2016) they provided during a standardized assessment of academic English proficiency. The 2017 shared task combines the inputs from the two prior tasks for the first time. There are three tracks: NLI on the essay only, NLI on the spoken response only (based on a transcription of the response and i-vector acoustic features), and NLI using both responses. We believe this makes for a more interesting shared task while building on the methods and results from the previous two shared tasks. In this paper, we report the results of the shared task. A total of 19 teams competed across the three different sub-tasks. The fusion track showed that combining the written and spoken responses provides a large boost in prediction accuracy. Multiple classifier systems (e.g. ensembles and meta-classifiers) were the most effective in all tasks, with most based on traditional classifiers (e.g. SVMs) with lexical/syntactic features.

pdf abs
Investigating neural architectures for short answer scoring
Brian Riordan | Andrea Horbach | Aoife Cahill | Torsten Zesch | Chong Min Lee
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications

Neural approaches to automated essay scoring have recently shown state-of-the-art performance. The automated essay scoring task typically involves a broad notion of writing quality that encompasses content, grammar, organization, and conventions. This differs from the short answer content scoring task, which focuses on content accuracy. The inputs to neural essay scoring models – ngrams and embeddings – are arguably well-suited to evaluate content in short answer scoring tasks. We investigate how several basic neural approaches similar to those used for automated essay scoring perform on short answer scoring. We show that neural architectures can outperform a strong non-neural baseline, but performance and optimal parameter settings vary across the more diverse types of prompts typical of short answer scoring.

pdf abs
A Large Scale Quantitative Exploration of Modeling Strategies for Content Scoring
Nitin Madnani | Anastassia Loukina | Aoife Cahill
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications

We explore various supervised learning strategies for automated scoring of content knowledge for a large corpus of 130 different content-based questions spanning four subject areas (Science, Math, English Language Arts, and Social Studies) and containing over 230,000 responses scored by human raters. Based on our analyses, we provide specific recommendations for content scoring. These are based on patterns observed across multiple questions and assessments and are, therefore, likely to generalize to other scenarios and prove useful to the community as automated content scoring becomes more popular in schools and classrooms.

Aoife Cahill

2023

2021

2020

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2002

Co-authors

Venues