Mariano Felice

2024

pdf abs
The British Council submission to the BEA 2024 shared task
Mariano Felice | Zeynep Duran Karaoz
Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024)

This paper describes our submission to the item difficulty prediction track of the BEA 2024 shared task. Our submission included the output of three systems: 1) a feature-based linear regression model, 2) a RoBERTa-based model and 3) a linear regression ensemble built on the predictions of the two previous models. Our systems ranked 7th, 8th and 5th respectively, demonstrating that simple models can achieve optimal results. A closer look at the results shows that predictions are more accurate for items in the middle of the difficulty range, with no other obvious relationships between difficulty and the accuracy of predictions.

pdf abs
Distractor Generation Using Generative and Discriminative Capabilities of Transformer-based Models
Shiva Taslimipoor | Luca Benedetto | Mariano Felice | Paula Buttery
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Multiple Choice Questions (MCQs) are very common in both high-stakes and low-stakes examinations, and their effectiveness in assessing students relies on the quality and diversity of distractors, which are the incorrect answer options provided alongside the correct answer. Motivated by the progress in generative language models, we propose a two-step automatic distractor generation approach which is based on text to text transfer transformer models. Unlike most of previous methods for distractor generation, our approach does not rely on the correct answer options. Instead, it first generates both correct and incorrect answer options, and then discriminates potential correct options from distractors. Identified distractors are finally categorised based on semantic similarity scores into separate clusters, and the cluster heads are selected as our final distinct distractors. Experiments on two publicly available datasets show that our approach outperforms previous models both in the case of single-word answer options and longer-sequence reading comprehension questions.

pdf abs
Language Variety Identification with True Labels
Marcos Zampieri | Kai North | Tommi Jauhiainen | Mariano Felice | Neha Kumari | Nishant Nair | Yash Mahesh Bangera
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Language identification is an important first step in many NLP applications. Most publicly available language identification datasets, however, are compiled under the assumption that the gold label of each instance is determined by where texts are retrieved from. Research has shown that this is a problematic assumption, particularly in the case of very similar languages (e.g., Croatian and Serbian) and national language varieties (e.g., Brazilian and European Portuguese), where texts may contain no distinctive marker of the particular language or variety. To overcome this important limitation, this paper presents DSL True Labels (DSL-TL), the first human-annotated multilingual dataset for language variety identification. DSL-TL contains a total of 12,900 instances in Portuguese, split between European Portuguese and Brazilian Portuguese; Spanish, split between Argentine Spanish and Castilian Spanish; and English, split between American English and British English. We trained multiple models to discriminate between these language varieties, and we present the results in detail. The data and models presented in this paper provide a reliable benchmark toward the development of robust and fairer language variety identification systems. We make DSL-TL freely available to the research community.

2022

pdf abs
Constructing Open Cloze Tests Using Generation and Discrimination Capabilities of Transformers
Mariano Felice | Shiva Taslimipoor | Paula Buttery
Findings of the Association for Computational Linguistics: ACL 2022

This paper presents the first multi-objective transformer model for generating open cloze tests that exploits generation and discrimination capabilities to improve performance. Our model is further enhanced by tweaking its loss function and applying a post-processing re-ranking algorithm that improves overall test structure. Experiments using automatic and human evaluation show that our approach can achieve up to 82% accuracy according to experts, outperforming previous work and baselines. We also release a collection of high-quality open cloze tests along with sample system output and human annotations that can serve as a future benchmark.

pdf abs
CEPOC: The Cambridge Exams Publishing Open Cloze dataset
Mariano Felice | Shiva Taslimipoor | Øistein E. Andersen | Paula Buttery
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Open cloze tests are a standard type of exercise where examinees must complete a text by filling in the gaps without any given options to choose from. This paper presents the Cambridge Exams Publishing Open Cloze (CEPOC) dataset, a collection of open cloze tests from world-renowned English language proficiency examinations. The tests in CEPOC have been expertly designed and validated using standard principles in language research and assessment. They are prepared for language learners at different proficiency levels and hence classified into different CEFR levels (A2, B1, B2, C1, C2). This resource can be a valuable testbed for various NLP tasks. We perform a complete set of experiments on three tasks: gap filling, gap prediction, and CEFR text classification. We implement transformer-based systems based on pre-trained language models to model each task and use our dataset as a test set, providing promising benchmark results.

2020

pdf abs
A Crash Course in Automatic Grammatical Error Correction
Roman Grundkiewicz | Christopher Bryant | Mariano Felice
Proceedings of the 28th International Conference on Computational Linguistics: Tutorial Abstracts

Grammatical Error Correction (GEC) is the task of automatically detecting and correcting all types of errors in written text. Although most research has focused on correcting errors in the context of English as a Second Language (ESL), GEC can also be applied to other languages and native text. The main application of a GEC system is thus to assist humans with their writing. Academic and commercial interest in GEC has grown significantly since the Helping Our Own (HOO) and Conference on Natural Language Learning (CoNLL) shared tasks in 2011-14, and a record-breaking 24 teams took part in the recent Building Educational Applications (BEA) shared task. Given this interest, and the recent shift towards neural approaches, we believe the time is right to offer a tutorial on GEC for researchers who may be new to the field or who are interested in the current state of the art and future challenges. With this in mind, the main goal of this tutorial is not only to bring attendees up to speed with GEC in general, but also examine the development of neural-based GEC systems.

2019

pdf abs
The BEA-2019 Shared Task on Grammatical Error Correction
Christopher Bryant | Mariano Felice | Øistein E. Andersen | Ted Briscoe
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications

This paper reports on the BEA-2019 Shared Task on Grammatical Error Correction (GEC). As with the CoNLL-2014 shared task, participants are required to correct all types of errors in test data. One of the main contributions of the BEA-2019 shared task is the introduction of a new dataset, the Write&Improve+LOCNESS corpus, which represents a wider range of native and learner English levels and abilities. Another contribution is the introduction of tracks, which control the amount of annotated data available to participants. Systems are evaluated in terms of ERRANT F_0.5, which allows us to report a much wider range of performance statistics. The competition was hosted on Codalab and remains open for further submissions on the blind test set.

pdf abs
Entropy as a Proxy for Gap Complexity in Open Cloze Tests
Mariano Felice | Paula Buttery
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

This paper presents a pilot study of entropy as a measure of gap complexity in open cloze tests aimed at learners of English. Entropy is used to quantify the information content in each gap, which can be used to estimate complexity. Our study shows that average gap entropy correlates positively with proficiency levels while individual gap entropy can capture contextual complexity. To the best of our knowledge, this is the first unsupervised information-theoretical approach to evaluating the quality of cloze tests.

2017

pdf abs
Artificial Error Generation with Machine Translation and Syntactic Patterns
Marek Rei | Mariano Felice | Zheng Yuan | Ted Briscoe
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications

Shortage of available training data is holding back progress in the area of automated error detection. This paper investigates two alternative methods for artificially generating writing errors, in order to create additional resources. We propose treating error generation as a machine translation task, where grammatically correct text is translated to contain errors. In addition, we explore a system for extracting textual patterns from an annotated corpus, which can then be used to insert errors into grammatically correct sentences. Our experiments show that the inclusion of artificially generated errors significantly improves error detection accuracy on both FCE and CoNLL 2014 datasets.

pdf abs
Automatic Annotation and Evaluation of Error Types for Grammatical Error Correction
Christopher Bryant | Mariano Felice | Ted Briscoe
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Until now, error type performance for Grammatical Error Correction (GEC) systems could only be measured in terms of recall because system output is not annotated. To overcome this problem, we introduce ERRANT, a grammatical ERRor ANnotation Toolkit designed to automatically extract edits from parallel original and corrected sentences and classify them according to a new, dataset-agnostic, rule-based framework. This not only facilitates error type evaluation at different levels of granularity, but can also be used to reduce annotator workload and standardise existing GEC datasets. Human experts rated the automatic edits as “Good” or “Acceptable” in at least 95% of cases, so we applied ERRANT to the system output of the CoNLL-2014 shared task to carry out a detailed error type analysis for the first time.

2016

pdf
Candidate re-ranking for SMT-based grammatical error correction
Zheng Yuan | Ted Briscoe | Mariano Felice
Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications

pdf abs
Automatic Extraction of Learner Errors in ESL Sentences Using Linguistically Enhanced Alignments
Mariano Felice | Christopher Bryant | Ted Briscoe
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

We propose a new method of automatically extracting learner errors from parallel English as a Second Language (ESL) sentences in an effort to regularise annotation formats and reduce inconsistencies. Specifically, given an original and corrected sentence, our method first uses a linguistically enhanced alignment algorithm to determine the most likely mappings between tokens, and secondly employs a rule-based function to decide which alignments should be merged. Our method beats all previous approaches on the tested datasets, achieving state-of-the-art results for automatic error extraction.

Mariano Felice

2024

2022

2020

2019

2017

2016

2015

2014

2013

2012

Co-authors

Venues