Michael Bennie
2026
A Dataset for Oral Reading in Young English Readers
Madison Rose | Michael Bennie | Valeria Pagliai | Hatice Kubra Karakis | Qian Shen | Xinyi Tai | Walter L. Leite | Zoey Liu
Proceedings of the 30th Conference on Computational Natural Language Learning
Among English child speech corpora, very few focus on oral reading. Existing resources such as the CMU Kids Corpus (Ellis Weismer et al., 2013) are limited by a lack of grade-appropriate, curriculum-aligned reading texts, by the scope and quality of their annotations, and, most crucially, by the absence of a comprehensive annotation scheme for characterizing children’s reading errors. This study presents a multi-layered, fully manually annotated corpus of oral reading from 63 1st-3rd grade students residing in the U.S. who grow up hearing and speaking English. Additionally, we contribute methodologically rigorous annotation guidelines that define 10 reading error categories and 26 sublevel error labels. Using a digital reading platform supported by GPT-4o-mini (OpenAI, 2024), children read stories on topics of their own interest, while the system records their speech and logs their interactions with embedded digital supports. Each recording is paired with detailed demographic and educational metadata and annotated linguistically at three layers: (1) sentence- and word-level time alignment; (2) phonemic transcription; (3) reading errors.
2025
CODEOFCONDUCT at Multilingual Counterspeech Generation: A Context-Aware Model for Robust Counterspeech Generation in Low-Resource Languages
Michael Bennie | Bushi Xiao | Chryseis Xinyi Liu | Demi Zhang | Jian Meng | Alayo Tripp
Proceedings of the First Workshop on Multilingual Counterspeech Generation
This paper introduces a context-aware model for robust counterspeech generation, which achieved significant success in the MCG-COLING-2025 shared task. Our approach particularly excelled in low-resource language settings. By leveraging a simulated annealing algorithm fine-tuned on multilingual datasets, the model generates factually accurate responses to hate speech. We demonstrate state-of-the-art performance across four languages (Basque, English, Italian, and Spanish), with our system ranking first for Basque, second for Italian, and third for both English and Spanish. Notably, our model swept all three top positions for Basque, highlighting its effectiveness in low-resource scenarios. Evaluation of the shared task employs both traditional metrics (BLEU, ROUGE, BERTScore, Novelty) and the LLM-based JudgeLM. We present a detailed analysis of our results, including error cases and potential improvements. This work contributes to the growing body of research on multilingual counterspeech generation, offering insights into developing robust models that can adapt to diverse linguistic and cultural contexts in the fight against online hate speech.
PANDA - Paired Anti-hate Narratives Dataset from Asia: Using an LLM-as-a-Judge to Create the First Chinese Counterspeech Dataset
Michael Bennie | Demi Zhang | Bushi Xiao | Jing Cao | Chryseis Xinyi Liu | Jian Meng | Alayo Tripp
Proceedings of the First Workshop on Multilingual Counterspeech Generation
Despite the global prevalence of the Modern Standard Chinese language, counterspeech (CS) resources for Chinese remain virtually nonexistent. To address this gap in East Asian counterspeech research, we introduce a corpus of Modern Standard Mandarin counterspeech that focuses on combating hate speech in Mainland China. This paper proposes a novel approach to generating CS that combines an LLM-as-a-Judge, simulated annealing, zero-shot CN generation by LLMs, and a round-robin algorithm. This is followed by manual verification for quality and contextual relevance. This paper details the methodology for creating effective counterspeech in Chinese and other non-Eurocentric languages, including unique cultural patterns in which groups are maligned and linguistic patterns in which kinds of discourse markers are programmatically marked as hate speech (HS). Through analysis of the generated corpora, we provide strong evidence for the lack of open-source, properly labeled Chinese hate speech data and for the limitations of using an LLM-as-a-Judge to score possible answers in Chinese. Moreover, the present corpus serves as the first East Asian language-based CS corpus and provides an essential resource for future research on counterspeech generation and evaluation.