Andrea Nini
2026
A Multi-Dialectal, Longitudinal Corpus of Human-AI Hybrid Language Production
Qiao Gan | Jonathan Dunn | Andrea Nini | Benjamin Adams
Proceedings of the Fifteenth Language Resources and Evaluation Conference
This paper presents a multi-dialectal, longitudinal corpus of human-AI hybrid language production, comprising purely human-written texts, purely LLM-generated texts, and hybrid texts produced under different LLM-assistance modes (e.g., stylistic suggestions, short continuations, partial essay generation). The corpus includes 693 participants from five national English dialects, with natural and hybrid samples paired within individuals over a four-week period. This design enables investigation of both short- and longer-term effects of LLM assistance on language use across geographic and social contexts. To illustrate the corpus’s utility, we analyze linguistic features across three dimensions: lexical diversity, syntactic complexity, and stylistic variation. The results show that LLM assistance enhances lexical diversity without a corresponding increase in syntactic complexity, revealing distinct effects across linguistic dimensions. Overall, this corpus offers a valuable resource for studying human-AI interaction, dialectal variation, and the influence of AI assistance on written language.
2021
Production vs Perception: The Role of Individuality in Usage-Based Grammar Induction
Jonathan Dunn | Andrea Nini
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
This paper asks whether a distinction between production-based and perception-based grammar induction influences either (i) the growth curve of grammars and lexicons or (ii) the similarity between representations learned from independent sub-sets of a corpus. A production-based model is trained on the usage of a single individual, thus simulating the grammatical knowledge of a single speaker. A perception-based model is trained on an aggregation of many individuals, thus simulating grammatical generalizations learned from exposure to many different speakers. To ensure robustness, the experiments are replicated across two registers of written English, with four additional registers reserved as a control. A set of three computational experiments shows that production-based grammars are significantly different from perception-based grammars across all conditions, with a steeper growth curve that can be explained by substantial inter-individual grammatical differences.