2025
Evaluating Lexical Proficiency in Neural Language Models
Cristiano Ciaccio | Alessio Miaschi | Felice Dell’Orletta
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
We present a novel evaluation framework designed to assess the lexical proficiency and linguistic creativity of Transformer-based Language Models (LMs). We validate the framework by analyzing the performance of a set of LMs of different sizes, in both mono- and multilingual configurations, across tasks involving the generation, definition, and contextual usage of lexicalized words, neologisms, and nonce words. To support these evaluations, we developed a novel dataset of lexical entries for the Italian language, including curated definitions and usage examples sourced from various online platforms. The results highlight the robustness and effectiveness of our framework in evaluating multiple dimensions of LMs’ linguistic understanding and offer insight, through the assessment of their linguistic creativity, into the lexical generalization abilities of LMs.
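A minimal sketch of the definition and contextual-usage tasks the abstract describes is given below. The model name, prompts, and word list are illustrative assumptions, not the authors’ released framework or Italian dataset.

```python
# Sketch of definition / usage-example prompting for lexicalized words,
# neologisms, and nonce words. Model, prompts, and items are placeholders.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# One lexicalized word, one (assumed) neologism, one nonce word.
words = ["telescope", "doomscrolling", "blorptic"]

for word in words:
    definition_prompt = f"Define the word '{word}' in one sentence: "
    usage_prompt = f"Write a sentence that uses the word '{word}': "
    definition = generator(definition_prompt, max_new_tokens=40)[0]["generated_text"]
    usage = generator(usage_prompt, max_new_tokens=40)[0]["generated_text"]
    print(f"{word}\n  definition: {definition}\n  usage: {usage}\n")
```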
Stress-testing Machine Generated Text Detection: Shifting Language Models Writing Style to Fool Detectors
Andrea Pedrotti | Michele Papucci | Cristiano Ciaccio | Alessio Miaschi | Giovanni Puccetti | Felice Dell’Orletta | Andrea Esuli
Findings of the Association for Computational Linguistics: ACL 2025
Recent advancements in Generative AI and Large Language Models (LLMs) have enabled the creation of highly realistic synthetic content, raising concerns about the potential for malicious use, such as misinformation and manipulation. Moreover, detecting Machine-Generated Text (MGT) remains challenging due to the lack of robust benchmarks that assess generalization to real-world scenarios. In this work, we evaluate the resilience of state-of-the-art MGT detectors (e.g., Mage, Radar, LLM-DetectAIve) to linguistically informed adversarial attacks. We develop a pipeline that fine-tunes language models using Direct Preference Optimization (DPO) to shift the MGT style toward human-written text (HWT), obtaining generations more challenging to detect by current models. Additionally, we analyze the linguistic shifts induced by the alignment and how detectors rely on “linguistic shortcuts” to detect texts. Our results show that detectors can be easily fooled with relatively few examples, resulting in a significant drop in detecting performances. This highlights the importance of improving detection methods and making them robust to unseen in-domain texts. We release code, models, and data to support future research on more robust MGT detection benchmarks.
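The DPO alignment step described in the abstract can be sketched roughly as follows, using the `trl` library. Model and dataset contents are placeholders, and the exact `DPOConfig`/`DPOTrainer` argument names depend on the installed `trl` version; this is not the authors’ released pipeline.

```python
# Sketch of DPO alignment toward human-written style: "chosen" = human-written
# continuation, "rejected" = the model's own machine-generated continuation.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "your-base-lm"  # placeholder model identifier
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

pairs = Dataset.from_dict({
    "prompt": ["Write a short news paragraph about the local election."],
    "chosen": ["<human-written text>"],
    "rejected": ["<machine-generated text>"],
})

training_args = DPOConfig(output_dir="dpo-hwt-style", beta=0.1)
trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=pairs,
    processing_class=tokenizer,  # `tokenizer=` in older trl versions
)
trainer.train()
```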
Beyond the Spelling Miracle: Investigating Substring Awareness in Character-Blind Language Models
Cristiano Ciaccio | Marta Sartor | Alessio Miaschi | Felice Dell’Orletta
Findings of the Association for Computational Linguistics: ACL 2025
Correctly identifying characters and substrings of words should be a basic but essential ability of any Language Model that aims to proficiently understand and produce language. Despite this, the majority of Pre-trained Language Models (PLMs) are “character-blind” and struggle in spelling tasks, although they still seem to acquire some character knowledge during pre-training, a phenomenon dubbed the Spelling Miracle. To shed light on this phenomenon, we systematically evaluate a range of PLMs with different parameter sizes using a controlled binary substring identification task. Through a series of experiments, we propose the first comprehensive investigation of where, when, and how PLMs develop awareness of characters and substrings, with a particular linguistic focus on morphemic units such as prefixes, suffixes, and roots.
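A binary substring identification probe of the kind the abstract describes can be illustrated as below: ask a model whether a candidate substring (e.g., a prefix or suffix) occurs in a word and score the answers against the character-level ground truth. The model, prompt wording, and items are illustrative assumptions.

```python
# Sketch of a binary substring-identification probe with a yes/no prompt.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# (word, candidate substring) pairs mixing true morphemes and distractors.
items = [("unhappiness", "un"), ("unhappiness", "ness"),
         ("unhappiness", "ing"), ("replay", "re"), ("replay", "ed")]

correct = 0
for word, sub in items:
    prompt = f"Does the word '{word}' contain the substring '{sub}'? Answer yes or no: "
    answer = generator(prompt, max_new_tokens=3)[0]["generated_text"][len(prompt):]
    predicted = "yes" in answer.lower()
    correct += predicted == (sub in word)  # character-level gold label

print(f"accuracy: {correct / len(items):.2f}")
```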
2024
Controllable Text Generation to Evaluate Linguistic Abilities of Italian LLMs
Cristiano Ciaccio | Felice Dell’Orletta | Alessio Miaschi | Giulia Venturi
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)
State-of-the-art Large Language Models (LLMs) demonstrate exceptional proficiency across diverse tasks, yet systematic evaluations of their linguistic abilities remain limited. This paper addresses this gap by proposing a new evaluation framework that leverages the potential of Controllable Text Generation. Our approach evaluates the models’ capacity to generate sentences that adhere to specific linguistic constraints and their ability to recognize the linguistic properties of their own generated sentences, including their consistency with the specified constraints. We tested our approach on six Italian LLMs using various linguistic constraints.
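The constraint-checking side of such an evaluation can be sketched as follows: verify whether a generated Italian sentence satisfies a linguistic constraint, here a target number of subordinate clauses approximated via dependency labels. The sentence, constraint, and heuristic are illustrative assumptions, not the constraints used in the paper.

```python
# Sketch of verifying a linguistic constraint on a generated Italian sentence.
# Requires: python -m spacy download it_core_news_sm
import spacy

nlp = spacy.load("it_core_news_sm")

def count_subordinate_clauses(sentence: str) -> int:
    """Approximate the number of subordinate clauses via dependency relations."""
    doc = nlp(sentence)
    subordinate_deps = {"advcl", "ccomp", "xcomp", "acl", "acl:relcl"}
    return sum(token.dep_ in subordinate_deps for token in doc)

generated = "Penso che il modello abbia prodotto una frase che rispetta il vincolo."
target = 2
print("constraint satisfied:", count_subordinate_clauses(generated) == target)
```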