Søren Fomsgaard

Also published as: Søren Kirkegaard Fomsgaard

2026

Par-ITA: Benchmarking Seq2Seq and LLMs on a Human-Supervised Parallel Corpus for Italian Hyperpartisan Neutralization
Michele Joshua Maggini | Søren Fomsgaard | Michele Maestroni | Gaël Dias | Pablo Gamallo
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Neutralizing hyperpartisan content is essential for mitigating online polarization, yet research has largely focused on English. We present Par-ITA, the first human-supervised parallel corpus for Italian hyperpartisan neutralization, comprising 2,475 paragraph pairs curated from SemEval 2023 Task 3. The dataset is constructed using a rigorous three-stage pipeline: (1) expert-led preliminary selection of LLMs for high-quality generation, (2) human-supervised data production with high editing rates (32–68%), and (3) post-hoc human validation. We establish extensive benchmarks for this task across seq2seq and decoder-only architectures, evaluating standard fine-tuning, Direct Preference Optimization (DPO), and in-context learning. Our analysis highlights that while DPO effectively maximizes neutrality scores in seq2seq models, automated evaluators like GPT-4o-mini exhibit systematic biases, specifically over-penalizing sensitive political topics compared to human experts. Par-ITA provides a foundational resource for non-English neutralization and a reproducible framework for developing high-quality datasets in subjective domains.

Discourse Realization of Generics in Human and LLM-generated Texts
Søren Kirkegaard Fomsgaard | Martial Pastor | Gaël Dias | Nelleke Oostdijk
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large Language Models (LLMs) often produce texts that appear coherent and credible, even when their factual reliability is uncertain. This paper investigates whether such perceived credibility correlates with the pervasive use of generics, that is, generalizations without explicit quantification. We introduce a text-level genericity score derived from clause-level annotations and apply it to argumentative essays produced by humans and LLMs. To analyze how generics are realized in discourse, we employ Rhetorical Structure Theory to examine coherence relations across varying levels of genericity. Results show that, according to our genericity metric, human texts are less generic than LLM-produced texts. At the discourse level, higher genericity correlates with less structured, paratactic discourse, while for some models coherence is maintained through elaboration relations. Our findings suggest that some LLMs maintain well-structured coherence even in highly generic texts, which might enable them to "camouflage" argumentative texts as informative, enhancing their perceived credibility and persuasiveness.

Co-authors

Martial Pastor 1

Venues

ACL 2
