Maximilian Kreutner
2026
QSTN: A Modular Framework for Robust Questionnaire Inference with Large Language Models
Maximilian Kreutner | Jens Rupprecht | Georg Ahnert | Ahmed Salem | Markus Strohmaier
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 3: System Demonstrations)
We introduce QSTN, an open-source Python framework for systematically generating responses from questionnaire-style prompts to support in-silico surveys and annotation tasks with large language models (LLMs). QSTN enables robust evaluation of questionnaire presentation, prompt perturbations, and response generation methods. Our extensive evaluation (>40 million survey responses) shows that question structure and response generation methods have a significant impact on the alignment of generated survey responses with human answers. We also find that answers can be obtained for a fraction of the compute cost by changing the presentation method. In addition, we offer a no-code user interface that allows researchers to set up robust experiments with LLMs without coding knowledge. We hope that QSTN will support the reproducibility and reliability of LLM-based research in the future.
Persona-driven Simulation of Voting Behavior in the European Parliament with Large Language Models
Maximilian Kreutner | Marlene Lutz | Markus Strohmaier
Findings of the Association for Computational Linguistics: EACL 2026
Large Language Models (LLMs) display remarkable capabilities to understand or even produce political discourse but have been found to consistently exhibit a progressive left-leaning bias. At the same time, so-called persona or identity prompts have been shown to produce LLM behavior that aligns with socioeconomic groups with which the base model is not aligned. In this work, we analyze whether zero-shot persona prompting with limited information can accurately predict individual voting decisions and, by aggregation, accurately predict the positions of European groups on a diverse set of policies. We evaluate whether predictions are stable in response to counterfactual arguments, different persona prompts, and generation methods. Finally, we find that we can simulate the voting behavior of Members of the European Parliament reasonably well, achieving a weighted F1 score of approximately 0.793. Our persona dataset of politicians in the 2024 European Parliament and our code are available at the following URL: https://github.com/dess-mannheim/european_parliament_simulation.
2025
Tracing Definitions: Lessons from Alliance Contracts in the Biopharmaceutical Industry
Maximilian Kreutner | Doerte Leusmann | Florian Lemmerich | Carolin Haeussler
Proceedings of the Natural Legal Language Processing Workshop 2025
Definitions in alliance contracts play a critical role in shaping agreements, yet they can also lead to costly misunderstandings. This is exemplified by the multimillion-dollar AstraZeneca-European Commission (EC) dispute, where the interpretation of ‘best reasonable effort’ became the focal point of contention. In this interdisciplinary study, we leverage natural language processing (NLP) to systematically analyze patterns in the definitions included in alliance contracts. More specifically, we categorize the content of definitions into topics, identify common terms versus outliers that are semantically dissimilar and infrequently used, and track how definitions evolve over time. Analyzing a dataset of 380,131 definitions from 12,468 alliance contracts in the biopharmaceutical industry, we find that definitions span legal, technological, and social topics, with social terms showing the highest dissimilarity across contracts. Using dynamic topic modeling, we explore how the content of definitions has shifted over two decades (2000–2020) and identify prevalent trends suggesting that contractual definitions reflect broader economic contexts. Notably, our results reveal that the AstraZeneca-EC dispute arose from an outlier, a highly unusual definition, that could have been flagged using NLP. Overall, these findings highlight the potential of data-driven approaches to uncover patterns in alliance contracts.