Frieso Turkstra

2025

TriLLaMa at CQs-Gen 2025: A Two-Stage LLM-Based System for Critical Question Generation
Frieso Turkstra | Sara Nabhani | Khalid Al-Khatib
Proceedings of the 12th Argument mining Workshop

This paper presents a new system for generating critical questions in debates, developed for the Critical Questions Generation shared task. Our two-stage approach, combining generation and classification, uses LLaMA 3.1 Instruct models (8B, 70B, 405B) with zero-/few-shot prompting. Evaluations on annotated debate data reveal several key insights: few-shot generation with 405B yielded relatively high-quality questions, achieving a maximum possible punctuation score of 73.5. The 70B model outperformed both the smaller and the larger variants in the classification stage. The classifiers showed a strong bias toward labeling generated questions as Useful, despite limited validation. Further, our system, ranked 6th, outperformed the baselines by 3%. These findings underscore the effectiveness of large models for question generation and medium-sized models for classification, and suggest the need for clearer task definitions within prompts to improve classification accuracy.
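The abstract describes a generate-then-classify pipeline. The sketch below shows only that control flow, under stated assumptions: the stub functions `toy_generate` and `toy_classify` are hypothetical stand-ins for the LLaMA 3.1 generation and classification prompts, and the candidate count (`n_candidates=6`), the number of returned questions (`k=3`), and the fallback rule that pads with non-Useful candidates are illustrative choices, not details taken from the paper.

```python
from typing import Callable, List

USEFUL = "Useful"  # label named in the abstract


def two_stage_cqs(
    intervention: str,
    generate: Callable[[str, int], List[str]],
    classify: Callable[[str, str], str],
    n_candidates: int = 6,  # assumed over-generation budget
    k: int = 3,             # assumed number of questions to return
) -> List[str]:
    """Stage 1: over-generate candidate questions.
    Stage 2: keep candidates the classifier labels Useful.
    If fewer than k survive, pad with the remaining candidates in
    generation order (a hypothetical fallback, not from the paper)."""
    candidates = generate(intervention, n_candidates)
    # Classify each candidate once and reuse the labels.
    labels = [classify(intervention, q) for q in candidates]
    useful = [q for q, lab in zip(candidates, labels) if lab == USEFUL]
    rest = [q for q, lab in zip(candidates, labels) if lab != USEFUL]
    return (useful + rest)[:k]


def toy_generate(intervention: str, n: int) -> List[str]:
    # Hypothetical stand-in for an LLM generation call: canned questions.
    pool = [
        "What evidence supports this claim?",
        "Who is the speaker?",
        "Is there evidence the policy caused the outcome?",
        "Why?",
        "Could other factors explain the result?",
        "What does the evidence from other countries show?",
    ]
    return pool[:n]


def toy_classify(intervention: str, question: str) -> str:
    # Hypothetical stand-in for an LLM classifier: a keyword rule.
    if "evidence" in question or "factors" in question:
        return USEFUL
    return "Unhelpful"


if __name__ == "__main__":
    for q in two_stage_cqs("Taxes should be cut.", toy_generate, toy_classify):
        print(q)
```

Separating the stages this way lets the generator and the classifier be different model sizes, which matches the abstract's finding that the 405B model worked best for generation while the 70B model did best at classification.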

Co-authors

Khalid Al Khatib 1
Sara Nabhani 1

Venues

argmining 1
ws 1
