Ahmed Rayane Kebir


2026

Cross-task generalization mimics human intelligence: it is the ability to perform tasks by recalling foundational skills acquired previously. In this paper, we argue that argument generation and argument retrieval are complex tasks that can benefit from cross-task transfer from atomic argument mining and argument quality assessment tasks, even without supervision. We support this claim empirically through the ArgLLM framework, which comprises 18.9K instruction instances in a multiple-choice question-answering format and scales, through multi-tasking and model merging, from six atomic natural language argumentation tasks to four complex argument generation and argument retrieval tasks. Our results and analysis, using Mistral and Llama backbone models, show that cross-tasking in zero-shot settings outperforms the base models and is robust to varying strategies, tasks, and model sizes, offering a valuable trade-off between computational cost and task performance.