Generating Complex Question Decompositions in the Face of Distribution Shifts

Kelvin Han, Claire Gardent


Abstract
Question decomposition has been found to help large language models’ (LLMs) performance on complex question answering (QA) by breaking these questions into simpler sub-questions for answering. Nonetheless, performance on the task remains dominated by supervised approaches, suggesting room for making LLMs better decomposers. One way of improving LLM training and fine-tuning is to leverage synthetic training data, but the superior performance of supervised approaches collapses in the face of distribution shifts, making them unsuitable for generating synthetic data across new domains and at scale. To address this, we propose an approach to generate synthetic decomposition data with only five annotated examples; we do this by (i) extending recent advancements in using LLM-as-judge and for reranking in novel ways, as well as (ii) using a panel of smaller-sized LLMs for data generation instead of resource-intensive larger models. Through careful validation of our approach over two benchmark datasets, we show that our data generation and modelling approaches bring consistent improvements over using few-shot prompting with LLMs for the task. Our code and models can be found at https://github.com/hankelvin/complex_question_decomposition.
Anthology ID:
2025.naacl-long.55
Volume:
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:
NAACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
1189–1211
Language:
URL:
https://preview.aclanthology.org/landing_page/2025.naacl-long.55/
DOI:
Bibkey:
Cite (ACL):
Kelvin Han and Claire Gardent. 2025. Generating Complex Question Decompositions in the Face of Distribution Shifts. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 1189–1211, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
Generating Complex Question Decompositions in the Face of Distribution Shifts (Han & Gardent, NAACL 2025)
Copy Citation:
PDF:
https://preview.aclanthology.org/landing_page/2025.naacl-long.55.pdf