MC2: A Minimum-Coverage and Dataset-Agnostic Framework for Compositional Generalization of LLMs on Semantic Parsing

Ziyao Xu, Zhe Yang, Houfeng Wang


Abstract
Compositional generalization is one of the important abilities that large language models (LLMs) need to have for semantic parsing. Previous research typically relies on dataset-specific designs or a large number of samples in demonstrations to improve the compositional generalization of LLMs on semantic parsing. We revisit this issue and find that when the number of samples in a demonstration is limited to a theoretical lower bound for achieving compositional generalization (minimum-coverage), current advanced LLMs cannot arbitrarily achieve good compositional generalization generically on different semantic parsing datasets without dataset-specific designs. To solve this problem, we propose Multi-level Component Composition (MC^2), a minimum-coverage and dataset-agnostic framework based on input primitives, which aims to generically help LLMs achieve compositional generalization by selecting and organizing samples from multiple compositional levels that satisfy the primitive coverage. Experiments and analysis show that MC^2 can effectively improve the compositional generalization of LLMs on different semantic parsing datasets in the minimum-coverage setting.
Anthology ID:
2025.findings-emnlp.406
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
7694–7706
Language:
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.406/
DOI:
10.18653/v1/2025.findings-emnlp.406
Bibkey:
Cite (ACL):
Ziyao Xu, Zhe Yang, and Houfeng Wang. 2025. MC2: A Minimum-Coverage and Dataset-Agnostic Framework for Compositional Generalization of LLMs on Semantic Parsing. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 7694–7706, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
MC2: A Minimum-Coverage and Dataset-Agnostic Framework for Compositional Generalization of LLMs on Semantic Parsing (Xu et al., Findings 2025)
Copy Citation:
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.406.pdf
Checklist:
 2025.findings-emnlp.406.checklist.pdf