Probability Distribution Collapse: A Critical Bottleneck to Compact Unsupervised Neural Grammar Induction

Jinwook Park, Kangil Kim


Abstract
Unsupervised neural grammar induction aims to learn interpretable hierarchical structures from language data. However, existing models face an expressiveness bottleneck, often resulting in unnecessarily large yet underperforming grammars. We identify a core issue, *probability distribution collapse*, as the underlying cause of this limitation. We analyze when and how the collapse emerges across key components of neural parameterization and introduce a targeted solution, *collapse-relaxing neural parameterization*, to mitigate it. Our approach substantially improves parsing performance while enabling the use of significantly more compact grammars across a wide range of languages, as demonstrated through extensive empirical analysis.
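The abstract names the phenomenon, *probability distribution collapse*, without spelling out how it manifests; the following minimal PyTorch sketch illustrates one plausible way such collapse could be detected in a neural PCFG-style parameterization, namely rule distributions becoming near-deterministic and near-identical across nonterminals. All sizes and names here (num_nt, embed_dim, rule_mlp) are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only, not the paper's code. It measures two common
# symptoms of distribution collapse in a neural parameterization of a PCFG:
#   1) per-nonterminal rule entropy shrinking toward zero, and
#   2) different nonterminals producing nearly identical rule distributions.
import torch
import torch.nn as nn
import torch.nn.functional as F

num_nt, embed_dim = 30, 256                      # hypothetical grammar/model sizes
nt_emb = nn.Parameter(torch.randn(num_nt, embed_dim))
rule_mlp = nn.Sequential(                        # assumed neural rule parameterization
    nn.Linear(embed_dim, embed_dim), nn.ReLU(),
    nn.Linear(embed_dim, num_nt * num_nt),
)

logits = rule_mlp(nt_emb)                        # (num_nt, num_nt^2)
rule_probs = F.softmax(logits, dim=-1)           # P(A -> B C) for each nonterminal A

# Symptom 1: average entropy of the rule distributions.
entropy = -(rule_probs * (rule_probs + 1e-12).log()).sum(-1)

# Symptom 2: average pairwise total-variation distance between nonterminals.
tv = 0.5 * (rule_probs.unsqueeze(0) - rule_probs.unsqueeze(1)).abs().sum(-1)
mean_pairwise_tv = tv.sum() / (num_nt * (num_nt - 1))

print(f"mean rule entropy: {entropy.mean():.3f}")        # near 0 under collapse
print(f"mean pairwise TV distance: {mean_pairwise_tv:.3f}")  # near 0 under collapse
```

Under this reading, a compact grammar stays useful only if its nonterminals keep distinct, non-degenerate rule distributions; the paper's collapse-relaxing parameterization is proposed to preserve exactly that.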
Anthology ID:
2025.emnlp-main.1694
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
33380–33391
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1694/
Cite (ACL):
Jinwook Park and Kangil Kim. 2025. Probability Distribution Collapse: A Critical Bottleneck to Compact Unsupervised Neural Grammar Induction. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 33380–33391, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Probability Distribution Collapse: A Critical Bottleneck to Compact Unsupervised Neural Grammar Induction (Park & Kim, EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1694.pdf
Checklist:
2025.emnlp-main.1694.checklist.pdf