Co-training an Unsupervised Constituency Parser with Weak Supervision

Nickil Maveli, Shay Cohen


Abstract
We introduce a method for unsupervised parsing that relies on bootstrapping classifiers to identify whether a node dominates a specific span in a sentence. There are two types of classifiers: an inside classifier that acts on a span, and an outside classifier that acts on everything outside of a given span. Through self-training and co-training with the two classifiers, we show that the interplay between them helps improve the accuracy of both and, as a result, enables effective parsing. A seed bootstrapping technique prepares the data to train these classifiers. Our analyses further validate that such an approach, in conjunction with weak supervision using prior branching knowledge of a known language (left/right-branching) and minimal heuristics, injects strong inductive bias into the parser, achieving 63.1 F1 on the English (PTB) test set. In addition, we show the effectiveness of our architecture by evaluating on treebanks for Chinese (CTB) and Japanese (KTB), where we achieve new state-of-the-art results.
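The inside/outside split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `split_views` and `co_training_round`, the `<span>` placeholder token, the scoring callables, and the 0.9 confidence threshold are all assumptions made for illustration, and the loop follows generic textbook co-training rather than the procedure in the paper.

```python
from typing import Callable, List, Sequence, Tuple

Span = Tuple[int, int]  # half-open token interval [start, end)
Example = Tuple[List[str], bool]  # (view tokens, pseudo-label: is a constituent?)


def split_views(tokens: Sequence[str], span: Span) -> Tuple[List[str], List[str]]:
    """Return the (inside, outside) token views for a half-open span."""
    start, end = span
    if not 0 <= start < end <= len(tokens):
        raise ValueError(f"invalid span {span} for {len(tokens)} tokens")
    inside = list(tokens[start:end])
    # Hypothetical "<span>" placeholder marks where the span was cut out.
    outside = list(tokens[:start]) + ["<span>"] + list(tokens[end:])
    return inside, outside


def co_training_round(
    unlabeled: Sequence[Tuple[List[str], List[str]]],
    score_inside: Callable[[List[str]], float],
    score_outside: Callable[[List[str]], float],
    threshold: float = 0.9,  # assumed confidence cutoff, not from the paper
) -> Tuple[List[Example], List[Example]]:
    """One generic co-training step.

    Each scorer returns an assumed probability that the span is a
    constituent. Confident predictions from one view become pseudo-labels
    for training the classifier on the other view.
    """
    new_for_inside: List[Example] = []
    new_for_outside: List[Example] = []
    for inside, outside in unlabeled:
        p_in = score_inside(inside)
        p_out = score_outside(outside)
        # The outside view is confident: label the inside view with it.
        if p_out >= threshold or p_out <= 1 - threshold:
            new_for_inside.append((inside, p_out >= 0.5))
        # The inside view is confident: label the outside view with it.
        if p_in >= threshold or p_in <= 1 - threshold:
            new_for_outside.append((outside, p_in >= 0.5))
    return new_for_inside, new_for_outside
```

In a full loop, the returned pseudo-labeled examples would be added to each classifier's training set before retraining and repeating the round; how the paper seeds, scores, and filters examples is not specified in this abstract.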
Anthology ID:
2022.findings-acl.101
Volume:
Findings of the Association for Computational Linguistics: ACL 2022
Month:
May
Year:
2022
Address:
Dublin, Ireland
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
1274–1291
URL:
https://aclanthology.org/2022.findings-acl.101
DOI:
10.18653/v1/2022.findings-acl.101
Cite (ACL):
Nickil Maveli and Shay Cohen. 2022. Co-training an Unsupervised Constituency Parser with Weak Supervision. In Findings of the Association for Computational Linguistics: ACL 2022, pages 1274–1291, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Co-training an Unsupervised Constituency Parser with Weak Supervision (Maveli & Cohen, Findings 2022)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.findings-acl.101.pdf
Video:
 https://preview.aclanthology.org/ingestion-script-update/2022.findings-acl.101.mp4
Code:
Nickil21/weakly-supervised-parsing
Data:
Chinese Treebank, Penn Treebank