Probing for Constituency Structure in Neural Language Models

David Arps; Younes Samih; Laura Kallmeyer; Hassan Sajjad

doi:10.18653/v1/2022.findings-emnlp.502

Abstract

In this paper, we investigate to what extent contextual neural language models (LMs) implicitly learn syntactic structure. More concretely, we focus on constituent structure as represented in the Penn Treebank (PTB). Using standard probing techniques based on diagnostic classifiers, we assess how accurately constituents of different categories are represented in the neuron activations of an LM such as RoBERTa. To make sure that our probe focuses on syntactic knowledge and not on implicit semantic generalizations, we also experiment on a PTB version that is obtained by randomly replacing constituents with each other while keeping syntactic structure, i.e., a semantically ill-formed but syntactically well-formed version of the PTB. We find that four pretrained transformer LMs obtain high performance on our probing tasks even on manipulated data, suggesting that semantic and syntactic knowledge in their representations can be separated and that constituency information is in fact learned by the LM. Moreover, we show that a complete constituency tree can be linearly separated from LM representations.
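The data manipulation described in the abstract, replacing a constituent with another one while keeping the sentence syntactically well-formed, can be illustrated with a small sketch. This is not the authors' procedure: it assumes that replacements are drawn from constituents with the same category label, and the tuple-based tree encoding, the helper names (`subtrees`, `replace_at`, `leaves`, `swap_constituent`), and the two toy sentences are all hypothetical.

```python
import random

# Assumed encoding: a tree is either a word (str) or a tuple
# (label, child, child, ...). This is an illustrative format, not the PTB format.

def subtrees(tree, path=()):
    """Yield (path, subtree) for every labelled node below the root."""
    if isinstance(tree, str):
        return
    for i, child in enumerate(tree[1:], start=1):
        if not isinstance(child, str):
            yield path + (i,), child
            yield from subtrees(child, path + (i,))

def replace_at(tree, path, new_subtree):
    """Return a copy of `tree` in which the node at `path` is replaced."""
    if not path:
        return new_subtree
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new_subtree),) + tree[i + 1:]

def leaves(tree):
    """Return the words of the tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1:] for word in leaves(child)]

def swap_constituent(tree, donor, label, rng):
    """Replace one `label` constituent in `tree` with a random `label`
    constituent taken from `donor`; return `tree` unchanged if none exists."""
    targets = [path for path, sub in subtrees(tree) if sub[0] == label]
    sources = [sub for _, sub in subtrees(donor) if sub[0] == label]
    if not targets or not sources:
        return tree
    return replace_at(tree, rng.choice(targets), rng.choice(sources))

# Toy sentences (hypothetical, not taken from the PTB).
t1 = ("S", ("NP", ("DT", "the"), ("NN", "dog")), ("VP", ("VBD", "barked")))
t2 = ("S", ("NP", ("DT", "a"), ("NN", "theory")), ("VP", ("VBD", "collapsed")))

swapped = swap_constituent(t1, t2, "NP", random.Random(0))
print(" ".join(leaves(swapped)))  # prints "a theory barked"
```

The output sentence is semantically odd, but its bracketing is still S → NP VP, which is the property the manipulated data is meant to have. Because trees are immutable tuples, `t1` itself is left untouched and the swap returns a new tree.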

Anthology ID: 2022.findings-emnlp.502
Volume: Findings of the Association for Computational Linguistics: EMNLP 2022
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 6738–6757
URL: https://aclanthology.org/2022.findings-emnlp.502
DOI: 10.18653/v1/2022.findings-emnlp.502
Cite (ACL): David Arps, Younes Samih, Laura Kallmeyer, and Hassan Sajjad. 2022. Probing for Constituency Structure in Neural Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 6738–6757, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): Probing for Constituency Structure in Neural Language Models (Arps et al., Findings 2022)
PDF: https://preview.aclanthology.org/naacl24-info/2022.findings-emnlp.502.pdf
Software: 2022.findings-emnlp.502.software.zip
Video: https://preview.aclanthology.org/naacl24-info/2022.findings-emnlp.502.mp4