Contextual Distortion Reveals Constituency: Masked Language Models are Implicit Parsers

Jiaxi Li; Wei Lu

doi:10.18653/v1/2023.acl-long.285

Abstract

Recent advancements in pre-trained language models (PLMs) have demonstrated that these models possess some degree of syntactic awareness. To leverage this knowledge, we propose a novel chart-based method for extracting parse trees from masked language models (LMs) without the need to train separate parsers. Our method computes a score for each span based on the distortion of contextual representations resulting from linguistic perturbations. We design a set of perturbations motivated by the linguistic concept of constituency tests, and use these to score each span by aggregating the distortion scores. To produce a parse tree, we use chart parsing to find the tree with the minimum score. Our method consistently outperforms previous state-of-the-art methods on English with masked LMs, and also demonstrates superior performance in a multilingual setting, outperforming the state of the art in 6 out of 8 languages. Notably, although our method does not involve parameter updates or extensive hyperparameter search, its performance can even surpass some unsupervised parsing methods that require fine-tuning. Our analysis highlights that the distortion of contextual representations resulting from syntactic perturbation can serve as an effective indicator of constituency across languages.
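
As an illustration of the chart-parsing step only, the following minimal Python sketch finds the binary tree whose spans have the lowest total score. It assumes that a tree's score is the sum of its span scores and that a hypothetical `span_score(i, j)` callable stands in for the aggregated distortion scores. The names `min_score_tree` and `toy_scores`, and the toy values, are invented for this example; the paper's actual scoring and decoding details may differ.

```python
from functools import lru_cache


def min_score_tree(span_score, n):
    """Return (total_score, tree) for the binary tree over words 0..n-1
    that minimizes the sum of span scores.

    span_score(i, j) is a hypothetical scorer for the span covering
    words i..j-1 (end exclusive); lower means "more constituent-like".
    """

    @lru_cache(maxsize=None)
    def best(i, j):
        if j - i == 1:
            # A single word is a leaf, represented by its index.
            return span_score(i, j), i
        candidates = []
        for k in range(i + 1, j):  # try every split point
            left_score, left_tree = best(i, k)
            right_score, right_tree = best(k, j)
            candidates.append((left_score + right_score, k, left_tree, right_tree))
        # Pick the cheapest split; ties go to the leftmost split point.
        children_score, _, left_tree, right_tree = min(
            candidates, key=lambda c: (c[0], c[1])
        )
        return span_score(i, j) + children_score, (left_tree, right_tree)

    return best(0, n)


# Toy scores for a three-word sentence (values are made up, not model output).
toy_scores = {
    (0, 1): 0.0, (1, 2): 0.0, (2, 3): 0.0,  # single words
    (0, 2): 0.1,  # words 0-1 together: low distortion
    (1, 3): 0.9,  # words 1-2 together: high distortion
    (0, 3): 0.0,  # whole sentence
}

score, tree = min_score_tree(lambda i, j: toy_scores[(i, j)], 3)
print(tree)  # ((0, 1), 2)
```

In this toy case the span over words 0 and 1 has the lower score, so the chart groups those two words first and attaches word 2 afterwards.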

Anthology ID: 2023.acl-long.285
Volume: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 5208–5222
URL: https://aclanthology.org/2023.acl-long.285
DOI: 10.18653/v1/2023.acl-long.285
Cite (ACL): Jiaxi Li and Wei Lu. 2023. Contextual Distortion Reveals Constituency: Masked Language Models are Implicit Parsers. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5208–5222, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Contextual Distortion Reveals Constituency: Masked Language Models are Implicit Parsers (Li & Lu, ACL 2023)
PDF: https://preview.aclanthology.org/emnlp-22-attachments/2023.acl-long.285.pdf
Video: https://preview.aclanthology.org/emnlp-22-attachments/2023.acl-long.285.mp4
