Probabilistic Transformer: A Probabilistic Dependency Model for Contextual Word Representation

Haoyi Wu, Kewei Tu


Abstract
Syntactic structures used to play a vital role in natural language processing (NLP), but since the deep learning revolution, NLP has been gradually dominated by neural models that do not consider syntactic structures in their design. One highly successful class of neural models is the transformer. When used as an encoder, a transformer produces contextual representations of the words in the input sentence. In this work, we propose a new model of contextual word representation, not from a neural perspective, but from a purely syntactic and probabilistic perspective. Specifically, we design a conditional random field that models discrete latent representations of all words in a sentence as well as the dependency arcs between them, and we use mean field variational inference for approximate inference. Strikingly, we find that the computation graph of our model resembles transformers, with correspondences between dependencies and self-attention and between distributions over latent representations and contextual embeddings of words. Experiments show that our model performs competitively with transformers on small- to medium-sized datasets. We hope that our work can help bridge the gap between traditional syntactic and probabilistic approaches and cutting-edge neural approaches to NLP, and inspire more linguistically principled neural approaches in the future.
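The correspondence between mean field inference and self-attention can be illustrated with a toy example. The following is a minimal sketch under our own simplifying assumptions, not the authors' implementation: each word has a discrete latent label with unary scores (standing in for a word embedding), each word picks exactly one other word as its head (no root node and no tree constraint), and a single label-by-label score matrix `ternary` scores a labeled arc. The function `mean_field` and all parameter names are hypothetical. The head update computes a bilinear score between label distributions and normalizes it with a softmax over candidate heads, which plays the role of attention weights; the label update adds expected messages from heads and dependents, so the resulting label distributions play the role of contextual embeddings.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a non-empty list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_field(unary, ternary, iterations=3):
    """Parallel mean field updates for a toy dependency CRF (hypothetical, simplified).

    unary[i][a]: score of latent label a for word i (stands in for a word embedding).
    ternary[a][b]: score of an arc from a dependent with label a to a head with label b.
    Returns (q_z, q_h): q_z[i] is a distribution over labels of word i and
    q_h[i][j] is the probability that word j is the head of word i.
    Assumes at least two words, since every word must pick another word as head.
    """
    n, d = len(unary), len(unary[0])
    q_z = [softmax(row) for row in unary]  # initialize from unary scores only
    q_h = [[0.0] * n for _ in range(n)]
    for _ in range(iterations):
        # Head update: bilinear score q_z[i]^T T q_z[j], normalized over heads j != i.
        # This is the analogue of query-key attention scores.
        for i in range(n):
            heads = [j for j in range(n) if j != i]
            scores = [
                sum(q_z[i][a] * ternary[a][b] * q_z[j][b]
                    for a in range(d) for b in range(d))
                for j in heads
            ]
            q_h[i] = [0.0] * n  # a word is never its own head
            for j, p in zip(heads, softmax(scores)):
                q_h[i][j] = p
        # Label update: unary score plus expected messages from heads and dependents,
        # computed for all words at once from the previous label distributions.
        new_q_z = []
        for i in range(n):
            f = list(unary[i])
            for a in range(d):
                for j in range(n):
                    # Word i as a dependent of j, weighted by q_h[i][j].
                    f[a] += q_h[i][j] * sum(ternary[a][b] * q_z[j][b] for b in range(d))
                    # Word i as the head of j, weighted by q_h[j][i].
                    f[a] += q_h[j][i] * sum(q_z[j][b] * ternary[b][a] for b in range(d))
            new_q_z.append(softmax(f))
        q_z = new_q_z
    return q_z, q_h


# Usage on random scores for a 4-word sentence with 3 latent labels.
random.seed(0)
n_words, n_labels = 4, 3
unary = [[random.gauss(0, 1) for _ in range(n_labels)] for _ in range(n_words)]
ternary = [[random.gauss(0, 1) for _ in range(n_labels)] for _ in range(n_labels)]
q_z, q_h = mean_field(unary, ternary)
for i in range(n_words):
    print(i, [round(p, 3) for p in q_z[i]], [round(p, 3) for p in q_h[i]])
```

Because every word is updated in parallel from the previous iteration's distributions, each iteration loosely resembles one encoder layer. The model's actual potentials, number of dependency channels, and update schedule may differ from this sketch.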
Anthology ID: 2023.findings-acl.482
Volume: Findings of the Association for Computational Linguistics: ACL 2023
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 7613–7636
URL: https://aclanthology.org/2023.findings-acl.482
DOI: 10.18653/v1/2023.findings-acl.482
Cite (ACL): Haoyi Wu and Kewei Tu. 2023. Probabilistic Transformer: A Probabilistic Dependency Model for Contextual Word Representation. In Findings of the Association for Computational Linguistics: ACL 2023, pages 7613–7636, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Probabilistic Transformer: A Probabilistic Dependency Model for Contextual Word Representation (Wu & Tu, Findings 2023)
PDF: https://preview.aclanthology.org/landing_page/2023.findings-acl.482.pdf
Video: https://preview.aclanthology.org/landing_page/2023.findings-acl.482.mp4