Challenging the “One Single Vector per Token” Assumption

Mathieu Dehouck


Abstract
In this paper we question the almost universal assumption that in neural networks each token should be represented by a single vector. In fact, it is so natural to use one vector per word that most people do not even consider it as an assumption of their various models. Via a series of experiments on dependency parsing, in which we let each token in a sentence be represented by a sequence of vectors, we show that the “one single vector per token” assumption might be too strong for recurrent neural networks. Indeed, biaffine parsers seem to work better when their encoder accesses its input’s tokens’ representations in several time steps rather than all at once. This seems to indicate that having only one occasion to look at a token through its vector is too strong a constraint for recurrent neural networks and calls for further studies on the way tokens are fed to neural networks.
Anthology ID:
2023.conll-1.33
Volume:
Proceedings of the 27th Conference on Computational Natural Language Learning (CoNLL)
Month:
December
Year:
2023
Address:
Singapore
Editors:
Jing Jiang, David Reitter, Shumin Deng
Venue:
CoNLL
Publisher:
Association for Computational Linguistics
Pages:
498–507
URL:
https://aclanthology.org/2023.conll-1.33
DOI:
10.18653/v1/2023.conll-1.33
Cite (ACL):
Mathieu Dehouck. 2023. Challenging the “One Single Vector per Token” Assumption. In Proceedings of the 27th Conference on Computational Natural Language Learning (CoNLL), pages 498–507, Singapore. Association for Computational Linguistics.
Cite (Informal):
Challenging the “One Single Vector per Token” Assumption (Dehouck, CoNLL 2023)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2023.conll-1.33.pdf