Representation of Lexical Stylistic Features in Language Models’ Embedding Space

Qing Lyu; Marianna Apidianaki; Chris Callison-Burch

doi:10.18653/v1/2023.starsem-1.32

Abstract

The representation space of pretrained Language Models (LMs) encodes rich information about words and their relationships (e.g., similarity, hypernymy, polysemy) as well as abstract semantic notions (e.g., intensity). In this paper, we demonstrate that lexical stylistic notions such as complexity, formality, and figurativeness can also be identified in this space. We show that it is possible to derive a vector representation for each of these stylistic notions from only a small number of seed pairs. Using these vectors, we can characterize new texts along these dimensions by performing simple calculations in the corresponding embedding space. We conduct experiments on five datasets and find that static embeddings encode these features more accurately at the level of words and phrases, whereas contextualized LMs perform better on sentences. The lower performance of contextualized representations at the word level is partially attributable to the anisotropy of their vector space, which can be corrected to some extent using techniques like standardization.
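The seed-pair idea and the standardization fix can be illustrated with a small sketch. It assumes that a style vector is the mean of the differences between the stylistic and the neutral member of each seed pair, and that a new item is scored by cosine similarity to that vector; the function names, the seed pairs, and the 3-dimensional toy embeddings are hypothetical and are not taken from the authors' released software. The constant third dimension mimics anisotropy (every vector shares a large common component), and per-dimension standardization (z-scoring) removes it.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]

# Hypothetical toy embeddings: dim 0 loosely tracks formality, dim 2 is a
# large component shared by every word (a stand-in for anisotropy).
EMB: Dict[str, Vector] = {
    "kids": [0.0, 1.0, 5.0], "children": [1.0, 1.0, 5.0],
    "buy": [0.0, -1.0, 5.0], "purchase": [1.0, -1.0, 5.0],
    "get": [0.1, 0.5, 5.0], "obtain": [0.9, 0.5, 5.0],
}
# Hypothetical (neutral/informal, formal) seed pairs.
SEEDS: Sequence[Tuple[str, str]] = [("kids", "children"), ("buy", "purchase")]


def standardize(vectors: Dict[str, Vector]) -> Dict[str, Vector]:
    """Z-score each dimension across the given vocabulary (mean 0, std 1)."""
    words = list(vectors)
    dim, n = len(vectors[words[0]]), len(words)
    means = [sum(vectors[w][d] for w in words) / n for d in range(dim)]
    stds = []
    for d in range(dim):
        var = sum((vectors[w][d] - means[d]) ** 2 for w in words) / n
        stds.append(math.sqrt(var) or 1.0)  # constant dimension: avoid dividing by zero
    return {w: [(vectors[w][d] - means[d]) / stds[d] for d in range(dim)] for w in words}


def style_vector(emb: Dict[str, Vector], pairs: Sequence[Tuple[str, str]]) -> Vector:
    """Average of (stylistic - neutral) difference vectors over the seed pairs."""
    dim = len(next(iter(emb.values())))
    total = [0.0] * dim
    for neutral, stylistic in pairs:
        for d in range(dim):
            total[d] += emb[stylistic][d] - emb[neutral][d]
    return [t / len(pairs) for t in total]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


if __name__ == "__main__":
    raw_vec = style_vector(EMB, SEEDS)
    std_emb = standardize(EMB)
    std_vec = style_vector(std_emb, SEEDS)
    for word in ("get", "obtain"):
        print(word, round(cosine(EMB[word], raw_vec), 3), round(cosine(std_emb[word], std_vec), 3))
```

On the raw toy vectors both "get" and "obtain" receive small positive scores, because the shared component dominates every norm. After standardization the two words fall on opposite sides of zero. In practice the means and standard deviations would be estimated over a large vocabulary rather than over the handful of words being scored.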

Anthology ID:: 2023.starsem-1.32
Volume:: Proceedings of the 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023)
Month:: July
Year:: 2023
Address:: Toronto, Canada
Editors:: Alexis Palmer, Jose Camacho-Collados
Venue:: *SEM
SIG:: SIGLEX
Publisher:: Association for Computational Linguistics
Pages:: 370–387
URL:: https://aclanthology.org/2023.starsem-1.32
DOI:: 10.18653/v1/2023.starsem-1.32
Cite (ACL):: Qing Lyu, Marianna Apidianaki, and Chris Callison-Burch. 2023. Representation of Lexical Stylistic Features in Language Models’ Embedding Space. In Proceedings of the 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023), pages 370–387, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):: Representation of Lexical Stylistic Features in Language Models’ Embedding Space (Lyu et al., *SEM 2023)
PDF:: https://preview.aclanthology.org/dois-2013-emnlp/2023.starsem-1.32.pdf
Software:: 2023.starsem-1.32.software.zip