Abstract
Language models (LMs) can express factual knowledge involving numeric properties such as Karl Popper was born in 1902. However, how this information is encoded in the model’s internal representations is not understood well. Here, we introduce a method for finding and editing representations of numeric properties such as an entity’s birth year. We find directions that encode numeric properties monotonically, in an interpretable fashion. When editing representations along these directions, LM output changes accordingly. For example, by patching activations along a “birthyear” direction we can make the LM express an increasingly late birthyear. Property-encoding directions exist across several numeric properties in all models under consideration, suggesting the possibility that monotonic representation of numeric properties consistently emerges during LM pretraining.
Code: https://github.com/bheinzerling/numeric-property-repr
A long version of this short paper is available at: https://arxiv.org/abs/2403.10381
- Anthology ID:
- 2024.acl-short.18
- Volume:
- Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
- Month:
- August
- Year:
- 2024
- Address:
- Bangkok, Thailand
- Editors:
- Lun-Wei Ku, Andre Martins, Vivek Srikumar
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 175–195
- URL:
- https://aclanthology.org/2024.acl-short.18
- DOI:
- 10.18653/v1/2024.acl-short.18
- Cite (ACL):
- Benjamin Heinzerling and Kentaro Inui. 2024. Monotonic Representation of Numeric Attributes in Language Models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 175–195, Bangkok, Thailand. Association for Computational Linguistics.
- Cite (Informal):
- Monotonic Representation of Numeric Attributes in Language Models (Heinzerling & Inui, ACL 2024)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/2024.acl-short.18.pdf