@inproceedings{saphra-wiegreffe-2024-mechanistic,
title = "Mechanistic?",
author = "Saphra, Naomi and
Wiegreffe, Sarah",
editor = "Belinkov, Yonatan and
Kim, Najoung and
Jumelet, Jaap and
Mohebbi, Hosein and
Mueller, Aaron and
Chen, Hanjie",
booktitle = "Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP",
month = nov,
year = "2024",
address = "Miami, Florida, US",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/add-emnlp-2024-awards/2024.blackboxnlp-1.30/",
doi = "10.18653/v1/2024.blackboxnlp-1.30",
pages = "480--498",
abstract = "The rise of the term {\textquotedblleft}mechanistic interpretability{\textquotedblright} has accompanied increasing interest in understanding neural models{---}particularly language models. However, this jargon has also led to a fair amount of confusion. So, what does it mean to be mechanistic? We describe four uses of the term in interpretability research. The most narrow technical definition requires a claim of causality, while a broader technical definition allows for any exploration of a model's internals. However, the term also has a narrow cultural definition describing a cultural movement. To understand this semantic drift, we present a history of the NLP interpretability community and the formation of the separate, parallel mechanistic interpretability community. Finally, we discuss the broad cultural definition{---}encompassing the entire field of interpretability{---}and why the traditional NLP interpretability community has come to embrace it. We argue that the polysemy of {\textquotedblleft}mechanistic{\textquotedblright} is the product of a critical divide within the interpretability community."
}
Markdown (Informal)
[Mechanistic?](https://preview.aclanthology.org/add-emnlp-2024-awards/2024.blackboxnlp-1.30/) (Saphra & Wiegreffe, BlackboxNLP 2024)
ACL
- Naomi Saphra and Sarah Wiegreffe. 2024. Mechanistic? In Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP, pages 480–498, Miami, Florida, US. Association for Computational Linguistics.