Unit Testing for Concepts in Neural Networks

Charles Lovering, Ellie Pavlick


Abstract
Many complex problems are naturally understood in terms of symbolic concepts. For example, our concept of “cat” is related to our concepts of “ears” and “whiskers” in a non-arbitrary way. Fodor (1998) proposes one theory of concepts, which emphasizes symbolic representations related via constituency structures. Whether neural networks are consistent with such a theory is open for debate. We propose unit tests for evaluating whether a system’s behavior is consistent with several key aspects of Fodor’s criteria. Using a simple visual concept learning task, we evaluate several modern neural architectures against this specification. We find that models succeed on tests of groundedness, modularity, and reusability of concepts, but that important questions about causality remain open. Resolving these will require new methods for analyzing models’ internal states.
Anthology ID:
2022.tacl-1.69
Volume:
Transactions of the Association for Computational Linguistics, Volume 10
Year:
2022
Address:
Cambridge, MA
Venue:
TACL
Publisher:
MIT Press
Pages:
1193–1208
URL:
https://aclanthology.org/2022.tacl-1.69
DOI:
10.1162/tacl_a_00514
Cite (ACL):
Charles Lovering and Ellie Pavlick. 2022. Unit Testing for Concepts in Neural Networks. Transactions of the Association for Computational Linguistics, 10:1193–1208.
Cite (Informal):
Unit Testing for Concepts in Neural Networks (Lovering & Pavlick, TACL 2022)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2022.tacl-1.69.pdf
Video:
https://preview.aclanthology.org/emnlp-22-attachments/2022.tacl-1.69.mp4