Mateusz Piotrowski

2025

pdf bib abs
Circuit-Tracer: A New Library for Finding Feature Circuits
Michael Hanna | Mateusz Piotrowski | Jack Lindsey | Emmanuel Ameisen
Proceedings of the 8th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP

Feature circuits aim to shed light on LLM behavior by identifying the features that are causally responsible for a given LLM output, and connecting them into a directed graph, or *circuit*, that explains how both each feature and each output arose. However, performing circuit analysis is challenging: the tools for finding, visualizing, and verifying feature circuits are complex and spread across multiple libraries.To facilitate feature-circuit finding, we introduce ‘circuit-tracer‘, an open-source library for efficient identification of feature circuits. ‘circuit-tracer‘ provides an integrated pipeline for finding, visualizing, annotating, and performing interventions on such feature circuits, tested with various model sizes, up to 14B parameters. We make ‘circuit-tracer‘ available to both developers and end users, via integration with tools such as Neuronpedia, which provides a user-friendly interface.

Co-authors

Venues

blackboxnlp1
ws1

Fix data