Explaining Explanations: Interpretability Methods for Discourse Analysis of Transformer Attention Maps

Louis Escouflaire, Jérémie Bogaert, Antonin Descampe, Cédrick Fairon, François-Xavier Standaert


Abstract
While LLMs have achieved state-of-the-art performance in NLP, their opacity hinders human understanding of their predictions. Standard explainability techniques often prioritize technical faithfulness over linguistic plausibility. This paper argues for an interdisciplinary approach that integrates discourse analysis to critically interpret model explanations. We conduct a case study using CamemBERT, fine-tuned to classify French journalistic texts as news or opinion. We employ Layer-wise Relevance Propagation to generate attention maps for 1,000 test articles and analyze the token-level relevance scores through both in-depth qualitative analysis and a quantitative ranking of high-attention tokens. Our findings reveal that CamemBERT successfully captures genre-specific linguistic markers: it attends to cues of reported speech and temporal anchors in news articles, and to expressive punctuation, evaluative adjectives, and first-person pronouns in opinion pieces. The discourse-analytic lens moves us beyond superficial observations, demonstrating how the model interprets features such as punctuation as structural or stylistic conventions. We argue that integrating linguistic expertise into the explainability pipeline yields more nuanced, human-readable explanations.
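The abstract describes aggregating per-layer attention maps into token-level relevance scores. The paper uses Layer-wise Relevance Propagation; as a simpler, related illustration of how layer-wise attention maps can be combined into a single relevance vector, the sketch below implements attention rollout (Abnar & Zuidema, 2020) in NumPy on a toy input. It is not the authors' LRP pipeline, and the uniform attention matrices are purely illustrative.

```python
import numpy as np

def attention_rollout(attentions):
    """Aggregate per-layer attention maps into token-level relevance.

    attentions: list of (seq_len, seq_len) row-stochastic matrices,
    one per layer (rows sum to 1, e.g. averaged over heads).
    Returns a (seq_len, seq_len) matrix; row 0 can be read as the
    relevance of each token to the first ([CLS]) position.
    """
    seq_len = attentions[0].shape[0]
    rollout = np.eye(seq_len)
    for attn in attentions:
        # Account for residual connections by mixing attention with identity,
        # then re-normalize so each row remains a probability distribution.
        a = 0.5 * attn + 0.5 * np.eye(seq_len)
        a = a / a.sum(axis=-1, keepdims=True)
        rollout = a @ rollout
    return rollout

# Toy example: 3 layers of uniform attention over 4 tokens.
layers = [np.full((4, 4), 0.25) for _ in range(3)]
scores = attention_rollout(layers)[0]  # relevance w.r.t. the first token
```

In practice the per-layer matrices would come from a fine-tuned transformer (e.g. the `output_attentions` option of Hugging Face models), with heads averaged before rollout; the resulting scores can then be ranked per token, as in the quantitative analysis the abstract describes.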
Anthology ID:
2026.lrec-main.8
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Note:
Pages:
107–116
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.8/
Cite (ACL):
Louis Escouflaire, Jérémie Bogaert, Antonin Descampe, Cédrick Fairon, and François-Xavier Standaert. 2026. Explaining Explanations: Interpretability Methods for Discourse Analysis of Transformer Attention Maps. In Proceedings of the Fifteenth Language Resources and Evaluation Conference, pages 107–116, Palma de Mallorca, Spain.
Cite (Informal):
Explaining Explanations: Interpretability Methods for Discourse Analysis of Transformer Attention Maps (Escouflaire et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.8.pdf