When Annotators Disagree, Topology Explains: Mapper, a Topological Tool for Exploring Text Embedding Geometry and Ambiguity

Nisrine Rair, Alban Goupil, Valeriu Vrabie, Emmanuel Chochoy


Abstract
Language models are often evaluated with scalar metrics like accuracy, but such measures fail to capture how models internally represent ambiguity, especially when human annotators disagree. We propose a topological perspective to analyze how fine-tuned models encode ambiguity and, more generally, individual instances. Applied to RoBERTa-Large on the MD-Offense dataset, Mapper, a tool from topological data analysis, reveals that fine-tuning restructures embedding space into modular, non-convex regions aligned with model predictions, even for highly ambiguous cases. Over 98% of connected components exhibit ≥ 90% prediction purity, yet alignment with ground-truth labels drops in ambiguous data, surfacing a hidden tension between structural confidence and label uncertainty. Unlike traditional tools such as PCA or UMAP, Mapper captures this geometry directly, uncovering decision regions, boundary collapses, and overconfident clusters. Our findings position Mapper as a powerful diagnostic tool for understanding how models resolve ambiguity. Beyond visualization, it also enables topological metrics that may inform proactive modeling strategies in subjective NLP tasks.
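The prediction-purity figure quoted in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a Mapper graph is already available as a list of nodes (each holding the indices of the points it covers) plus a list of edges, and it assumes purity means the share of points in a connected component that carry the component's majority predicted label. The names `node_members`, `edges`, and `predictions` and the toy values are hypothetical.

```python
from collections import Counter, defaultdict


def connected_components(num_nodes, edges):
    """Group node ids into connected components using union-find."""
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(list)
    for node in range(num_nodes):
        groups[find(node)].append(node)
    return list(groups.values())


def component_purity(node_members, edges, predictions):
    """Purity per component: fraction of its points with the majority predicted label.

    Assumes every Mapper node covers at least one point.
    """
    purities = []
    for component in connected_components(len(node_members), edges):
        # A point may sit in several overlapping nodes; count it once.
        points = set()
        for node in component:
            points.update(node_members[node])
        counts = Counter(predictions[p] for p in points)
        purities.append(counts.most_common(1)[0][1] / len(points))
    return purities


# Hypothetical toy graph: 4 Mapper nodes covering 8 points, 2 components.
node_members = [[0, 1, 2], [2, 3], [4, 5], [5, 6, 7]]
edges = [(0, 1), (2, 3)]
predictions = [1, 1, 1, 1, 0, 0, 0, 1]  # assumed model predictions per point

purities = component_purity(node_members, edges, predictions)
# Share of components at or above the 90% purity threshold from the abstract.
share_pure = sum(p >= 0.9 for p in purities) / len(purities)
```

Replacing `predictions` with ground-truth labels in the same function would give the label-alignment view that the abstract contrasts with prediction purity.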
Anthology ID:
2025.emnlp-main.426
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
8468–8491
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.emnlp-main.426/
DOI:
10.18653/v1/2025.emnlp-main.426
Cite (ACL):
Nisrine Rair, Alban Goupil, Valeriu Vrabie, and Emmanuel Chochoy. 2025. When Annotators Disagree, Topology Explains: Mapper, a Topological Tool for Exploring Text Embedding Geometry and Ambiguity. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 8468–8491, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
When Annotators Disagree, Topology Explains: Mapper, a Topological Tool for Exploring Text Embedding Geometry and Ambiguity (Rair et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.emnlp-main.426.pdf
Checklist:
 2025.emnlp-main.426.checklist.pdf