Trick or Neat: Adversarial Ambiguity and Language Model Evaluation
Antonia Karamolegkou, Oliver Eberle, Phillip Rust, Carina Kauf, Anders Søgaard
Abstract
Detecting ambiguity is important for language understanding, including uncertainty estimation, humour detection, and processing garden path sentences. We assess language models’ sensitivity to ambiguity by introducing an adversarial ambiguity dataset that includes syntactic, lexical, and phonological ambiguities along with adversarial variations (e.g., word-order changes, synonym replacements, and random-based alterations). Our findings show that direct prompting fails to robustly identify ambiguity, while linear probes trained on model representations can decode ambiguity with high accuracy, sometimes exceeding 90%. Our results offer insights into the prompting paradigm and how language models encode ambiguity at different layers.

- Anthology ID: 2025.findings-acl.954
- Volume: Findings of the Association for Computational Linguistics: ACL 2025
- Month: July
- Year: 2025
- Address: Vienna, Austria
- Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 18542–18561
- URL: https://preview.aclanthology.org/acl25-workshop-ingestion/2025.findings-acl.954/
- Cite (ACL): Antonia Karamolegkou, Oliver Eberle, Phillip Rust, Carina Kauf, and Anders Søgaard. 2025. Trick or Neat: Adversarial Ambiguity and Language Model Evaluation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 18542–18561, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal): Trick or Neat: Adversarial Ambiguity and Language Model Evaluation (Karamolegkou et al., Findings 2025)
- PDF: https://preview.aclanthology.org/acl25-workshop-ingestion/2025.findings-acl.954.pdf