GiCCS: A German in-Context Conversational Similarity Benchmark
Shima Asaadi, Zahra Kolagar, Alina Liebel, Alessandra Zarcone
Abstract
The semantic textual similarity (STS) task is commonly used to evaluate the semantic representations that language models (LMs) learn from texts, under the assumption that good-quality representations will yield accurate similarity estimates. When it comes to estimating the similarity of two utterances in a dialogue, however, the conversational context plays a particularly important role. We argue for the need for benchmarks specifically created using conversational data in order to evaluate conversational LMs in the STS task. We introduce GiCCS, a first conversational STS evaluation benchmark for German. We collected the similarity annotations for GiCCS using best-worst scaling and by presenting the target items in context, in order to obtain highly reliable context-dependent similarity scores. We present benchmarking experiments for evaluating LMs on capturing the similarity of utterances. Results suggest that pretraining LMs on conversational data and providing conversational context can be useful for capturing the similarity of utterances in dialogues. GiCCS will be publicly available to encourage benchmarking of conversational LMs.
- Anthology ID:
- 2022.gem-1.30
- Volume:
- Proceedings of the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)
- Month:
- December
- Year:
- 2022
- Address:
- Abu Dhabi, United Arab Emirates (Hybrid)
- Editors:
- Antoine Bosselut, Khyathi Chandu, Kaustubh Dhole, Varun Gangal, Sebastian Gehrmann, Yacine Jernite, Jekaterina Novikova, Laura Perez-Beltrachini
- Venue:
- GEM
- SIG:
- SIGGEN
- Publisher:
- Association for Computational Linguistics
- Pages:
- 351–362
- URL:
- https://aclanthology.org/2022.gem-1.30
- DOI:
- 10.18653/v1/2022.gem-1.30
- Cite (ACL):
- Shima Asaadi, Zahra Kolagar, Alina Liebel, and Alessandra Zarcone. 2022. GiCCS: A German in-Context Conversational Similarity Benchmark. In Proceedings of the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM), pages 351–362, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
- Cite (Informal):
- GiCCS: A German in-Context Conversational Similarity Benchmark (Asaadi et al., GEM 2022)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2022.gem-1.30.pdf