@inproceedings{benito-santos-ghajari-2025-beyond,
    title = "Beyond Averages: Learning with Annotator Disagreement in {STS}",
    author = "Benito-Santos, Alejandro  and
      Ghajari, Adrian",
    editor = "Christodoulopoulos, Christos  and
      Chakraborty, Tanmoy  and
      Ros{\'e}, Carolyn  and
      Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1800/",
    pages = "35539--35545",
    ISBN = "979-8-89176-332-6",
    abstract = "This work investigates capturing and modeling disagreement in Semantic Textual Similarity (STS), where sentence pairs are assigned ordinal similarity labels (0{--}5). Conventional STS systems average multiple annotator scores and focus on a single numeric estimate, overlooking label dispersion. By leveraging the disaggregated SemEval-2015 dataset (Soft-STS-15), this paper proposes and compares two disagreement-aware strategies that treat STS as an ordinal distribution prediction problem: a lightweight truncated Gaussian head for standard regression models, and a cross-encoder trained with a distance-aware objective, refined with temperature scaling. Results show improved performance in distance-based metrics, with the calibrated soft-label model proving best overall and notably more accurate on the most ambiguous pairs. This demonstrates that modeling disagreement benefits both calibration and ranking accuracy, highlighting the value of retaining and modeling full annotation distributions rather than collapsing them to a single mean label."
}