ReTRE: Benchmarking LLM Transfer Robustness with Structure-Preserving Variants

ZhongDong Li, Weijie Shi, Yue Cui, Haolun MA, Yuanjun Liu, Jiawei Li, An Liu, Jia Zhu, Jiajie Xu


Abstract
Large language models (LLMs) achieve strong performance on standard benchmarks, yet that performance is not robust across different manifestations of the same task. It remains unclear how performance changes under controlled task rewrites that preserve the original solution structure while varying the rewrite type and level. To address this question, we introduce ReTRE (Rewrite-based Transfer Robustness Evaluation), a benchmark inspired by learning transfer theory that probes transfer robustness at two rewrite levels: Near Transfer and Far Transfer. ReTRE employs a multi-agent system to construct textual and visual variants that preserve the structure of the original solution. Evaluations of state-of-the-art multimodal LLMs on mathematical and science tasks reveal a consistent transfer gap: performance generally declines as transfer similarity drops, and strong text performance can degrade under cross-modal transfer. Crucially, we identify a divergence between post-training paradigms: reinforcement learning preserves transfer robustness, whereas supervised fine-tuning tends to overfit the training distribution, leading to severe degradation in far-transfer performance despite strong in-distribution accuracy.
Anthology ID:
2026.acl-long.2048
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
44257–44268
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.2048/
Cite (ACL):
ZhongDong Li, Weijie Shi, Yue Cui, Haolun MA, Yuanjun Liu, Jiawei Li, An Liu, Jia Zhu, and Jiajie Xu. 2026. ReTRE: Benchmarking LLM Transfer Robustness with Structure-Preserving Variants. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 44257–44268, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
ReTRE: Benchmarking LLM Transfer Robustness with Structure-Preserving Variants (Li et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.2048.pdf
Checklist:
 2026.acl-long.2048.checklist.pdf