Non Verbis, Sed Rebus: Large Language Models Are Weak Solvers of Italian Rebuses
Gabriele Sarti, Tommaso Caselli, Malvina Nissim, Arianna Bisazza
Abstract
Rebuses are puzzles requiring constrained multi-step reasoning to identify a hidden phrase from a set of images and letters. In this work, we introduce a large collection of verbalized rebuses for the Italian language and use it to assess the rebus-solving capabilities of state-of-the-art large language models. While general-purpose systems such as LLaMA-3 and GPT-4o perform poorly on this task, ad-hoc fine-tuning seems to improve models' performance. However, we find that performance gains from training are largely attributable to memorization. Our results suggest that rebus solving remains a challenging test bed for evaluating large language models' linguistic proficiency and sequential instruction-following skills.
- Anthology ID:
- 2024.clicit-1.96
- Volume:
- Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)
- Month:
- December
- Year:
- 2024
- Address:
- Pisa, Italy
- Editors:
- Felice Dell'Orletta, Alessandro Lenci, Simonetta Montemagni, Rachele Sprugnoli
- Venue:
- CLiC-it
- Publisher:
- CEUR Workshop Proceedings
- Pages:
- 888–897
- URL:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.clicit-1.96/
- Cite (ACL):
- Gabriele Sarti, Tommaso Caselli, Malvina Nissim, and Arianna Bisazza. 2024. Non Verbis, Sed Rebus: Large Language Models Are Weak Solvers of Italian Rebuses. In Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024), pages 888–897, Pisa, Italy. CEUR Workshop Proceedings.
- Cite (Informal):
- Non Verbis, Sed Rebus: Large Language Models Are Weak Solvers of Italian Rebuses (Sarti et al., CLiC-it 2024)
- PDF:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.clicit-1.96.pdf