Abstract
A recent renewal in interest in long text understanding has sparked the emergence of high-quality long text benchmarks, as well as new models demonstrating significant performance improvements on these benchmarks. However, gauging the implication of these advancements based solely on the length of the input text offers limited insight. Such benchmarks may require models to parse long-range dependencies or merely to locate and comprehend the relevant paragraph within a longer text. This work introduces the Minimal Viable Phrase (MVP), a novel metric that determines, through perturbations to the input text, the shortest average text length that needs to be preserved to execute the task with limited performance degradation. Our evaluation of the popular SCROLLS benchmark reveals that only one of its seven tasks necessitates an MVP of over 512 tokens–the maximum text length manageable by the previous generation of pre-trained models. We highlight the limited need for understanding long-range dependencies in resolving these tasks, discuss the specific design decisions that seem to have led to the QuALITY task requiring reliance on long-range dependencies to be solved, and point out specific modeling choices that seem to outperform on the QuALITY task.- Anthology ID:
- 2024.lrec-main.1049
- Volume:
- Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
- Month:
- May
- Year:
- 2024
- Address:
- Torino, Italia
- Editors:
- Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
- Venues:
- LREC | COLING
- SIG:
- Publisher:
- ELRA and ICCL
- Note:
- Pages:
- 12016–12026
- Language:
- URL:
- https://aclanthology.org/2024.lrec-main.1049
- DOI:
- Cite (ACL):
- Louis Clouatre, Amal Zouaq, and Sarath Chandar. 2024. MVP: Minimal Viable Phrase for Long Text Understanding. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 12016–12026, Torino, Italia. ELRA and ICCL.
- Cite (Informal):
- MVP: Minimal Viable Phrase for Long Text Understanding (Clouatre et al., LREC-COLING 2024)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2024.lrec-main.1049.pdf