LongTutor: Benchmarking Large Language Models for Long-term Personalized Tutoring
Ning Li, Zheng Zhang, Zhenya Huang, Rui Li, Yi Zhan, Yinbo Luo, Qi Liu, Enhong Chen
Abstract
The rapid advancement of large language models (LLMs) has driven the deployment of LLM-based AI tutors on online learning platforms. This widespread adoption highlights an urgent need for systematic benchmarks to evaluate their tutoring capabilities. However, existing evaluations predominantly focus on isolated, short-term interactions, overlooking the inherently long-term nature of learning. To bridge this gap, we introduce LongTutor, a benchmark for long-term personalized tutoring grounded in formative assessment theory. Built from expert-annotated real-world learning logs, LongTutor evaluates LLMs across three progressive tasks: historical evidence acquisition, knowledge state diagnosis, and adaptive teaching action. Our experiments reveal a critical capability mismatch: while LLMs excel at evidence acquisition, they struggle to effectively leverage long-term history for accurate diagnosis and adaptive teaching. To enable scalable benchmark expansion, we further propose an automated generator–verifier pipeline, paving the way toward truly long-term AI tutoring systems.
- Anthology ID:
- 2026.acl-long.1371
- Volume:
- Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 29712–29737
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.1371/
- Cite (ACL):
- Ning Li, Zheng Zhang, Zhenya Huang, Rui Li, Yi Zhan, Yinbo Luo, Qi Liu, and Enhong Chen. 2026. LongTutor: Benchmarking Large Language Models for Long-term Personalized Tutoring. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 29712–29737, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- LongTutor: Benchmarking Large Language Models for Long-term Personalized Tutoring (Li et al., ACL 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.1371.pdf