TravelBehaviorQA: A Benchmark Dataset for Behavioral Interpretation of GPS Trajectories

Dongyang Zhen, Niping Duan, Huan Zhou, Qingbin Cui


Abstract
GPS trajectories encode rich behavioral information about how people move, organize activities, and form daily routines. Recent advances in large language models (LLMs) raise a natural question: can such models infer and summarize travel behavior directly from mobility traces? This paper introduces TravelBehaviorQA, a large-scale benchmark dataset that reframes trajectory analysis as a language-based behavioral understanding task. The dataset links raw GPS trajectories with human-grounded question-answering (QA) pairs that capture travel intensity, temporal structure, activity patterns, mode usage, and behavioral routines. Unlike prior mobility datasets focused on prediction or classification, TravelBehaviorQA emphasizes semantic interpretation through a unified mix of deterministic and open-ended questions. In this benchmark, we construct over 143k QA instances spanning users and years, and evaluate a broad range of state-of-the-art LLMs under controlled settings. Our results reveal substantial gaps between factual extraction and genuine behavioral reasoning, showing that model scale alone is insufficient and that trajectory representation is a primary bottleneck. TravelBehaviorQA exposes critical limitations of current models and establishes a rigorous benchmark for advancing language-based understanding of human mobility behavior.
Anthology ID:
2026.findings-acl.1604
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
32053–32071
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.findings-acl.1604/
Cite (ACL):
Dongyang Zhen, Niping Duan, Huan Zhou, and Qingbin Cui. 2026. TravelBehaviorQA: A Benchmark Dataset for Behavioral Interpretation of GPS Trajectories. In Findings of the Association for Computational Linguistics: ACL 2026, pages 32053–32071, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
TravelBehaviorQA: A Benchmark Dataset for Behavioral Interpretation of GPS Trajectories (Zhen et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.findings-acl.1604.pdf
Checklist:
2026.findings-acl.1604.checklist.pdf