How LLMs React to Industrial Spatio-Temporal Data? Assessing Hallucination with a Novel Traffic Incident Benchmark Dataset
Qiang Li, Mingkun Tan, Xun Zhao, Dan Zhang, Daoan Zhang, Shengzhao Lei, Anderson S. Chu, Lujun Li, Porawit Kamnoedboon
Abstract
Large language models (LLMs) hold revolutionary potential to digitize and enhance the Health & Public Services (H&PS) industry. Despite their advanced linguistic abilities, concerns about accuracy, stability, and traceability still persist, especially in high-stakes areas such as transportation systems. Moreover, the predominance of English in LLM development raises questions about how they perform in non-English contexts. This study originated from a real world industrial GenAI application, introduces a novel cross-lingual benchmark dataset comprising nearly 99,869 real traffic incident records from Vienna (2013-2023) to assess the robustness of state-of-the-art LLMs (≥ 9) in the spatio vs temporal domain for traffic incident classification. We then explored three hypotheses — sentence indexing, date-to-text conversion, and German-to-English translation — and incorporated Retrieval Augmented Generation (RAG) to further examine the LLM hallucinations in both spatial and temporal domain. Our experiments reveal significant performance disparities in the spatio-temporal domain and demonstrate what types of hallucinations that RAG can mitigate and how it achieves this. We also provide open access to our H&PS traffic incident dataset, with the project demo and code available at Website https://sites.google.com/view/llmhallucination/home- Anthology ID:
- 2025.naacl-industry.4
- Volume:
- Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track)
- Month:
- April
- Year:
- 2025
- Address:
- Albuquerque, New Mexico
- Editors:
- Weizhu Chen, Yi Yang, Mohammad Kachuee, Xue-Yong Fu
- Venue:
- NAACL
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 36–53
- Language:
- URL:
- https://preview.aclanthology.org/landing_page/2025.naacl-industry.4/
- DOI:
- Cite (ACL):
- Qiang Li, Mingkun Tan, Xun Zhao, Dan Zhang, Daoan Zhang, Shengzhao Lei, Anderson S. Chu, Lujun Li, and Porawit Kamnoedboon. 2025. How LLMs React to Industrial Spatio-Temporal Data? Assessing Hallucination with a Novel Traffic Incident Benchmark Dataset. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track), pages 36–53, Albuquerque, New Mexico. Association for Computational Linguistics.
- Cite (Informal):
- How LLMs React to Industrial Spatio-Temporal Data? Assessing Hallucination with a Novel Traffic Incident Benchmark Dataset (Li et al., NAACL 2025)
- PDF:
- https://preview.aclanthology.org/landing_page/2025.naacl-industry.4.pdf