PricingLogic: Evaluating LLMs Reasoning on Complex Tourism Pricing Tasks
Yunuo Liu, Dawei Zhu, Zena Al-Khalili, Dai Cheng, Yanjun Chen, Dietrich Klakow, Wei Zhang, Xiaoyu Shen
Abstract
We present PricingLogic, the first benchmark that probes whether Large Language Models (LLMs) can reliably automate tourism-booking prices when multiple, overlapping fare rules apply. Travel agencies are eager to offload this error-prone task to AI systems; however, deploying LLMs without verified reliability could result in significant financial losses and erode customer trust. PricingLogic comprises 300 natural-language questions based on booking requests derived from 42 real-world pricing policies, spanning two levels of difficulty: (i) basic customer-type pricing and (ii) bundled-tour calculations involving interacting discounts. Evaluations of a range of LLMs reveal a steep performance drop on the harder tier, exposing systematic failures in rule interpretation and arithmetic reasoning. These results highlight that, despite their general capabilities, today’s LLMs remain unreliable for revenue-critical applications without further safeguards or domain adaptation. Our code and dataset are available at https://github.com/EIT-NLP/PricingLogic.
- Anthology ID:
- 2025.emnlp-main.393
- Volume:
- Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 7736–7745
- URL:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.393/
- Cite (ACL):
- Yunuo Liu, Dawei Zhu, Zena Al-Khalili, Dai Cheng, Yanjun Chen, Dietrich Klakow, Wei Zhang, and Xiaoyu Shen. 2025. PricingLogic: Evaluating LLMs Reasoning on Complex Tourism Pricing Tasks. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 7736–7745, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- PricingLogic: Evaluating LLMs Reasoning on Complex Tourism Pricing Tasks (Liu et al., EMNLP 2025)
- PDF:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.393.pdf