Yunuo Liu
2025
PricingLogic: Evaluating LLMs Reasoning on Complex Tourism Pricing Tasks
Yunuo Liu | Dawei Zhu | Zena Al-Khalili | Dai Cheng | Yanjun Chen | Dietrich Klakow | Wei Zhang | Xiaoyu Shen
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
We present PricingLogic, the first benchmark that probes whether Large Language Models (LLMs) can reliably automate tourism-booking pricing when multiple, overlapping fare rules apply. Travel agencies are eager to offload this error-prone task to AI systems; however, deploying LLMs without verified reliability could result in significant financial losses and erode customer trust. PricingLogic comprises 300 natural-language questions based on booking requests derived from 42 real-world pricing policies, spanning two levels of difficulty: (i) basic customer-type pricing and (ii) bundled-tour calculations involving interacting discounts. Evaluations of a range of LLMs reveal a steep performance drop on the harder tier, exposing systematic failures in rule interpretation and arithmetic reasoning. These results highlight that, despite their general capabilities, today's LLMs remain unreliable for revenue-critical applications without further safeguards or domain adaptation. Our code and dataset are available at https://github.com/EIT-NLP/PricingLogic.
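To make the two difficulty tiers concrete, here is a minimal Python sketch of the kind of calculation involved. It is an illustration under stated assumptions, not the benchmark's method: the fare table, the discount rates, the no-stacking rule, and the helper names `base_price` and `bundle_price` are all hypothetical and are not taken from the 42 real-world policies in PricingLogic.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical fare table (NOT from the PricingLogic dataset):
# per-person prices by customer type for each attraction.
FARES = {
    "museum": {"adult": Decimal("40"), "child": Decimal("20"), "senior": Decimal("30")},
    "boat_tour": {"adult": Decimal("60"), "child": Decimal("30"), "senior": Decimal("45")},
}

# Hypothetical discount rules.
BUNDLE_DISCOUNT = Decimal("0.10")  # 10% off when 2+ attractions are booked together
GROUP_DISCOUNT = Decimal("0.15")   # 15% off when the party has GROUP_MIN_SIZE+ people
GROUP_MIN_SIZE = 10


def base_price(attraction: str, party: dict[str, int]) -> Decimal:
    """Tier (i): sum the per-customer-type fares for one attraction."""
    fares = FARES[attraction]
    return sum((fares[ctype] * count for ctype, count in party.items()), Decimal("0"))


def bundle_price(attractions: list[str], party: dict[str, int]) -> Decimal:
    """Tier (ii): a bundled tour whose discounts interact.

    Assumed rule: the bundle and group discounts do not stack; only the
    larger applicable discount is applied to the bundle subtotal.
    """
    subtotal = sum((base_price(a, party) for a in attractions), Decimal("0"))
    applicable = [Decimal("0")]  # "no discount" is always an option
    if len(attractions) >= 2:
        applicable.append(BUNDLE_DISCOUNT)
    if sum(party.values()) >= GROUP_MIN_SIZE:
        applicable.append(GROUP_DISCOUNT)
    total = subtotal * (Decimal("1") - max(applicable))
    # Round to cents, as a booking system would.
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    family = {"adult": 2, "child": 1}
    print(base_price("museum", family))                   # 100
    print(bundle_price(["museum", "boat_tour"], family))  # 225.00
    print(bundle_price(["museum", "boat_tour"], {"adult": 10}))  # 850.00
```

For two adults and one child, the museum alone costs 100 and the bundle costs 225.00 (a 250 subtotal with the 10% bundle discount). With ten adults both discounts qualify, but under the assumed rule only the 15% group discount applies, giving 850.00; stacking the two multiplicatively would instead yield 765.00. Interacting rules of this sort are where a misreading of the policy, rather than the arithmetic itself, changes the final price.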