Mengtian Guo
2026
Why is "Chicago" Predictive of Deceptive Reviews? Using LLMs to Discover Language Phenomena from Lexical Cues
Jiaming Qu | Mengtian Guo | Yue Wang
Proceedings of the 6th Workshop on Trustworthy NLP (TrustNLP 2026)
Jiaming Qu | Mengtian Guo | Yue Wang
Proceedings of the 6th Workshop on Trustworthy NLP (TrustNLP 2026)
Deceptive reviews mislead consumers, harm businesses, and undermine trust in online marketplaces. Machine learning classifiers can learn from large amounts of data to distinguish deceptive reviews from genuine ones. However, the distinguishing features learned by these classifiers are often subtle, fragmented, and difficult for humans to interpret, which can hinder user understanding and trust. In this work, we study whether large language models (LLMs) can translate such unintuitive lexical cues into human-understandable language phenomena. We propose a conjecture-then-validate framework, and show that language phenomena obtained in this manner are empirically grounded in data, generalizable across similar domains, and more predictive than phenomena derived from LLMs’ prior knowledge or in-context learning. Such phenomena can aid people in critically assessing the credibility of online reviews in environments where deception detection classifiers are unavailable.
2025
Learning to Rewrite Negation Queries in Product Search
Mengtian Guo | Mutasem Al-Darabsah | Choon Hui Teo | Jonathan May | Tarun Agarwal | Rahul Bhagat
Proceedings of the 31st International Conference on Computational Linguistics: Industry Track
Mengtian Guo | Mutasem Al-Darabsah | Choon Hui Teo | Jonathan May | Tarun Agarwal | Rahul Bhagat
Proceedings of the 31st International Conference on Computational Linguistics: Industry Track
In product search, negation is frequently used to articulate unwanted product features or components. Modern search engines often struggle to comprehend negations, resulting in suboptimal user experiences. While various methods have been proposed to tackle negations in search, none of them took the vocabulary gap between query keywords and product text into consideration. In this work, we introduced a query rewriting approach to enhance the performance of product search engines when dealing with queries with negations. First, we introduced a data generation workflow that leverages large language models (LLMs) to extract query rewrites from product text. Subsequently, we trained a Seq2Seq model to generate query rewrite for unseen queries. Our experiments demonstrated that query rewriting yields a 3.17% precision@30 improvement for queries with negations. The promising results pave the way for further research on enhancing the search performance of queries with negations.
2021
Explainable Prediction of Text Complexity: The Missing Preliminaries for Text Simplification
Cristina Garbacea | Mengtian Guo | Samuel Carton | Qiaozhu Mei
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Cristina Garbacea | Mengtian Guo | Samuel Carton | Qiaozhu Mei
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Text simplification reduces the language complexity of professional content for accessibility purposes. End-to-end neural network models have been widely adopted to directly generate the simplified version of input text, usually functioning as a blackbox. We show that text simplification can be decomposed into a compact pipeline of tasks to ensure the transparency and explainability of the process. The first two steps in this pipeline are often neglected: 1) to predict whether a given piece of text needs to be simplified, and 2) if yes, to identify complex parts of the text. The two tasks can be solved separately using either lexical or deep learning methods, or solved jointly. Simply applying explainable complexity prediction as a preliminary step, the out-of-sample text simplification performance of the state-of-the-art, black-box simplification models can be improved by a large margin.