Hanlin Xue
2026
Verifiable LLM-Generated Text Detection via Projected Semantic-Structural Distributions
Ruochong Xiong | Qien Li | Wangwang Lian | Yulong Wan | Hanlin Xue | Zhouxing Tan | Han Yang | Fengyu Lu | Junfei Liu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The widespread deployment of large language models (LLMs) makes detecting LLM-generated text a critical security task. Existing methods, which rely primarily on output probabilities from proxy models or on single semantic features, suffer from distribution misalignment and limited interpretability. We observe that machine-generated text exhibits a directionally consistent systematic translation relative to human-written text within the joint semantic-structural space. Accordingly, we propose ProSSD, a statistical framework that uses supervised subspace learning to extract compact features and construct conditional semantic distributions based on syntactic structures. By employing a likelihood ratio test, we derive a modified Mahalanobis distance, weighted by the Wasserstein distance, as the discriminative metric. Experiments demonstrate ProSSD’s superior robustness and computational efficiency across cross-domain, cross-model, and adversarial scenarios. Furthermore, we reveal the phenomena of systematic semantic translation and semantic collapse in machine-generated text, offering interpretable statistical insights into LLM generation behaviors.
2025
Domaino1s: Guiding LLM Reasoning for Explainable Answers in High-Stakes Domains
Xu Chu | Zhijie Tan | Hanlin Xue | Guanyu Wang | Tong Mo | Weiping Li
Findings of the Association for Computational Linguistics: ACL 2025
Large Language Models (LLMs) are widely applied to downstream domains. However, current LLMs for high-stakes domain tasks, such as financial investment and legal QA, typically generate brief answers without reasoning processes or explanations. This limits users’ confidence in making decisions based on their responses. While the original chain-of-thought (CoT) approach shows promise, it lacks self-correction mechanisms during reasoning. This work introduces Domaino1s, which enhances LLMs’ reasoning capabilities on domain tasks through supervised fine-tuning and tree search. We construct the CoT-stock-2k and CoT-legal-2k datasets for fine-tuning models that activate domain-specific reasoning steps based on their judgment. Additionally, we propose Selective Tree Exploration to spontaneously explore solution spaces and sample optimal reasoning paths to improve performance. We also introduce PROOF-Score, a new metric for evaluating domain models’ explainability, complementing traditional accuracy metrics with richer assessment dimensions. Extensive experiments on stock investment recommendation and legal reasoning QA tasks demonstrate Domaino1s’s leading performance and explainability. Our code is available at https://anonymous.4open.science/r/Domaino1s-006F/.