Haixu Tang
2026
LLMs as Lab Engineers: A Benchmark for Analytical Method Lifecycle Management
Xiaoyi Chen | Mahsa Monshizadeh | Chaoqi Zhang | Jianjun Lang | Yang Wu | Genevieve Mortensen | Xiaozhong Liu | Haixu Tang
Findings of the Association for Computational Linguistics: ACL 2026
We introduce ChemBench, a comprehensive benchmark for evaluating LLMs’ capabilities in analytical chemistry scenarios. Unlike existing benchmarks focused on factual knowledge, ChemBench assesses a model’s ability to provide contextualized, practical guidance for complex analytical chemistry challenges, including instrument readiness checks, system suitability testing, method development, and troubleshooting for both Liquid Chromatography-Mass Spectrometry (LC-MS) and Gas Chromatography-Mass Spectrometry (GC-MS) platforms. We evaluate three enhancement approaches: chemistry-specialized models, human-guided Chain-of-Thought reasoning, and Retrieval-Augmented Generation (RAG). Our findings reveal that general-purpose commercial models often outperform domain-specialized ones, while RAG and reasoning significantly improve performance. The six-dimension evaluation framework (specificity, correctness, usefulness, feasibility, misinformation risk, and error handling) provides valuable insights into LLMs’ real-world utility for chemistry researchers, establishing a foundation for developing more effective AI assistants for scientific research.
Beyond Local vs. External: A Game-Theoretic Framework for Trustworthy Knowledge Acquisition
Rujing Yao | Yufei Shi | Yang Wu | Ang Li | Zhuoren Jiang | XiaoFeng Wang | Haixu Tang | Xiaozhong Liu
Findings of the Association for Computational Linguistics: ACL 2026
Cloud-hosted Large Language Models (LLMs) offer unmatched reasoning capabilities and dynamic knowledge, yet submitting raw queries to these external services risks exposing sensitive user intent. Conversely, relying exclusively on trusted local models preserves privacy but often compromises answer quality due to limited parameter scale and knowledge. To resolve this dilemma, we propose Game-theoretic Trustworthy Knowledge Acquisition (GTKA), a framework that formulates the trade-off between knowledge utility and privacy as a strategic game. GTKA consists of three components: (i) a privacy-aware sub-query generator that decomposes sensitive intent into generalized, low-risk fragments; (ii) an adversarial reconstruction attacker that attempts to infer the original query from these fragments, providing adaptive leakage signals; and (iii) a trusted local integrator that synthesizes external responses within a secure boundary. By training the generator and attacker in an alternating adversarial manner, GTKA optimizes the sub-query generation policy to maximize knowledge acquisition accuracy while minimizing the reconstructability of the original sensitive intent. To validate our approach, we construct two sensitive-domain benchmarks in the biomedical and legal fields. Extensive experiments demonstrate that GTKA significantly reduces intent leakage compared to state-of-the-art baselines while maintaining high-fidelity answer quality.
Hey, That’s My Data! Token-Only Dataset Inference in Large Language Models
Chen Xiong | Zihao Wang | Rui Zhu | Tsung-Yi Ho | Pin-Yu Chen | Jingwei Xiong | Haixu Tang
Findings of the Association for Computational Linguistics: ACL 2026
Large Language Models (LLMs) rely on massive training datasets, often including proprietary data, which raises concerns about unauthorized usage and copyright infringement. Existing dataset inference methods typically require access to log probabilities or other internal signals, but many modern LLMs restrict such access, motivating token-only inference approaches. We propose CatShift, a token-only dataset inference framework based on catastrophic forgetting, where models overwrite prior knowledge when trained on new data. Fine-tuning an LLM on a subset of its training data induces larger output shifts than fine-tuning on unseen data. CatShift compares these shifts against those from a known non-member validation set to infer whether a dataset was included in training. Experiments on both open-source and API-based LLMs show that CatShift remains effective without logit access, enabling practical protection of proprietary datasets.
2025
Knowledge-Aware Co-Reasoning for Multidisciplinary Collaboration
Xurui Li | Wanghaijiao | Kaisong Song | Rui Zhu | Haixu Tang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large language models (LLMs) have shown significant potential to improve diagnostic performance for clinical professionals. Existing multi-agent paradigms rely mainly on prompt engineering and suffer from improper agent selection and insufficient knowledge integration. In this work, we propose a novel framework, KACR (Knowledge-Aware Co-Reasoning), that integrates structured knowledge reasoning into multidisciplinary collaboration from two aspects: (1) a reinforcement learning-optimized agent that uses clinical knowledge graphs to guide dynamic discipline determination; (2) a multidisciplinary collaboration strategy that enables robust consensus through the integration of domain-specific expertise and an interdisciplinary persuasion mechanism. Extensive experiments conducted on both academic and real-world datasets demonstrate the effectiveness of our method.
2023
STINMatch: Semi-Supervised Semantic-Topological Iteration Network for Financial Risk Detection via News Label Diffusion
Xurui Li | Yue Qin | Rui Zhu | Tianqianjin Lin | Yongming Fan | Yangyang Kang | Kaisong Song | Fubang Zhao | Changlong Sun | Haixu Tang | Xiaozhong Liu
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Commercial news provides rich semantics and timely information for automated financial risk detection. However, the prohibitive cost of large-scale annotation and the sparseness of training data hinder the full exploitation of commercial news in risk detection. To address this problem, we propose a semi-supervised Semantic-Topological Iteration Network, STINMatch, along with a news-enterprise knowledge graph (NEKG) to support enhanced risk detection. The proposed model incorporates a label correlation matrix and interactive consistency regularization techniques into the iterative joint learning framework of text and graph modules. The carefully designed framework takes full advantage of the labeled and unlabeled data as well as their interrelations, enabling deep label diffusion coordination between article-level semantics and label correlations following the topological structure. Extensive experiments demonstrate the superior effectiveness and generalization ability of STINMatch.
Co-authors
- Xiaozhong Liu 3
- Rui Zhu 3
- Xurui Li 2
- Kaisong Song 2
- Yang Wu 2
- Xiaoyi Chen 1
- Pin-Yu Chen 1
- Yongming Fan 1
- Tsung-Yi Ho 1
- Zhuoren Jiang 1
- Yangyang Kang 1
- Jianjun Lang 1
- Ang Li 1
- Tianqianjin Lin 1
- Mahsa Monshizadeh 1
- Genevieve Mortensen 1
- Yue Qin 1
- Yufei Shi 1
- Changlong Sun 1
- XiaoFeng Wang 1
- Zihao Wang 1
- Wanghaijiao 1
- Chen Xiong 1
- Jingwei Xiong 1
- Rujing Yao 1
- Chaoqi Zhang 1
- Fubang Zhao 1