Xiaobo Zhang


2025

RoBGuard: Enhancing LLMs to Assess Risk of Bias in Clinical Trial Documents
Changkai Ji | Bowen Zhao | Zhuoyao Wang | Yingwen Wang | Yuejie Zhang | Ying Cheng | Rui Feng | Xiaobo Zhang
Proceedings of the 31st International Conference on Computational Linguistics

Randomized Controlled Trials (RCTs) are rigorous clinical studies crucial for reliable decision-making, but their credibility can be compromised by bias. The Cochrane Risk of Bias tool (RoB 2) assesses this risk, yet manual assessments are time-consuming and labor-intensive. Previous approaches have employed Large Language Models (LLMs) to automate this process. However, they typically rely on manually crafted prompts and a restricted set of simple questions, limiting their accuracy and generalizability. Inspired by the human bias assessment process, we propose RoBGuard, a novel framework for enhancing LLMs to assess the risk of bias in RCTs. Specifically, RoBGuard integrates medical knowledge-enhanced question reformulation, multimodal document parsing, and multi-expert collaboration to ensure both completeness and accuracy. Additionally, to address the lack of suitable datasets, we introduce two new datasets: RoB-Item and RoB-Domain. Experimental results on the RoB-Item dataset demonstrate RoBGuard’s effectiveness, with RoBGuard outperforming existing methods.

GLiM: Integrating Graph Transformer and LLM for Document-Level Biomedical Relation Extraction with Incomplete Labeling
Hao Fang | Yuejie Zhang | Rui Feng | Yingwen Wang | Qing Wang | Wen He | Xiaobo Zhang | Tao Zhang | Shang Gao
Findings of the Association for Computational Linguistics: ACL 2025

Document-level relation extraction (DocRE) identifies relations between entities across an entire document. However, as documents contain more and increasingly complex entities, the number of candidate entity pairs, and hence the problem space, grows quadratically. Because exhaustive annotation is costly, datasets, especially biomedical ones, are therefore often incompletely labeled and contain frequent false negatives. This leads to low recall in real-world scenarios. To address this, we propose GLiM, a novel framework that reduces the problem space using a graph-enhanced Transformer-based model and leverages large language models (LLMs) for reasoning. GLiM employs a cascaded approach: first, a graph-enhanced Transformer processes entity-pair relations at a finer granularity by dynamically adjusting the graph size based on the number of entities; then, LLM inference handles challenging cases. Experiments show that GLiM improves average recall and F1 scores by 6.34 and 4.41, respectively, outperforming state-of-the-art models on biomedical benchmarks. These results demonstrate the effectiveness of combining graph-enhanced Transformers with LLM inference for biomedical DocRE. Code will be released at https://github.com/HaoFang10/GLiM.
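As a rough illustration of two ideas in this abstract, the Python sketch below (a) counts ordered candidate entity pairs, n(n-1) for n entities, to show the quadratic growth of the problem space, and (b) shows one generic way to structure a cascade in which a cheap first stage settles confident pairs and only uncertain ones are sent to an LLM. The confidence band (`low`, `high`), the scorer, the LLM stub, and the function names are hypothetical placeholders; the abstract does not say how GLiM decides which cases are challenging, so this is not GLiM's implementation.

```python
from itertools import permutations
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]


def candidate_pairs(entities: List[str]) -> List[Pair]:
    """All ordered (head, tail) pairs with head != tail: n * (n - 1) pairs."""
    return list(permutations(entities, 2))


def cascade(
    pairs: List[Pair],
    first_stage: Callable[[Pair], float],
    llm_judge: Callable[[Pair], bool],
    low: float = 0.3,
    high: float = 0.7,
) -> Tuple[Dict[Pair, bool], List[Pair]]:
    """Two-stage routing with hypothetical thresholds.

    first_stage: stand-in for a cheap pair scorer returning P(relation).
    llm_judge:   stand-in for an LLM call, used only on uncertain pairs.
    Returns the per-pair decisions and the pairs deferred to the LLM.
    """
    decisions: Dict[Pair, bool] = {}
    deferred: List[Pair] = []
    for pair in pairs:
        score = first_stage(pair)
        if score >= high:          # confident positive
            decisions[pair] = True
        elif score <= low:         # confident negative
            decisions[pair] = False
        else:                      # uncertain ("challenging") case: ask the LLM
            deferred.append(pair)
            decisions[pair] = llm_judge(pair)
    return decisions, deferred


if __name__ == "__main__":
    # Quadratic growth: doubling the entity count roughly quadruples the pairs.
    for n in (5, 10, 20, 40):
        print(n, len(candidate_pairs([f"e{i}" for i in range(n)])))
    # prints "5 20", "10 90", "20 380", "40 1560" (one per line)

    # Toy scores; unlisted pairs default to 0.0 (confident negative).
    toy_scores = {("e0", "e1"): 0.9, ("e1", "e2"): 0.5}
    decisions, deferred = cascade(
        candidate_pairs(["e0", "e1", "e2"]),
        first_stage=lambda p: toy_scores.get(p, 0.0),
        llm_judge=lambda p: True,  # stub: a real system would prompt an LLM here
    )
    print(deferred)  # [('e1', 'e2')]
```

The design point of such a cascade is cost: the LLM is called only for pairs inside the uncertainty band, so its usage scales with the number of hard pairs rather than with all n(n-1) candidates.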

2023

Large Language Models are Complex Table Parsers
Bowen Zhao | Changkai Ji | Yuejie Zhang | Wen He | Yingwen Wang | Qing Wang | Rui Feng | Xiaobo Zhang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

With the Generative Pre-trained Transformer 3.5 (GPT-3.5) exhibiting remarkable reasoning and comprehension abilities in Natural Language Processing (NLP), most Question Answering (QA) research has centered on general QA tasks based on GPT, neglecting the specific challenges posed by Complex Table QA. In this paper, we propose using GPT-3.5 to address these challenges by reconstructing complex tables into tuples and employing specific prompt designs for dialogue. Specifically, we encode each cell’s hierarchical structure, position information, and content as a tuple. By enhancing the prompt template with an explanatory description of the meaning of each tuple and of the logical reasoning process of the task, we improve GPT-3.5’s awareness of hierarchical structure, enabling it to better parse complex tables. Extensive experiments on two Complex Table QA datasets, the open-domain dataset HiTAB and the aviation-domain dataset AIT-QA, show that our approach significantly outperforms previous work on both datasets, achieving state-of-the-art (SOTA) performance.
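The idea of encoding each cell's hierarchical structure, position, and content as a tuple can be sketched roughly as follows. This is a minimal illustration under assumptions: the field order (row headers, column headers, row index, column index, content), the helper names `encode_table` and `build_prompt`, the toy table with made-up numbers, and the prompt wording are all hypothetical and not taken from the paper, whose exact tuple format and template may differ. It only shows how flattening every cell together with its full header path exposes, in plain text, the hierarchy that a 2-D layout otherwise hides.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class CellTuple:
    row_path: Tuple[str, ...]  # row headers, outermost level first
    col_path: Tuple[str, ...]  # column headers, outermost level first
    row: int                   # 0-based data-row position
    col: int                   # 0-based data-column position
    content: str               # cell text


def encode_table(
    row_paths: Sequence[Sequence[str]],
    col_paths: Sequence[Sequence[str]],
    body: Sequence[Sequence[str]],
) -> List[CellTuple]:
    """Flatten a table with hierarchical headers into one tuple per data cell."""
    return [
        CellTuple(tuple(rp), tuple(cp), r, c, body[r][c])
        for r, rp in enumerate(row_paths)
        for c, cp in enumerate(col_paths)
    ]


def build_prompt(cells: List[CellTuple], question: str) -> str:
    """Prepend an explanation of the tuple format, then list tuples and the question."""
    guide = (
        "Each tuple is (row headers, column headers, row index, column index, value); "
        "headers run from the outermost level to the innermost, separated by '>'.\n"
        "First find the cells whose headers match the question, "
        "then compute the answer step by step.\n"
    )
    rows = [
        f"({' > '.join(c.row_path)}, {' > '.join(c.col_path)}, {c.row}, {c.col}, {c.content})"
        for c in cells
    ]
    return guide + "\n".join(rows) + f"\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    # Toy two-level table (made-up numbers, for illustration only).
    cells = encode_table(
        row_paths=[["2022", "Q1"], ["2022", "Q2"]],
        col_paths=[["Sales", "Domestic"], ["Sales", "Export"]],
        body=[["10", "4"], ["12", "5"]],
    )
    print(build_prompt(cells, "What were export sales in 2022 Q2?"))
```

The explanatory `guide` text plays the role the abstract assigns to the enhanced prompt template: it tells the model what each tuple field means and outlines a reasoning procedure before the data is shown.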