2025
RoBGuard: Enhancing LLMs to Assess Risk of Bias in Clinical Trial Documents
Changkai Ji | Bowen Zhao | Zhuoyao Wang | Yingwen Wang | Yuejie Zhang | Ying Cheng | Rui Feng | Xiaobo Zhang
Proceedings of the 31st International Conference on Computational Linguistics
Randomized Controlled Trials (RCTs) are rigorous clinical studies crucial for reliable decision-making, but their credibility can be compromised by bias. The Cochrane Risk of Bias tool (RoB 2) assesses this risk, yet manual assessments are time-consuming and labor-intensive. Previous approaches have employed Large Language Models (LLMs) to automate this process. However, they typically focus on manually crafted prompts and a restricted set of simple questions, limiting their accuracy and generalizability. Inspired by the human bias assessment process, we propose RoBGuard, a novel framework for enhancing LLMs to assess the risk of bias in RCTs. Specifically, RoBGuard integrates medical knowledge-enhanced question reformulation, multimodal document parsing, and multi-expert collaboration to ensure both completeness and accuracy. Additionally, to address the lack of suitable datasets, we introduce two new datasets: RoB-Item and RoB-Domain. Experimental results demonstrate RoBGuard’s effectiveness on the RoB-Item dataset, outperforming existing methods.
EmoCharacter: Evaluating the Emotional Fidelity of Role-Playing Agents in Dialogues
Qiming Feng | Qiujie Xie | Xiaolong Wang | Qingqiu Li | Yuejie Zhang | Rui Feng | Tao Zhang | Shang Gao
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Role-playing agents (RPAs) powered by large language models (LLMs) have been widely utilized in dialogue systems for their capability to deliver personalized interactions. Current evaluations of RPAs mainly focus on personality fidelity, tone imitation, and knowledge consistency, while overlooking emotional fidelity, a key factor that affects user experience. To this end, we propose a benchmark called EmoCharacter to assess emotional fidelity of RPAs in dialogues. EmoCharacter includes two benchmark datasets (single-turn and multi-turn dialogues), three evaluation settings, and six metrics to measure the emotional fidelity between RPAs and the characters they portray. Based on EmoCharacter, we conduct extensive evaluations on RPAs powered by seven widely used LLMs with representative role-playing methods. Our empirical findings reveal that: (1) Contrary to intuition, current role-playing methods often reduce the emotional fidelity of LLMs in dialogues; (2) Enhancing the general capabilities of LLMs does not necessarily improve the emotional fidelity of RPAs; (3) Fine-tuning or In-Context Learning based on real dialogue data can enhance emotional fidelity.
2023
Large Language Models are Complex Table Parsers
Bowen Zhao | Changkai Ji | Yuejie Zhang | Wen He | Yingwen Wang | Qing Wang | Rui Feng | Xiaobo Zhang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
With the Generative Pre-trained Transformer 3.5 (GPT-3.5) exhibiting remarkable reasoning and comprehension abilities in Natural Language Processing (NLP), most Question Answering (QA) research has primarily centered around general QA tasks based on GPT, neglecting the specific challenges posed by Complex Table QA. In this paper, we propose to incorporate GPT-3.5 to address such challenges, in which complex tables are reconstructed into tuples and specific prompt designs are employed for dialogues. Specifically, we encode each cell’s hierarchical structure, position information, and content as a tuple. By enhancing the prompt template with an explanatory description of the meaning of each tuple and the logical reasoning process of the task, we effectively improve the hierarchical structure awareness capability of GPT-3.5 to better parse the complex tables. Extensive experiments and results on Complex Table QA datasets, i.e., the open-domain dataset HiTAB and the aviation domain dataset AIT-QA show that our approach significantly outperforms previous work on both datasets, leading to state-of-the-art (SOTA) performance.
Sparse Frame Grouping Network with Action Centered for Untrimmed Video Paragraph Captioning
Guorui Yu | Yimin Hu | Yuejie Zhang | Rui Feng | Tao Zhang | Shang Gao
Findings of the Association for Computational Linguistics: EMNLP 2023
Generating paragraph captions for untrimmed videos without event annotations is challenging, especially when aiming to enhance precision and minimize repetition at the same time. To address this challenge, we propose a module called Sparse Frame Grouping (SFG). It dynamically groups event information with the help of action information for the entire video and excludes redundant frames within pre-defined clips. To enhance the performance, an Intra Contrastive Learning technique is designed to align the SFG module with the core event content in the paragraph, and an Inter Contrastive Learning technique is employed to learn action-guided context with reduced static noise simultaneously. Extensive experiments are conducted on two benchmark datasets (ActivityNet Captions and YouCook2). Results demonstrate that SFG outperforms the state-of-the-art methods on all metrics.
2020
Improving Sentence Classification by Multilingual Data Augmentation and Consensus Learning
Yanfei Wang | Yangdong Chen | Yuejie Zhang
Proceedings of the 19th Chinese National Conference on Computational Linguistics
Neural network-based models have achieved impressive results on the sentence classification task. However, most previous work focuses on designing more sophisticated networks or more effective learning paradigms on monolingual data, which often suffers from insufficient discriminative knowledge for classification. In this paper, we investigate improving sentence classification through multilingual data augmentation and consensus learning. Compared to previous methods, our model can make use of multilingual data generated by machine translation and mine their language-shared and language-specific knowledge for better representation and classification. We evaluate our model using English (i.e., source language) and Chinese (i.e., target language) data on several sentence classification tasks. Our proposed model achieves strong classification performance on these tasks.
2010
Fusion of Multiple Features and Ranking SVM for Web-based English-Chinese OOV Term Translation
Yuejie Zhang | Yang Wang | Lei Cen | Yanxia Su | Cheng Jin | Xiangyang Xue | Jianping Fan
Coling 2010: Posters
2009
English-Chinese Bi-Directional OOV Translation based on Web Mining and Supervised Learning
Yuejie Zhang | Yang Wang | Xiangyang Xue
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers
2008
CRF-based Hybrid Model for Word Segmentation, NER and even POS Tagging
Zhiting Xu | Xian Qian | Yuejie Zhang | Yaqian Zhou
Proceedings of the Sixth SIGHAN Workshop on Chinese Language Processing