Weijing Huang

2025

pdf bib abs
LEAF: Learning and Evaluation Augmented by Fact-Checking to Improve Factualness in Large Language Models
Hieu Tran | Junda Wang | Yujan Ting | Hong Yu | Weijing Huang | Terrence Chen
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track

Large language models (LLMs) often struggle with factual accuracy in knowledge-intensive domains like healthcare. We introduce LEAF (Learning and Evaluation Augmented by Fact-Checking), a framework for improving LLM factuality in medical question answering. LEAF comprises three components: (1) RAFE, a robust fact-checking system using open-source LLMs and domain-specific retrieval to evaluate response accuracy; (2) Fact-Check-then-RAG, which leverages fact-checking results to guide retrieval without parameter updates; and (3) Learning from Fact Check, enabling self-training through supervised fine-tuning or preference-based learning using fact-checking as pseudo-labels. Experimental results show that RAFE outperforms Factcheck-GPT in detecting inaccuracies, Fact-Check-then-RAG effectively corrects errors, and Learning from Fact Check improves performance without labeled data. In a real-world healthcare deployment with proprietary medical documents, LEAF achieved an 83% improvement in factuality scores, demonstrating practical applicability for adapting general-purpose LLMs to organization-specific knowledge. Our framework provides a scalable solution for industrial applications requiring high factual accuracy.

pdf bib abs
Training Medical QA Models Based on Mixed Rewards from Multiple-Choice and Open-Ended Questions
Yue Qiu | Yujan Ting | Pei Dong | Terrence Chen | Weijing Huang
Findings of the Association for Computational Linguistics: EMNLP 2025

Reinforcement learning (RL) for large language models (LLMs) typically requires clear reward signals, which are often unavailable for open-ended (OE) questions where answer evaluation is ambiguous without scalable expert labeling. We investigate whether LLMs benefit from training on mixed data with varying reward clarity. Our approach combines Multiple-choice questions (MCQs), which offer clear binary rewards, with OE questions, for which we use simpler, potentially noisy rewards such as Jaccard similarity or LLM-based evaluators. We hypothesize that MCQs can stabilize training when mixed with OE questions. Our experiments show this mixed-data approach consistently improves medical question-answering performance across model scales.

2018

pdf bib abs
PhraseCTM: Correlated Topic Modeling on Phrases within Markov Random Fields
Weijing Huang
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Recent emerged phrase-level topic models are able to provide topics of phrases, which are easy to read for humans. But these models are lack of the ability to capture the correlation structure among the discovered numerous topics. We propose a novel topic model PhraseCTM and a two-stage method to find out the correlated topics at phrase level. In the first stage, we train PhraseCTM, which models the generation of words and phrases simultaneously by linking the phrases and component words within Markov Random Fields when they are semantically coherent. In the second stage, we generate the correlation of topics from PhraseCTM. We evaluate our method by a quantitative experiment and a human study, showing the correlated topic modeling on phrases is a good and practical way to interpret the underlying themes of a corpus.

Co-authors

Junda Wang 1

Hong Yu 1

Venues

Fix data

Weijing Huang

Fixing paper assignments

2025

2018

Co-authors

Venues