Yung-Hui Li

2026

Truth or Dare: Analyzing LLM Susceptibility to External Evidence of Varying Factuality
Han-Yu Su | Kuan-Yu Chu | Yung-Hui Li | Lun-Wei Ku
Proceedings of the 6th Workshop on Trustworthy NLP (TrustNLP 2026)

Modern Large Language Models (LLMs) often rely on Retrieval-Augmented Generation (RAG) to access up-to-date information; however, retrieved corpora may contain misleading, outdated, or incorrect content, raising concerns about how such evidence affects model reliability. In this work, we investigate the susceptibility of LLMs to false external evidence. Existing studies have shown that poisoned external corpora can mislead LLM responses, yet the effects of different evidence properties remain understudied. To bridge this gap, we design comprehensive experiments along three dimensions: the style of evidence, the quantity of evidence, and the semantic similarity between external messages and the model’s internal belief. We find that instructive-style evidence causes the most severe performance degradation. We also observe a steady decline in model response quality as the amount of false evidence accumulates. Finally, we show that LLMs are more susceptible to factually incorrect evidence when that evidence is semantically close to the model’s parametric knowledge.

Expert Calibration Lens for Pruning Mixture of Experts
Luis Frentzen Salim | Chia-Chun Wu | Tran Van Nhiem | Lun-Wei Ku | Yung-Hui Li
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

Expert pruning is a practical deployment technique for Mixture-of-Experts (MoE) models. It reduces resource usage and mitigates expert redundancy, but its success depends strongly on the calibration set used for pruning. In domain-general settings, it is unclear which properties of the calibration data drive good pruning outcomes, and the effects of calibration perturbations are often unintuitive. We observe, for example, that calibration sets in different languages can lead to very similar pruning results despite appearing dissimilar on the surface. To address this, we propose Expert Calibration Lens, a lightweight analysis tool that compares expert activation patterns across datasets to predict the impact of calibration perturbations without repeatedly running expensive pruning procedures. We use activations that are quick to compute and evaluate the resulting analysis for downstream task performance.

2023

This work introduces a novel task, location-aware visual question generation (LocaVQG), which aims to generate engaging questions from data relevant to a particular geographical location. Specifically, we represent such location-aware information with surrounding images and a GPS coordinate. To tackle this task, we present a dataset generation pipeline that leverages GPT-4 to produce diverse and sophisticated questions. We then aim to learn a lightweight model that can address the LocaVQG task and fit on an edge device, such as a mobile phone. To this end, we propose a method that can reliably generate engaging questions from location-aware information. Our proposed method outperforms baselines on both human evaluation (e.g., engagement, grounding, coherence) and automatic evaluation metrics (e.g., BERTScore, ROUGE-2). Moreover, we conduct extensive ablation studies to justify our proposed techniques for both generating the dataset and solving the task.

Co-authors

I-Bin Liao 1

Luis Frentzen Salim 1
