Yun-Da Tsai
2026
Beyond Facts: Benchmarking Distributional Reading Comprehension in Large Language Models
Pei-Fu Guo | Ya An Tsai | Chun-Chia Hsu | Kai-Xin Chen | Yun-Da Tsai | Kai-Wei Chang | Nanyun Peng | Mi-Yen Yeh | Shou-De Lin
Findings of the Association for Computational Linguistics: ACL 2026
While most reading comprehension benchmarks for LLMs focus on factual information that can be answered by localizing specific textual evidence, many real-world tasks require understanding distributional information, such as population-level trends and preferences expressed across collections of text. We introduce Text2DistBench, a reading comprehension benchmark for evaluating LLMs’ ability to infer distributional knowledge from natural language. Built from real-world YouTube comments about movie and music entities, the benchmark provides models with entity metadata and associated comments, and requires them to answer distributional questions, such as estimating the proportions of positive and negative comments, or identifying the most and second most frequent topics discussed among viewers. To support reliable and long-term evaluation, the construction pipeline of Text2DistBench is fully automated and continuously updated to incorporate newly emerging entities over time. Experiments across multiple LLMs show that while models substantially outperform random baselines, performance varies widely across different distribution types and characteristics. These findings highlight both the capabilities and limitations of current LLMs in distributional reading comprehension and demonstrate the value of Text2DistBench as a practical and scalable testbed for future research.
LiveCLKTBench: Towards Reliable Evaluation of Cross-Lingual Knowledge Transfer in Multilingual LLMs
Pei-Fu Guo | Yun-Da Tsai | Chun-Chia Hsu | Kai-Xin Chen | Ya An Tsai | Kai-Wei Chang | Nanyun Peng | Mi-Yen Yeh | Shou-De Lin
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Evaluating cross-lingual knowledge transfer in large language models (LLMs) is challenging, as correct answers in a target language may arise either from genuine transfer or from prior exposure during pre-training. We present LiveCLKTBench, an automated generation pipeline specifically designed to isolate and measure cross-lingual knowledge transfer. Our pipeline identifies self-contained, time-sensitive knowledge entities from real-world domains, filters them based on temporal occurrence, and verifies them against the model’s knowledge. The documents of these valid entities are then used to generate factual questions, which are translated into multiple languages to evaluate transferability across linguistic boundaries. Using LiveCLKTBench, we evaluate several LLMs across five languages and observe that cross-lingual transfer is strongly influenced by linguistic distance and often asymmetric across language directions. While larger models improve transfer, the gains diminish with scale and vary across domains. These findings provide new insights into multilingual transfer and demonstrate the value of LiveCLKTBench as a reliable benchmark for future research.
2025
Benchmarking Uncertainty Metrics for LLM Target-Aware Search
Pei-Fu Guo | Yun-Da Tsai | Shou-De Lin
Findings of the Association for Computational Linguistics: EMNLP 2025
LLM search methods, such as Chain of Thought (CoT) and Tree of Thought (ToT), enhance LLM reasoning by exploring multiple reasoning paths. When combined with search algorithms like MCTS and Bandit methods, their effectiveness relies heavily on uncertainty estimation to prioritize paths that align with specific search objectives. However, it remains unclear whether existing LLM uncertainty metrics adequately capture the diverse types of uncertainty required to guide different search objectives. In this work, we introduce a framework for uncertainty benchmarking, identifying four distinct uncertainty types: Answer, Correctness, Aleatoric, and Epistemic Uncertainty. Each type serves different optimization goals in search. Our experiments demonstrate that current metrics often align with only a subset of these uncertainty types, limiting their effectiveness for objective-aligned search in some cases. These findings highlight the need for additional target-aware uncertainty estimators that can adapt to various optimization goals in LLM search.
Text-centric Alignment for Bridging Test-time Unseen Modality
Yun-Da Tsai | Ting-Yu Yen | Pei-Fu Guo | Zhe-Yan Li | Shou-De Lin
Findings of the Association for Computational Linguistics: EMNLP 2025
This paper addresses the challenge of handling unseen modalities and dynamic modality combinations at test time with our proposed text-centric alignment method. This training-free alignment approach unifies different input modalities into a single semantic text representation by leveraging in-context learning with Large Language Models and uni-modal foundation models. Our method significantly enhances the ability to manage unseen, diverse, and unpredictable modality combinations, making it suitable for both generative and discriminative models to adopt on top. Our extensive experiments primarily evaluate on discriminative tasks, demonstrating that our approach is essential for LLMs to achieve strong modality alignment performance. It also surpasses the limitations of traditional fixed-modality frameworks in embedding representations. This study contributes to the field by offering a flexible and effective solution for real-world applications where modality availability is dynamic and uncertain.