Jingyuan Li
2026
FactVerse: A Benchmark for Factual Consistency in Interleaved Image–Text Generation
Yubo Shan | Kun Zhang | Qiming Xu | Liping Cao | Yingying Cao | Jian Zhang | Yu Wang | Jingyuan Li | Yuanzhuo Wang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Interleaved multimodal understanding and generation, where models can interactively comprehend and produce images and text in arbitrary orders, has emerged as a key research direction in generative Multimodal Large Language Models (MLLMs). Such interleaved image–text content plays an increasingly important role in information dissemination. However, the compounded persuasive power of multimodal narratives also raises the risk of factual misinformation. Despite this, existing benchmarks lack effective mechanisms to evaluate factual consistency in interleaved image–text content. To bridge this gap, we introduce FactVerse, a benchmark dedicated to evaluating factual consistency in interleaved image–text generation. FactVerse comprises 3,000 human-verified instances across four categories and 50 domains, supporting both English and Chinese. We also establish a multi-dimensional evaluation framework designed to rigorously assess factual consistency. Experiments demonstrate that our framework achieves high alignment with human judgments, significantly outperforming existing evaluation methods. Furthermore, our analysis reveals systematic deficiencies in current models, offering critical insights for future design.
2018
A Syntactically Constrained Bidirectional-Asynchronous Approach for Emotional Conversation Generation
Jingyuan Li | Xiao Sun
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Traditional neural language models tend to generate generic replies with poor logic and no emotion. In this paper, a syntactically constrained bidirectional-asynchronous approach for emotional conversation generation (E-SCBA) is proposed to address this issue. In our model, pre-generated emotion keywords and topic keywords are asynchronously introduced into the decoding process. This differs markedly from most existing methods, which generate replies from the first word to the last. Experimental results indicate that our approach not only improves the diversity of replies but also improves both logic and emotion compared with baselines.