Jiashuo Wang
Also published as: Jessie Wang
2026
Foresight Optimization for Strategic Reasoning in Large Language Models
Jessie Wang | Jiawen Duan | Jian Wang | Kaitao Song | Chunpu Xu | Johnny K. W. Ho | YU Fenggang | Johan F. Hoorn | Wenjie Li
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Reasoning capabilities in large language models (LLMs) have advanced significantly. However, existing reasoning-based LLMs still struggle to make effective decisions in multi-agent environments, due to the absence of explicit foresight modeling. To this end, strategic reasoning, the fundamental capability to anticipate the counterpart’s behaviors and foresee its possible future actions, has been introduced to alleviate this issue. Strategic reasoning is fundamental to effective decision-making in multi-agent environments, yet existing reasoning enhancement methods for LLMs do not explicitly capture its foresight nature. In this work, we introduce **Fo**resight **P**olicy **O**ptimization (**FoPO**) to enhance strategic reasoning in LLMs, which integrates opponent modeling principles into policy optimization, thereby enabling explicit consideration of both self-interest and counterpart influence. Specifically, we construct two curated datasets, namely ***Cooperative RSA*** and ***Competitive Taboo***, equipped with well-designed rules and moderate difficulty to facilitate a systematic investigation of FoPO in a self-play framework. Our experiments demonstrate that FoPO significantly enhances strategic reasoning across LLMs of varying sizes and origins. Moreover, models trained with FoPO exhibit strong generalization to out-of-domain strategic scenarios, substantially outperforming standard LLM reasoning optimization baselines.
2025
Towards Dynamic Theory of Mind: Evaluating LLM Adaptation to Temporal Evolution of Human States
Yang Xiao | Jiashuo Wang | Qiancheng Xu | Changhe Song | Chunpu Xu | Yi Cheng | Wenjie Li | Pengfei Liu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
As Large Language Models (LLMs) increasingly participate in human-AI interactions, evaluating their Theory of Mind (ToM) capabilities, particularly their ability to track dynamic mental states, becomes crucial. While existing benchmarks assess basic ToM abilities, they predominantly focus on static snapshots of mental states, overlooking the temporal evolution that characterizes real-world social interactions. We present **DynToM**, a novel benchmark specifically designed to evaluate LLMs’ ability to understand and track the temporal progression of mental states across interconnected scenarios. Through a systematic four-step framework, we generate 1,100 social contexts encompassing 5,500 scenarios and 78,100 questions, each validated for realism and quality. Our comprehensive evaluation of ten state-of-the-art LLMs reveals that their average performance trails that of humans by 44.7%, with performance degrading significantly when tracking and reasoning about shifts in mental states. This performance gap highlights fundamental limitations in current LLMs’ ability to model the dynamic nature of human mental states.
MIO: A Foundation Model on Multimodal Tokens
Zekun Moore Wang | King Zhu | Chunpu Xu | Wangchunshu Zhou | Jiaheng Liu | Yibo Zhang | Jessie Wang | Ning Shi | Siyu Li | Yizhi Li | Haoran Que | Zhaoxiang Zhang | Yuanxing Zhang | Ge Zhang | Ke Xu | Jie Fu | Wenhao Huang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
In this paper, we introduce MIO, a novel foundation model built on multimodal tokens, capable of understanding and generating speech, text, images, and videos in an end-to-end, autoregressive manner. While the emergence of large language models (LLMs) and multimodal large language models (MM-LLMs) propels advancements in artificial general intelligence through their versatile capabilities, they still lack true any-to-any understanding and generation. Recently, the release of GPT-4o has showcased the remarkable potential of any-to-any LLMs for complex real-world tasks, enabling omnidirectional input and output across images, speech, and text. However, it is closed-source and does not support the generation of multimodal interleaved sequences. To address this gap, we present MIO, which is trained on a mixture of discrete tokens across four modalities using causal multimodal modeling. MIO undergoes a four-stage training process: (1) alignment pre-training, (2) interleaved pre-training, (3) speech-enhanced pre-training, and (4) comprehensive supervised fine-tuning on diverse textual, visual, and speech tasks. Our experimental results indicate that MIO exhibits competitive, and in some cases superior, performance compared to previous dual-modal baselines, any-to-any model baselines, and even modality-specific baselines. Moreover, MIO demonstrates advanced capabilities inherent to its any-to-any feature, such as interleaved video-text generation, chain-of-visual-thought reasoning, visual guideline generation, instructional image editing, etc.
2024
Muffin: Mitigating Unhelpfulness in Emotional Support Conversations with Multifaceted AI Feedback
Jiashuo Wang | Chunpu Xu | Chak Tou Leong | Wenjie Li | Jing Li
Findings of the Association for Computational Linguistics: ACL 2024
Instruct Once, Chat Consistently in Multiple Rounds: An Efficient Tuning Framework for Dialogue
Jian Wang | Chak Tou Leong | Jiashuo Wang | Dongding Lin | Wenjie Li | Xiaoyong Wei
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Tuning language models for dialogue generation has been a prevalent paradigm for building capable dialogue agents. Yet, traditional tuning narrowly views dialogue generation as resembling other language generation tasks, ignoring the role disparities between the two speakers and the multi-round interactive process that dialogue ought to be. This approach often leads to unsatisfactory chat consistency in the resulting agent. In this work, we emphasize the interactive, communicative nature of dialogue and argue that it is more feasible to model the speaker roles of agent and user separately, enabling the agent to adhere to its role consistently. With this in mind, we propose an efficient Multi-round Interactive Dialogue Tuning (Midi-Tuning) framework. It models the agent and user individually with two adapters built upon large language models. The adapters use their respective utterances round by round in alternating order, and they are tuned via a round-level memory caching mechanism. Extensive experiments demonstrate that our framework outperforms traditional fine-tuning and holds considerable potential for improving dialogue consistency.
2023
Self-Detoxifying Language Models via Toxification Reversal
Chak Tou Leong | Yi Cheng | Jiashuo Wang | Jian Wang | Wenjie Li
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Language model detoxification aims to minimize the risk of generating offensive or harmful content in pretrained language models (PLMs) for safer deployment. Existing methods can be roughly categorized as finetuning-based and decoding-based. However, the former is often resource-intensive, while the latter relies on additional components and potentially compromises generation fluency. In this paper, we propose a more lightweight approach that enables the PLM itself to achieve “self-detoxification”. Our method is built upon the observation that prepending a negative steering prompt can effectively induce PLMs to generate toxic content. At the same time, we are inspired by recent research in the interpretability field, which formulates the evolving contextualized representations within the PLM as an information stream facilitated by the attention layers. Drawing on this idea, we devise a method to identify the toxification direction from the normal generation process to the one prompted with the negative prefix, and then steer the generation in the reversed direction by manipulating the information movement within the attention layers. Experimental results show that our approach, without any fine-tuning or extra components, achieves performance comparable to state-of-the-art methods.
2022
CARE: Causality Reasoning for Empathetic Responses by Conditional Graph Generation
Jiashuo Wang | Yi Cheng | Wenjie Li
Findings of the Association for Computational Linguistics: EMNLP 2022
Recent approaches to empathetic response generation incorporate emotion causalities to enhance comprehension of both the user’s feelings and experiences. However, these approaches suffer from two critical issues. First, they only consider causalities between the user’s emotion and the user’s experiences, and ignore those between the user’s experiences. Second, they neglect interdependence among causalities and reason about them independently. To solve the above problems, we aim to reason about all plausible causalities interdependently and simultaneously, given the user’s emotion, dialogue history, and future dialogue content. Then, we infuse these causalities into response generation for empathetic responses. Specifically, we design a new model, i.e., the Conditional Variational Graph Auto-Encoder (CVGAE), for the causality reasoning, and adopt a multi-source attention mechanism in the decoder for the causality infusion. We name the whole framework CARE, short for CAusality Reasoning for Empathetic conversation. Experimental results indicate that our method achieves state-of-the-art performance.
Improving Multi-turn Emotional Support Dialogue Generation with Lookahead Strategy Planning
Yi Cheng | Wenge Liu | Wenjie Li | Jiashuo Wang | Ruihui Zhao | Bang Liu | Xiaodan Liang | Yefeng Zheng
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Providing Emotional Support (ES) to soothe people in emotional distress is an essential capability in social interactions. Most existing research on building ES conversation systems considers only single-turn interactions with users, which is over-simplified. In comparison, multi-turn ES conversation systems can provide ES more effectively, but face several new technical challenges, including: (1) how to adopt appropriate support strategies to achieve the long-term dialogue goal of comforting the user’s emotion; (2) how to dynamically model the user’s state. In this paper, we propose a novel system, MultiESC, to address these issues. For strategy planning, drawing inspiration from the A* search algorithm, we propose lookahead heuristics to estimate the future user feedback after using particular strategies, which helps to select strategies that can lead to the best long-term effects. For user state modeling, MultiESC focuses on capturing users’ subtle emotional expressions and understanding their emotion causes. Extensive experiments show that MultiESC significantly outperforms competitive baselines in both dialogue generation and strategy planning.
Co-authors
- Wenjie Li 7
- Yi Cheng 4
- Chunpu Xu 4
- Chak Tou Leong 3
- Jian Wang 3
- Jiawen Duan 1
- YU Fenggang 1
- Jie Fu 1
- Johnny K. W. Ho 1
- Johan F. Hoorn 1
- Wenhao Huang 1
- Jing Li 1
- Siyu Li 1
- Yizhi Li 1
- Xiaodan Liang 1
- Dongding Lin 1
- Pengfei Liu 1
- Wenge Liu 1
- Bang Liu 1
- Jiaheng Liu 1
- Haoran Que 1
- Ning Shi 1
- Kaitao Song 1
- Changhe Song 1
- Zekun Moore Wang 1
- Xiaoyong Wei 1
- Yang Xiao 1
- Qiancheng Xu 1
- Ke Xu 1
- Yibo Zhang 1
- Zhaoxiang Zhang 1
- Yuanxing Zhang 1
- Ge Zhang 1
- Ruihui Zhao 1
- Yefeng Zheng 1
- Wangchunshu Zhou 1
- King Zhu 1