Min Wang
2026
NaviMaster: Learning a Unified Policy for GUI and Embodied Navigation Tasks
Zhihao Luo | Wentao Yan | Jingyu Gong | Min Wang | Zhizhong Zhang | Xuhong Wang | Yuan Xie | Xin Tan
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Graphical User Interface (GUI) navigation and embodied navigation have both advanced rapidly, yet these domains have largely evolved in isolation, with disparate datasets and training paradigms. In this paper, we observe that both tasks can be formulated as Markov Decision Processes (MDPs), suggesting a foundational principle for their unification. Hence, we present NaviMaster, the first agent to unify GUI navigation and embodied navigation within a single framework. Specifically, NaviMaster (i) proposes a visual-target trajectory collection pipeline that generates trajectories for both GUI and embodied tasks using a single formulation; (ii) employs a unified reinforcement learning framework on the mixed data to improve generalization; and (iii) designs a novel distance-aware reward to ensure efficient learning from the trajectories. Through extensive experiments on out-of-domain benchmarks, NaviMaster is shown to outperform state-of-the-art agents in GUI navigation, spatial affordance prediction, and embodied navigation. Ablation studies further demonstrate the efficacy of our unified training strategy, data mixing strategy, and reward design. Resources will be released to the community.
2025
LastingBench: Defend Benchmarks Against Knowledge Leakage
Yixiong Fang | Tianran Sun | Yuling Shi | Min Wang | Xiaodong Gu
Findings of the Association for Computational Linguistics: EMNLP 2025
The increasing size and complexity of large language models (LLMs) raise concerns about their ability to “cheat” on standard Question Answering (QA) benchmarks by memorizing task-specific data. This undermines the validity of benchmark evaluations, as they no longer reflect genuine model capabilities but instead the effects of data leakage. While existing methods detect such leakage, they fail to address the long-term challenge of mitigating it. In this paper, we introduce LastingBench, a novel approach to reinforce and safeguard existing benchmarks against knowledge leakage. Our method involves identifying leakage points through perturbation-based detection, followed by counterfactual rewriting to disrupt memorization while preserving the benchmark’s original evaluative intent. We demonstrate that our approach significantly reduces memorization effects in long-context QA benchmarks, providing a more accurate assessment of model reasoning and generalization abilities. Our experiments show that LastingBench not only uncovers substantial leakage in benchmarks like HotpotQA but also yields a more reliable evaluation of state-of-the-art models, ensuring that benchmarks remain effective and resilient over time.
2021
Unimodal and Crossmodal Refinement Network for Multimodal Sequence Fusion
Xiaobao Guo | Adams Kong | Huan Zhou | Xianfeng Wang | Min Wang
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Effective unimodal representation and complementary crossmodal representation fusion are both important in multimodal representation learning. Prior works often modulate one modal feature to another straightforwardly and thus underutilize both unimodal and crossmodal representation refinements, which creates a performance bottleneck. In this paper, the Unimodal and Crossmodal Refinement Network (UCRN) is proposed to enhance both unimodal and crossmodal representations. Specifically, to improve unimodal representations, a unimodal refinement module is designed to refine modality-specific learning by iteratively updating the distribution with transformer-based attention layers. Self-quality improvement layers follow to progressively generate the desired weighted representations. Subsequently, those unimodal representations are projected into a common latent space, regularized by a multimodal Jensen-Shannon divergence loss for better crossmodal refinement. Lastly, a crossmodal refinement module is employed to integrate all information. Through hierarchical exploration of unimodal, bimodal, and trimodal interactions, UCRN is highly robust to missing modalities and noisy data. Experimental results on the MOSI and MOSEI datasets show that the proposed UCRN outperforms recent state-of-the-art techniques and that its robustness is well suited to real multimodal sequence fusion scenarios. Code will be shared publicly.
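The abstract above does not spell out the form of its multimodal Jensen-Shannon divergence loss. As a rough illustration only, the sketch below computes the standard generalized Jensen-Shannon divergence with uniform weights over several discrete distributions: the entropy of their mixture minus the mean of their entropies. The function names `entropy` and `generalized_jsd` are hypothetical, and treating each modality as a discrete probability vector is an assumption, not the paper's formulation.

```python
import math
from typing import Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def generalized_jsd(dists: Sequence[Sequence[float]]) -> float:
    """Generalized Jensen-Shannon divergence with uniform weights.

    JSD(P_1, ..., P_n) = H(mean_i P_i) - mean_i H(P_i)

    Assumption: every distribution has the same length and sums to 1.
    This is the textbook definition, not necessarily the loss used in UCRN.
    """
    n = len(dists)
    dim = len(dists[0])
    # Uniform mixture of all input distributions.
    mixture = [sum(d[i] for d in dists) / n for i in range(dim)]
    return entropy(mixture) - sum(entropy(d) for d in dists) / n


# Three hypothetical "modalities" projected onto the same 3-way distribution.
text = [0.7, 0.2, 0.1]
audio = [0.6, 0.3, 0.1]
video = [0.1, 0.2, 0.7]
loss = generalized_jsd([text, audio, video])
```

The value is zero when all distributions coincide and reaches log(n) when the n distributions have disjoint support, so minimizing it pulls the modality distributions toward agreement.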
2018
Yuan at SemEval-2018 Task 1: Tweets Emotion Intensity Prediction using Ensemble Recurrent Neural Network
Min Wang | Xiaobing Zhou
Proceedings of the 12th International Workshop on Semantic Evaluation
We apply LSTM and BiLSTM models to emotion intensity prediction. We participate only in the third subtask of Task 1: Affect in Tweets. Our system ranks 6th among all teams.
2017
YNUDLG at IJCNLP-2017 Task 5: A CNN-LSTM Model with Attention for Multi-choice Question Answering in Examinations
Min Wang | Qingxun Liu | Peng Ding | Yongbin Li | Xiaobing Zhou
Proceedings of the IJCNLP 2017, Shared Tasks
In this paper, we first use convolutional neural networks (CNN) to learn joint representations of question-answer pairs, then feed these joint representations into a long short-term memory (LSTM) network with attention to learn the answer sequence of a question and label the matching quality of each answer. We also incorporate external knowledge by training Word2Vec on Flashcards data, which yields more compact embeddings. Experimental results show that our method achieves better or comparable performance compared with the baseline system. The proposed approach achieves accuracies of 0.39 and 0.42 on the English validation and test sets, respectively.