Yong Wang
Papers on this page may belong to the following people: Yong Wang, Yong Wang
2026
Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization
Yuxiang Ji | Yong Wang | Ziyu Ma | Yiming Hu | Hailang Huang | Xuecai Hu | Guanhua Chen | Liaoni Wu | Xiangxiang Chu
Findings of the Association for Computational Linguistics: ACL 2026
Yuxiang Ji | Yong Wang | Ziyu Ma | Yiming Hu | Hailang Huang | Xuecai Hu | Guanhua Chen | Liaoni Wu | Xiangxiang Chu
Findings of the Association for Computational Linguistics: ACL 2026
The image geolocalization task aims to predict the location where an image was taken anywhere on Earth using visual clues.Existing large vision-language model (LVLM) approaches leverage world knowledge, chain-of-thought reasoning, and agentic capabilities, but overlook a common strategy used by humans — using maps.In this work, we first equip the model Thinking with Map ability and formulate it as an agent-in-the-map loop.We develop a two-stage optimization scheme for it, including agentic reinforcement learning (RL) followed by parallel test-time scaling (TTS).The RL strengthens the agentic capability of model to improve sampling efficiency, and the parallel TTS enables the model to explore multiple candidate paths before making the final prediction, which is crucial for geolocalization.To evaluate our method on up-to-date and in-the-wild images, we further present MAPBench, a comprehensive geolocalization training and evaluation benchmark composed entirely of real-world images.Experimental results show that our method outperforms existing open- and closed-source models on most metrics, specifically improving Acc@500m from 8.0% to 22.1% compared to Gemini-3-Pro with Google Search/Map grounded mode.
Modeling LLM Unlearning as an Asymmetric Two-Task Learning Problem
Zeguan Xiao | Siqing Li | Yong Wang | Xuetao Wei | Jian Yang | Yun Chen | Guanhua Chen
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Zeguan Xiao | Siqing Li | Yong Wang | Xuetao Wei | Jian Yang | Yun Chen | Guanhua Chen
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Machine unlearning for large language models (LLMs) aims to remove targeted knowledge while preserving general capability. In this paper, we recast LLM unlearning as an asymmetric two-task problem: retention is the primary objective and forgetting is an auxiliary. From this perspective, we propose a retention-prioritized gradient synthesis framework that decouples task-specific gradient extraction from conflict-aware combination. Instantiating the framework, we adapt established PCGrad to resolve gradient conflicts, and introduce SAGO, a novel retention-prioritized gradient synthesis method. Theoretically, both variants ensure non-negative cosine similarity with the retain gradient, while SAGO achieves strictly tighter alignment through constructive sign-constrained synthesis. Empirically, on WMDP Bio/Cyber and RWKU benchmarks, SAGO consistently pushes the Pareto frontier: e.g., on WMDP Bio (SimNPO+GD), recovery of target model MMLU performance progresses from 44.6% (naive) to 94.0% (+PCGrad) and further to 96.0% (+SAGO), while maintaining comparable forgetting strength. Our results show that re-shaping gradient geometry, rather than re-balancing losses, is the key to mitigating unlearning-retention trade-offs.
From Conversation to Evaluation: Benchmarking LLMs on Development Knowledge via SimpleDevQA
Jing Zhang | Lianghong Guo | Yanlin Wang | Terry Yue Zhuo | Yong Wang | Mingwei Liu | Jiachi Chen | Ensheng Shi | Yuchi Ma | Hongyu Zhang | Zibin Zheng
Findings of the Association for Computational Linguistics: ACL 2026
Jing Zhang | Lianghong Guo | Yanlin Wang | Terry Yue Zhuo | Yong Wang | Mingwei Liu | Jiachi Chen | Ensheng Shi | Yuchi Ma | Hongyu Zhang | Zibin Zheng
Findings of the Association for Computational Linguistics: ACL 2026
The Development Knowledge Question Answering (Dev Knowledge QA) task aims to provide accurate natural language answers to knowledge-seeking questions during software development. To investigate the importance of Dev Knowledge QA in AI-assisted software development and the extent to which it has been explored, we conduct a preliminary analysis of real user–LLM dialogues from WildChat. Our findings indicate that Dev Knowledge QA plays a significant role in real-world software development scenarios, and these raw dialogues cannot be directly used to construct a Dev Knowledge QA benchmark. Existing Dev Knowledge QA benchmarks are limited in development knowledge scope and often not built from real user queries. To bridge this gap, we design a three-phase pipeline that transforms real-world dialogue into simple development knowledge-seeking QA pairs. Through this pipeline, we introduce SimpleDevQA, a multilingual Dev Knowledge QA benchmark inspired by real user dialogues. This dataset covers three languages (English, Chinese, and Russian), and focuses on questions with unique, short, and verifiable answers, making evaluation more accurate and simple. Extensive experiments with 18 mainstream LLMs show that closed-source models generally perform best on SimpleDevQA. We also find that RAG-based knowledge injection improves accuracy, and that Dev Knowledge QA performance correlates with both model confidence and code-generation capability. To facilitate the replication study, we have released our data and code at: https://github.com/DeepSoftwareAnalytics/SimpleDevQA.
RealSec-bench: A Benchmark for Evaluating Secure Code Generation in Real-World Repositories
Yanlin Wang | Ziyao Zhang | Chong Wang | Xinyi Xu | Mingwei Liu | Yong Wang | Jiachi Chen | Zibin Zheng
Findings of the Association for Computational Linguistics: ACL 2026
Yanlin Wang | Ziyao Zhang | Chong Wang | Xinyi Xu | Mingwei Liu | Yong Wang | Jiachi Chen | Zibin Zheng
Findings of the Association for Computational Linguistics: ACL 2026
Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation, but their proficiency in producing secure code remains a critical, under-explored area. Existing benchmarks often fall short by relying on synthetic vulnerabilities or evaluating functional correctness in isolation, failing to capture the complex interplay between functionality and security found in real-world software. To address this gap, we introduce RealSec-bench, a new benchmark for secure code generation meticulously constructed from real-world, high-risk Java repositories. Our methodology employs a multi-stage pipeline that combines systematic SAST scanning with CodeQL, LLM-based false positive elimination, and rigorous human expert validation. The resulting benchmark contains 105 instances grounded in real-word repository contexts, spanning 19 Common Weakness Enumeration (CWE) types and exhibiting a wide diversity of data flow complexities, including vulnerabilities with up to 34-hop inter-procedural dependencies. Using RealSec-bench, we conduct an extensive empirical study on 5 popular LLMs. We introduce a novel composite metric, SecurePass@K, to assess both functional correctness and security simultaneously. We find that while Retrieval-Augmented Generation (RAG) techniques can improve functional correctness, they provide negligible benefits to security. Furthermore, explicitly prompting models with general security guidelines often leads to compilation failures, harming functional correctness without reliably preventing vulnerabilities. Our work highlights the gap between functional and secure code generation in current LLMs. Our code and data are available at https://github.com/DeepSoftwareAnalytics/Realsec-code-Bench.
Breaking Block Boundaries: Anchor-based History-stable Decoding for Diffusion Large Language Models
Shun Zou | Yong Wang | Zehui Chen | Lin Chen | Chongyang Tao | Feng Zhao | Xiangxiang Chu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Shun Zou | Yong Wang | Zehui Chen | Lin Chen | Chongyang Tao | Feng Zhao | Xiangxiang Chu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Diffusion Large Language Models (dLLMs) have recently become a promising alternative to autoregressive large language models (ARMs). Semi-autoregressive (Semi-AR) decoding is widely employed in base dLLMs and advanced decoding strategies due to its superior performance. However, our observations reveal that Semi-AR decoding suffers from inherent block constraints, which cause the decoding of many cross-block stable tokens to be unnecessarily delayed. To address this challenge, we systematically investigate the identification of stable tokens and present three key findings: (1) naive lookahead decoding is unreliable, (2) token stability closely correlates with convergence trend, and (3) historical information is isolated. Building on these insights, we propose Anchor-based History-stable Decoding (AHD), a training-free, plug-and-play dynamic decoding strategy. Specifically, AHD monitors the stability trend of tokens in real time through dynamic anchors. Once a token reaches stability, it initiates early cross-block decoding to enhance efficiency and performance. Extensive experiments across language, vision-language, and audio-language domains demonstrate that AHD simultaneously improves both performance and inference efficiency. Notably, AHD effectively reverses the performance degradation typically observed in existing advanced decoding acceleration strategies. For instance, on the BBH benchmark, our approach reduces decoding steps by 80% while improving performance by 3.67%.
CoEvolve: Training LLM Agents via Agent-Data Mutual Evolution
Shidong Yang | Ziyu Ma | Tongwen Huang | Yiming Hu | Yong Wang | Xiangxiang Chu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Shidong Yang | Ziyu Ma | Tongwen Huang | Yiming Hu | Yong Wang | Xiangxiang Chu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Reinforcement learning for LLM agents is typically conducted on a static data distribution, which fails to adapt to the agent’s evolving behavior and leads to poor coverage of complex environment interactions. To address these challenges, we propose CoEvolve, an agent-data mutual evolution framework that enables LLM agents to improve through closed-loop, interaction-driven training. Specifically, CoEvolve extracts feedback signals such as forgetting and uncertainty from rollout trajectories to identify failure-prone interaction patterns, and utilizes them to guide LLM-based task synthesis. The synthesized tasks are validated through environment interaction and utilized to update the data distribution, enabling joint adaptation of the agent and its data. Extensive experiments on AppWorld and BFCL across Qwen2.5-7B, Qwen3-4B, and Qwen3-30B-A3B demonstrate consistent and significant improvements over strong base models, yielding absolute gains of 19.43%, 15.58%, and 18.14%, respectively.
Visually-Guided Policy Optimization for Multimodal Reasoning
Zengbin Wang | Feng Xiong | Liang Lin | Xuecai Hu | Yong Wang | Yanlin Wang | Man Zhang | Xiangxiang Chu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Zengbin Wang | Feng Xiong | Liang Lin | Xuecai Hu | Yong Wang | Yanlin Wang | Man Zhang | Xiangxiang Chu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Reinforcement learning with verifiable rewards (RLVR) has significantly advanced the reasoning ability of vision-language models (VLMs). However, the inherent text-dominated nature of VLMs often leads to insufficient visual faithfulness, characterized by sparse attention activation to visual tokens. More importantly, our empirical analysis reveals that temporal visual forgetting along reasoning steps exacerbates this deficiency. To bridge this gap, we propose Visually-Guided Policy Optimization (VGPO), a novel framework to reinforce visual focus during policy optimization. Specifically, VGPO initially introduces a Visual Attention Compensation mechanism that leverages visual similarity to localize and amplify visual cues, while progressively elevating visual expectations in later steps to counteract visual forgetting. Building on this mechanism, we implement a dual-grained advantage re-weighting strategy: the intra-trajectory level highlights tokens exhibiting relatively high visual activation, while the inter-trajectory level prioritizes trajectories demonstrating superior visual accumulation. Extensive experiments demonstrate that VGPO achieves better visual activation and superior performance in mathematical multimodal reasoning and visual-dependent tasks. The code has been released at https://github.com/wzb-bupt/VGPO.
2025
POSITION BIAS MITIGATES POSITION BIAS: Mitigate Position Bias Through Inter-Position Knowledge Distillation
Yifei Wang | Feng Xiong | Yong Wang | Linjing Li | Xiangxiang Chu | Daniel Dajun Zeng
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Yifei Wang | Feng Xiong | Yong Wang | Linjing Li | Xiangxiang Chu | Daniel Dajun Zeng
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Positional bias (PB), manifesting as non-uniform sensitivity across different contextual locations, significantly impairs long-context comprehension and processing capabilities. Previous studies have addressed PB either by modifying the underlying architectures or by employing extensive contextual awareness training. However, the former approach fails to effectively eliminate the substantialperformance disparities, while the latter imposes significant data and computational overhead. To address PB effectively, we introduce Pos2Distill, a position to position knowledge distillation framework. Pos2Distill transfers the superior capabilities from advantageous positions to less favorable ones, thereby reducing the huge performance gaps. The conceptual principle is to leverage the inherent, position-induced disparity to counteract the PB itself. We identify distinct manifestations of PB under retrieval and reasoning paradigms, thereby designing two specialized instantiations: Pos2Distill-R1 and Pos2Distill-R2 respectively, both grounded in this core principle. By employing the Pos2Distill approach, we achieve enhanced uniformity and significant performance gains across all contextual positions in long-context retrieval and reasoning tasks. Crucially, both specialized systems exhibit strong cross-task generalization mutually, while achieving superior performance on their respective tasks.
HS-STaR: Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget Reallocation
Feng Xiong | Hongling Xu | Yifei Wang | Runxi Cheng | Yong Wang | Xiangxiang Chu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Feng Xiong | Hongling Xu | Yifei Wang | Runxi Cheng | Yong Wang | Xiangxiang Chu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Self-taught reasoners (STaRs) enhance the mathematical reasoning abilities of large language models (LLMs) by leveraging self-generated responses for self-training. Recent studies have incorporated reward models to guide response selection or decoding, aiming to obtain higher-quality data. However, they typically allocate a uniform sampling budget across all problems, overlooking the varying utility of problems at different difficulty levels. In this work, we conduct an empirical study and find that problems near the boundary of the LLM’s reasoning capability offer significantly greater learning utility than both easy and overly difficult ones. To identify and exploit such problems, we propose HS-STaR, a Hierarchical Sampling framework for Self-Taught Reasoners. Given a fixed sampling budget, HS-STaR first performs lightweight pre-sampling with a reward-guided difficulty estimation strategy to efficiently identify boundary-level problems. Subsequently, it dynamically reallocates the remaining budget toward these high-utility problems during a re-sampling phase, maximizing the generation of valuable training data. Extensive experiments across multiple reasoning benchmarks and backbone LLMs demonstrate that HS-STaR significantly outperforms other baselines without requiring additional sampling budget.
Cross-MoE: An Efficient Temporal Prediction Framework Integrating Textual Modality
Ruizheng Huang | Zhicheng Zhang | Yong Wang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Ruizheng Huang | Zhicheng Zhang | Yong Wang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
It has been demonstrated that incorporating external information as textual modality can effectively improve time series forecasting accuracy. However, current multi-modal models ignore the dynamic and different relations between time series patterns and textual features, which leads to poor performance in temporal-textual feature fusion. In this paper, we propose a lightweight and model-agnostic temporal-textual fusion framework named Cross-MoE. It replaces Cross Attention with Cross-Ranker to reduce computational complexity, and enhances modality-aware correlation memorization with Mixture-of-Experts (MoE) networks to tolerate the distributional shifts in time series. The experimental results demonstrate a 8.78% average reduction in Mean Squared Error (MSE) compared to the SOTA multi-modal time series framework. Notably, our method requires only 75% of computational overhead and 12.5% of activated parameters compared with Cross Attention mechanism. Our codes are available at https://github.com/Kilosigh/Cross-MoE.git
2024
Comparing Professional and Common Literary Critics Using Multi-Dimensional Analysis
Yiheng Yang | Chu-Ren Huang | Yong Wang
Proceedings of the 38th Pacific Asia Conference on Language, Information and Computation
Yiheng Yang | Chu-Ren Huang | Yong Wang
Proceedings of the 38th Pacific Asia Conference on Language, Information and Computation
2023
LLM4Vis: Explainable Visualization Recommendation using ChatGPT
Lei Wang | Songheng Zhang | Yun Wang | Ee-Peng Lim | Yong Wang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track
Lei Wang | Songheng Zhang | Yun Wang | Ee-Peng Lim | Yong Wang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track
Data visualization is a powerful tool for exploring and communicating insights in various domains. To automate visualization choice for datasets, a task known as visualization recommendation has been proposed. Various machine-learning-based approaches have been developed for this purpose, but they often require a large corpus of dataset-visualization pairs for training and lack natural explanations for their results. To address this research gap, we propose LLM4Vis, a novel ChatGPT-based prompting approach to perform visualization recommendation and return human-like explanations using very few demonstration examples. Our approach involves feature description, demonstration example selection, explanation generation, demonstration example construction, and inference steps. To obtain demonstration examples with high-quality explanations, we propose a new explanation generation bootstrapping to iteratively refine generated explanations by considering the previous generation and template-based hint. Evaluations on the VizML dataset show that LLM4Vis outperforms or performs similarly to supervised learning models like Random Forest, Decision Tree, and MLP, in both few-shot and zero-shot settings. The qualitative evaluation also shows the effectiveness of explanations generated by LLM4Vis.
TR-Rules: Rule-based Model for Link Forecasting on Temporal Knowledge Graph Considering Temporal Redundancy
Ningyuan Li | Haihong E | Shi Li | Mingzhi Sun | Tianyu Yao | Meina Song | Yong Wang | Haoran Luo
Findings of the Association for Computational Linguistics: EMNLP 2023
Ningyuan Li | Haihong E | Shi Li | Mingzhi Sun | Tianyu Yao | Meina Song | Yong Wang | Haoran Luo
Findings of the Association for Computational Linguistics: EMNLP 2023
Temporal knowledge graph (TKG) has been proved to be an effective way for modeling dynamic facts in real world. Many efforts have been devoted into predicting future events i.e. extrapolation, on TKGs. Recently, rule-based knowledge graph completion methods which are considered to be more interpretable than embedding-based methods, have been transferred to temporal knowledge graph extrapolation. However, rule-based models suffer from temporal redundancy when leveraged under dynamic settings, which results in inaccurate rule confidence calculation. In this paper, we define the problem of temporal redundancy and propose TR-Rules which solves the temporal redundancy issues through a simple but effective strategy. Besides, to capture more information lurking in TKGs, apart from cyclic rules, TR-Rules also mines and properly leverages acyclic rules, which has not been explored by existing models. Experimental results on three benchmarks show that TR-Rules achieves state-of-the-art performance. Ablation study shows the impact of temporal redundancy and demonstrates the performance of acyclic rules is much more promising due to its higher sensitivity to the number of sampled walks during learning stage.
2022
A Transformer-based Threshold-Free Framework for Multi-Intent NLU
Lisung Chen | Nuo Chen | Yuexian Zou | Yong Wang | Xinzhong Sun
Proceedings of the 29th International Conference on Computational Linguistics
Lisung Chen | Nuo Chen | Yuexian Zou | Yong Wang | Xinzhong Sun
Proceedings of the 29th International Conference on Computational Linguistics
Multi-intent natural language understanding (NLU) has recently gained attention. It detects multiple intents in an utterance, which is better suited to real-world scenarios. However, the state-of-the-art joint NLU models mainly detect multiple intents on threshold-based strategy, resulting in one main issue: the model is extremely sensitive to the threshold settings. In this paper, we propose a transformer-based Threshold-Free Multi-intent NLU model (TFMN) with multi-task learning (MTL). Specifically, we first leverage multiple layers of a transformer-based encoder to generate multi-grain representations. Then we exploit the information of the number of multiple intents in each utterance without additional manual annotations and propose an auxiliary detection task: Intent Number detection (IND). Furthermore, we propose a threshold-free intent multi-intent classifier that utilizes the output of IND task and detects the multiple intents without depending on the threshold. Extensive experiments demonstrate that our proposed model achieves superior results on two public multi-intent datasets.
XLM-D: Decorate Cross-lingual Pre-training Model as Non-Autoregressive Neural Machine Translation
Yong Wang | Shilin He | Guanhua Chen | Yun Chen | Daxin Jiang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Yong Wang | Shilin He | Guanhua Chen | Yun Chen | Daxin Jiang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Pre-training language models have achieved thriving success in numerous natural language understanding and autoregressive generation tasks, but non-autoregressive generation in applications such as machine translation has not sufficiently benefited from the pre-training paradigm. In this work, we establish the connection between a pre-trained masked language model (MLM) and non-autoregressive generation on machine translation. From this perspective, we present XLM-D, which seamlessly transforms an off-the-shelf cross-lingual pre-training model into a non-autoregressive translation (NAT) model with a lightweight yet effective decorator. Specifically, the decorator ensures the representation consistency of the pre-trained model and brings only one additional trainable parameter. Extensive experiments on typical translation datasets show that our models obtain state-of-the-art performance while realizing the inference speed-up by 19.9x. One striking result is that on WMT14 En-De, our XLM-D obtains 29.80 BLEU points with multiple iterations, which outperforms the previous mask-predict model by 2.77 points.
Improving Non-Autoregressive Neural Machine Translation via Modeling Localness
Yong Wang | Xinwei Geng
Proceedings of the 29th International Conference on Computational Linguistics
Yong Wang | Xinwei Geng
Proceedings of the 29th International Conference on Computational Linguistics
Non-autoregressive translation (NAT) models, which eliminate the sequential dependencies within the target sentence, have achieved remarkable inference speed, but suffer from inferior translation quality. Towards exploring the underlying causes, we carry out a thorough preliminary study on the attention mechanism, which demonstrates the serious weakness in capturing localness compared with conventional autoregressive translation (AT). In response to this problem, we propose to improve the localness of NAT models by explicitly introducing the information about surrounding words. Specifically, temporal convolutions are incorporated into both encoder and decoder sides to obtain localness-aware representations. Extensive experiments on several typical translation datasets show that the proposed method can achieve consistent and significant improvements over strong NAT baselines. Further analyses on the WMT14 En-De translation task reveal that compared with baselines, our approach accelerates the convergence in training and can achieve equivalent performance with a reduction of 70% training steps.
2021
ZJUKLAB at SemEval-2021 Task 4: Negative Augmentation with Language Model for Reading Comprehension of Abstract Meaning
Xin Xie | Xiangnan Chen | Xiang Chen | Yong Wang | Ningyu Zhang | Shumin Deng | Huajun Chen
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
Xin Xie | Xiangnan Chen | Xiang Chen | Yong Wang | Ningyu Zhang | Shumin Deng | Huajun Chen
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
This paper presents our systems for the three Subtasks of SemEval Task4: Reading Comprehension of Abstract Meaning (ReCAM). We explain the algorithms used to learn our models and the process of tuning the algorithms and selecting the best model. Inspired by the similarity of the ReCAM task and the language pre-training, we propose a simple yet effective technology, namely, negative augmentation with language model. Evaluation results demonstrate the effectiveness of our proposed approach. Our models achieve the 4th rank on both official test sets of Subtask 1 and Subtask 2 with an accuracy of 87.9% and an accuracy of 92.8%, respectively. We further conduct comprehensive model analysis and observe interesting error cases, which may promote future researches. The code and dataset used in our paper can be found at https://github.com/CheaSim/SemEval2021. The leaderboard can be found at https://competitions.codalab.org/competitions/26153.
2020
On the Sparsity of Neural Machine Translation Models
Yong Wang | Longyue Wang | Victor Li | Zhaopeng Tu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Yong Wang | Longyue Wang | Victor Li | Zhaopeng Tu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Modern neural machine translation (NMT) models employ a large number of parameters, which leads to serious over-parameterization and typically causes the underutilization of computational resources. In response to this problem, we empirically investigate whether the redundant parameters can be reused to achieve better performance. Experiments and analyses are systematically conducted on different datasets and NMT architectures. We show that: 1) the pruned parameters can be rejuvenated to improve the baseline model by up to +0.8 BLEU points; 2) the rejuvenated parameters are reallocated to enhance the ability of modeling low-level lexical information.
2019
Improved Zero-shot Neural Machine Translation via Ignoring Spurious Correlations
Jiatao Gu | Yong Wang | Kyunghyun Cho | Victor O.K. Li
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Jiatao Gu | Yong Wang | Kyunghyun Cho | Victor O.K. Li
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Zero-shot translation, translating between language pairs on which a Neural Machine Translation (NMT) system has never been trained, is an emergent property when training the system in multilingual settings. However, naive training for zero-shot NMT easily fails, and is sensitive to hyper-parameter setting. The performance typically lags far behind the more conventional pivot-based approach which translates twice using a third language as a pivot. In this work, we address the degeneracy problem due to capturing spurious correlations by quantitatively analyzing the mutual information between language IDs of the source and decoded sentences. Inspired by this analysis, we propose to use two simple but effective approaches: (1) decoder pre-training; (2) back-translation. These methods show significant improvement (4 22 BLEU points) over the vanilla zero-shot translation on three challenging multilingual datasets, and achieve similar or better results than the pivot-based approach.
2018
Meta-Learning for Low-Resource Neural Machine Translation
Jiatao Gu | Yong Wang | Yun Chen | Kyunghyun Cho | Victor O.K. Li
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Jiatao Gu | Yong Wang | Yun Chen | Kyunghyun Cho | Victor O.K. Li
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
In this paper, we propose to extend the recently introduced model-agnostic meta-learning algorithm (MAML, Finn, et al., 2017) for low-resource neural machine translation (NMT). We frame low-resource translation as a meta-learning problem where we learn to adapt to low-resource languages based on multilingual high-resource language tasks. We use the universal lexical representation (Gu et al., 2018b) to overcome the input-output mismatch across different languages. We evaluate the proposed meta-learning strategy using eighteen European languages (Bg, Cs, Da, De, El, Es, Et, Fr, Hu, It, Lt, Nl, Pl, Pt, Sk, Sl, Sv and Ru) as source tasks and five diverse languages (Ro,Lv, Fi, Tr and Ko) as target tasks. We show that the proposed approach significantly outperforms the multilingual, transfer learning based approach (Zoph et al., 2016) and enables us to train a competitive NMT system with only a fraction of training examples. For instance, the proposed approach can achieve as high as 22.04 BLEU on Romanian-English WMT’16 by seeing only 16,000 translated words (~600 parallel sentences)
2014
Introduction to NJUPT Chinese Spelling Check Systems in CLP-2014 Bakeoff
Lei Gu | Yong Wang | Xitao Liang
Proceedings of the Third CIPS-SIGHAN Joint Conference on Chinese Language Processing
Lei Gu | Yong Wang | Xitao Liang
Proceedings of the Third CIPS-SIGHAN Joint Conference on Chinese Language Processing
2008
Search
Fix author
Co-authors
- Xiangxiang Chu 6
- Yun Chen 3
- Yanlin Wang 3
- Feng Xiong 3
- Guanhua Chen 2
- Jiachi Chen 2
- Kyunghyun Cho 2
- Jiatao Gu 2
- Yiming Hu 2
- Xuecai Hu 2
- Victor O.K. Li 2
- Mingwei Liu 2
- Ziyu Ma 2
- Yifei Wang 2
- Zibin Zheng 2
- Xiangnan Chen 1
- Xiang Chen 1
- Huajun Chen 1
- Lisung Chen 1
- Nuo Chen 1
- Guanhua Chen 1
- Zehui Chen 1
- Lin Chen 1
- Runxi Cheng 1
- Shumin Deng 1
- Haihong E 1
- Xinwei Geng 1
- Lei Gu 1
- Lianghong Guo 1
- Shilin He 1
- Hailang Huang 1
- Ruizheng Huang 1
- Tongwen Huang 1
- Chu-Ren Huang 1
- Yuxiang Ji 1
- Daxin Jiang 1
- Siqing Li 1
- Linjing Li 1
- Ningyuan Li 1
- Shi Li 1
- Victor Li 1
- Xitao Liang 1
- Ee-Peng Lim 1
- Liang Lin 1
- Yiqun Liu 1
- Haoran Luo 1
- Shaoping Ma 1
- Yuchi Ma 1
- Liyun Ru 1
- Ensheng Shi 1
- Meina Song 1
- Xinzhong Sun 1
- Mingzhi Sun 1
- Chongyang Tao 1
- Zhaopeng Tu 1
- Lei Wang 1
- Yun Wang 1
- Chong Wang 1
- Longyue Wang 1
- Zengbin Wang 1
- Xuetao Wei 1
- Liaoni Wu 1
- Zeguan Xiao 1
- Xin Xie 1
- Hongling Xu 1
- Xinyi Xu 1
- Jian Yang 1
- Shidong Yang 1
- Yiheng Yang 1
- Tianyu Yao 1
- Daniel Dajun Zeng 1
- Ningyu Zhang 1
- Zhicheng Zhang 1
- Min Zhang 1
- Jing Zhang 1
- Hongyu Zhang 1
- Songheng Zhang 1
- Ziyao Zhang 1
- Man Zhang 1
- Feng Zhao 1
- Terry Yue Zhuo 1
- Yuexian Zou 1
- Shun Zou 1