Shengxiang Gao

2026

Reinforcement learning (RL) excels at reasoning tasks with verifiable rewards, but adapting it to machine translation (MT) remains challenging because multiple valid translations leave no unique reward signal. Existing RL approaches for MT either rely on fixed references in supervised settings or produce homogeneous references that lead to mode collapse in unsupervised settings. Both limitations arise from ignoring entropy dynamics in RL-based MT. The core challenge is leveraging entropy for supervision construction and self-evolution. In this paper, we propose an Entropy-Driven Unsupervised RL framework for MT. Our framework integrates entropy-guided sampling for exploration, confidence-weighted label generation to transcend majority-voting bias, and uncertainty-aware optimization to prioritize high-entropy tokens. These mechanisms allow reward signals to co-evolve with model proficiency beyond fixed references. Experiments across multiple language pairs show that our method outperforms supervised and unsupervised baselines by +0.63 and +2.52 average points, respectively. Our code is available at https://github.com/fortunatekiss/URLMT.
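
To make the notion of "high-entropy tokens" concrete, the following is a minimal sketch of token-level Shannon entropy over a next-token distribution. It is not taken from the paper or its repository: the function names `token_entropy` and `high_entropy_mask`, the fixed `threshold`, and the toy distributions are all assumptions made for illustration.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    # Zero-probability entries contribute nothing and are skipped to avoid log(0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def high_entropy_mask(distributions, threshold):
    """Flag positions whose entropy exceeds a threshold (hypothetical selection rule)."""
    return [token_entropy(d) > threshold for d in distributions]


# Toy distributions over a two-word vocabulary (illustrative values only).
uncertain = [0.5, 0.5]    # the model is undecided at this position
confident = [0.99, 0.01]  # the model is nearly certain at this position
mask = high_entropy_mask([uncertain, confident], threshold=0.5)
```

Under this assumed rule, only the undecided position would be flagged for prioritized optimization; how the paper actually selects or weights tokens is not specified in the abstract.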

2023

Multilingual Knowledge Graph Completion (mKGC) aims to solve queries in different languages by reasoning about a tail entity, thereby improving multilingual knowledge graphs. Previous studies leverage multilingual pretrained language models (PLMs) and the generative paradigm to achieve mKGC. Although multilingual pretrained language models contain extensive knowledge of different languages, their pretraining tasks cannot be directly aligned with the mKGC task. Moreover, the majority of KGs and PLMs currently available exhibit a pronounced English-centric bias. This makes it difficult for mKGC to achieve good results, particularly for low-resource languages. To overcome these problems, this paper introduces global and local knowledge constraints for mKGC. The former is used to constrain the reasoning of answer entities, while the latter is used to enhance the representation of query contexts. The proposed method helps the pretrained model adapt better to the mKGC task. Experimental results on public datasets demonstrate that our method outperforms the previous SOTA on Hits@1 and Hits@10 by an average of 12.32% and 16.03%, respectively, which indicates that the proposed method significantly improves mKGC.
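
For readers unfamiliar with the Hits@1 and Hits@10 metrics mentioned above, here is a minimal sketch of how Hits@k is conventionally computed from the rank of the correct tail entity. The function name `hits_at_k` and the example ranks are hypothetical and are not drawn from the paper's data or code.

```python
def hits_at_k(ranks, k):
    """Fraction of queries whose correct tail entity is ranked within the top k."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    # Ranks are 1-based: rank 1 means the correct entity was the top prediction.
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the gold tail entity for five queries.
example_ranks = [1, 3, 12, 1, 7]
hits1 = hits_at_k(example_ranks, 1)    # 2 of 5 queries have rank 1
hits10 = hits_at_k(example_ranks, 10)  # 4 of 5 queries have rank <= 10
```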

Venues

Findings: 2