Yuzhi Liang
2026
Escaping the Probability Trap: Mitigating Semantic Drift in Cantonese-Mandarin Translation
Yuzhi Liang | Fangqi Chen
Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026)
Fine-tuning multilingual models for low-resource dialect translation frequently encounters a “plausibility over faithfulness” dilemma, resulting in severe semantic drift on dialect-specific tokens. We term this phenomenon the “Probability Trap,” where models prioritize statistical fluency over semantic fidelity. To address this, we propose MVS-Rank (Multi-View Scoring Reranking), a generate-then-rerank framework that decouples evaluation from generation. Our method assesses translation candidates through three complementary perspectives: (1) Source-Side Faithfulness via a Reverse Translation Model to anchor semantic fidelity; (2) Local Fluency using Masked Language Models to ensure syntactic precision; and (3) Global Fluency leveraging Large Language Models to capture discourse coherence. Extensive experiments on Cantonese-Mandarin benchmarks demonstrate that MVS-Rank achieves state-of-the-art performance, significantly outperforming strong fine-tuning baselines by effectively rectifying hallucinations while maintaining high fluency.
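The generate-then-rerank idea in this abstract can be illustrated with a minimal sketch. The abstract does not say how the three views are combined, so the min-max normalization and weighted sum below are assumptions, and the names `rerank`, `min_max`, `scorers`, and `weights` are hypothetical. In practice each scorer would wrap a model (a reverse translation model, a masked language model, a large language model); here they are plain callables.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical scorer signature: (source, candidate) -> score, higher is better.
Scorer = Callable[[str, str], float]


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1] so the views are comparable (an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All candidates tie on this view, so it contributes nothing.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rerank(
    source: str,
    candidates: Sequence[str],
    scorers: Dict[str, Scorer],
    weights: Dict[str, float],
) -> List[str]:
    """Order candidates by a weighted sum of normalized per-view scores."""
    if not candidates:
        return []
    totals = [0.0] * len(candidates)
    for name, scorer in scorers.items():
        normalized = min_max([scorer(source, c) for c in candidates])
        for i, score in enumerate(normalized):
            # Views without an explicit weight default to 1.0.
            totals[i] += weights.get(name, 1.0) * score
    order = sorted(range(len(candidates)), key=lambda i: totals[i], reverse=True)
    return [candidates[i] for i in order]
```

Because scoring is separate from generation, any view can be swapped or reweighted without retraining the translation model; this is the decoupling the abstract describes.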
PivotAttack: Rethinking the Search Trajectory in Hard-Label Text Attacks via Pivot Words
Yuzhi Liang | Shiliang Xiao | Jingsong Wei | Qiliang Lin | Xia Li
Findings of the Association for Computational Linguistics: ACL 2026
Existing hard-label text attacks often rely on inefficient "outside-in" strategies that traverse vast search spaces. We propose PivotAttack, a query-efficient "inside-out" framework. It employs a Multi-Armed Bandit algorithm to identify Pivot Sets—combinatorial token groups acting as prediction anchors—and strategically perturbs them to induce label flips. This approach captures inter-word dependencies and minimizes query costs. Extensive experiments across traditional models and Large Language Models demonstrate that PivotAttack consistently outperforms state-of-the-art baselines in both Attack Success Rate and query efficiency.
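As a rough illustration of how a multi-armed bandit can pick out influential token groups under a query budget, the sketch below uses the standard UCB1 rule. The abstract does not name the bandit variant or the reward definition, so both are assumptions: `ucb1_select` is a hypothetical helper, each arm is a candidate token group, and `reward` stands in for one hard-label query to the victim model (for example, 1.0 if perturbing that group flips the label).

```python
import math
from typing import Callable, Hashable, Sequence


def ucb1_select(
    arms: Sequence[Hashable],
    reward: Callable[[Hashable], float],
    budget: int,
) -> Hashable:
    """Spend `budget` reward queries with UCB1; return the arm with the best mean."""
    if not arms or budget < 1:
        raise ValueError("need at least one arm and a positive budget")
    counts = [0] * len(arms)
    sums = [0.0] * len(arms)
    for t in range(1, budget + 1):
        if t <= len(arms):
            # Initialization: try every arm once before using the UCB rule.
            i = t - 1
        else:
            # Choose the arm with the highest mean plus exploration bonus.
            i = max(
                range(len(arms)),
                key=lambda k: sums[k] / counts[k]
                + math.sqrt(2.0 * math.log(t) / counts[k]),
            )
        sums[i] += reward(arms[i])  # one query to the (hypothetical) victim model
        counts[i] += 1
    played = [k for k in range(len(arms)) if counts[k] > 0]
    best = max(played, key=lambda k: sums[k] / counts[k])
    return arms[best]
```

The exploration bonus shrinks as an arm is queried more often, so the budget concentrates on groups that keep producing label flips.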
2025
Temporal-Aware Soft Prompt Tuning for Automatic Text Dating
Hai Wang | Yuzhi Liang | Han Ren
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
This paper presents Temporal-aware Soft Prompt Tuning (TASPT), a novel approach for automatic text dating. Unlike existing methods, which often overlook the evolution of word meanings in texts spanning long periods, TASPT incorporates the unique characteristics of historical texts. It introduces a temporal-aware text representation that dynamically captures both semantic variance and invariance. This representation is combined with a soft prompt, enabling efficient parameter tuning for automatic text dating. Experiments show that TASPT outperforms all existing methods on two diachronic datasets: the Twenty-Four Histories and the Royal Society Corpus.