Cunxiao Du
2025
Reverse Modeling in Large Language Models
Sicheng Yu | Xu Yuanchen | Cunxiao Du | Yanying Zhou | Minghui Qiu | Qianru Sun | Hao Zhang | Jiawei Wu
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)
Humans are accustomed to reading and writing in a forward manner, and this natural bias extends to text understanding in auto-regressive large language models (LLMs). This paper investigates whether LLMs, like humans, struggle with reverse modeling, specifically with reversed text inputs. We find that publicly available pre-trained LLMs cannot understand such inputs. However, LLMs trained from scratch on both forward and reverse texts understand them equally well during inference across multiple languages. Our case study shows that texts with different content yield different losses depending on the input direction: some have lower loss in the forward direction, while others have lower loss in reverse. This motivates a simple data-selection method based on the loss difference between the forward and reverse directions. Using our selected data in continued pretraining boosts LLMs' performance by a large margin across different language understanding benchmarks.
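The abstract does not give implementation details, but the selection idea can be sketched as follows: score each document with a forward-trained LM and a reverse-trained LM, then filter by the loss gap. In the sketch below the checkpoint paths are hypothetical placeholders, the token-level reversal is a simplification of reversing the raw text, and the keep/discard rule is purely illustrative, not the authors' released pipeline.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

FWD_PATH = "path/to/forward-lm"   # hypothetical checkpoint trained on forward text
REV_PATH = "path/to/reverse-lm"   # hypothetical checkpoint trained on reversed text
tok = AutoTokenizer.from_pretrained(FWD_PATH)
fwd_lm = AutoModelForCausalLM.from_pretrained(FWD_PATH).eval()
rev_lm = AutoModelForCausalLM.from_pretrained(REV_PATH).eval()

@torch.no_grad()
def avg_loss(model, ids):
    """Mean next-token cross-entropy of one token sequence under `model`."""
    return model(ids, labels=ids).loss.item()

@torch.no_grad()
def loss_gap(text):
    """Forward loss minus reverse loss for one document."""
    ids = tok(text, return_tensors="pt").input_ids
    rev_ids = torch.flip(ids, dims=[1])   # token-level reversal (simplification)
    return avg_loss(fwd_lm, ids) - avg_loss(rev_lm, rev_ids)

# Illustrative rule only: which sign of the gap to keep is a design choice the
# abstract does not pin down; here we keep documents the reverse model fits better.
corpus = ["Example document one.", "Example document two."]
selected = [doc for doc in corpus if loss_gap(doc) > 0.0]
```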
2024
Revisiting the Markov Property for Machine Translation
Cunxiao Du | Hao Zhou | Zhaopeng Tu | Jing Jiang
Findings of the Association for Computational Linguistics: EACL 2024
In this paper, we re-examine the Markov property in the context of neural machine translation. We design a Markov Autoregressive Transformer (MAT) and undertake a comprehensive assessment of its performance across four WMT benchmarks. Our findings indicate that a MAT with an order larger than 4 can generate translations of quality on par with conventional autoregressive transformers. Counter-intuitively, we also find that the advantages of a higher-order MAT do not specifically benefit the translation of longer sentences.
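The abstract does not describe the MAT architecture itself. One plausible reading of an order-n Markov constraint is a causal attention mask that additionally limits each position to the previous n tokens; the banded mask below is an illustrative sketch under that assumption, not the paper's exact design.

```python
import torch

def markov_causal_mask(seq_len, order):
    """Boolean mask, True where attention is allowed: position i may attend to
    positions j with i - order <= j <= i (causal, at most `order` tokens back)."""
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    return (j <= i) & (j >= i - order)

# Example: order-4 mask over 8 positions; converting False -> -inf gives an
# additive bias usable in a standard transformer attention layer.
mask = markov_causal_mask(seq_len=8, order=4)
bias = torch.zeros(8, 8).masked_fill(~mask, float("-inf"))
```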
2022
ngram-OAXE: Phrase-Based Order-Agnostic Cross Entropy for Non-Autoregressive Machine Translation
Cunxiao Du | Zhaopeng Tu | Longyue Wang | Jing Jiang
Proceedings of the 29th International Conference on Computational Linguistics
Recently, a new training loss, OAXE, has proven effective at ameliorating the effect of multimodality in non-autoregressive translation (NAT) by removing the penalty on word-order errors in the standard cross-entropy loss. Starting from the intuition that reordering generally occurs between phrases, we extend OAXE by allowing reordering only between ngram phrases while still requiring a strict match of word order within each phrase. Extensive experiments on NAT benchmarks across language pairs and data scales demonstrate the effectiveness and universality of our approach. Further analyses show that ngram-OAXE indeed improves the translation of ngram phrases and produces more fluent translations with better modeling of sentence structure.
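As a rough illustration of the idea, the sketch below computes a phrase-level order-agnostic cross-entropy: the target is split into consecutive n-grams, each target n-gram is matched to a prediction slot with the Hungarian algorithm, and word order is enforced only inside each n-gram. The even split into fixed-size phrases and the per-token normalization are simplifying assumptions; the published loss handles segmentation and normalization in more detail.

```python
import torch
from scipy.optimize import linear_sum_assignment

def ngram_oaxe_loss(log_probs, target, n):
    """log_probs: [T, V] per-position log-probabilities from the NAT decoder.
    target:      [T] gold token ids; this sketch assumes T is divisible by n.
    Phrases may be reordered freely; word order inside each phrase is kept."""
    T = target.numel()
    assert T % n == 0, "sketch assumes the target splits evenly into n-grams"
    num_phrases = T // n
    rows = []
    for i in range(num_phrases):                       # prediction slot i
        slot = log_probs[i * n : (i + 1) * n]          # [n, V]
        row = []
        for j in range(num_phrases):                   # target n-gram j
            phrase = target[j * n : (j + 1) * n]       # [n]
            row.append(-slot.gather(1, phrase.unsqueeze(1)).sum())
        rows.append(torch.stack(row))
    cost = torch.stack(rows)                           # [num_phrases, num_phrases]
    r, c = linear_sum_assignment(cost.detach().cpu().numpy())   # Hungarian matching
    best = cost[torch.as_tensor(r), torch.as_tensor(c)].sum()   # min-cost alignment
    return best / T                                    # average per-token loss

# Toy usage: 6 positions, vocabulary of 10, phrases of length n = 2.
logits = torch.randn(6, 10, requires_grad=True)
loss = ngram_oaxe_loss(torch.log_softmax(logits, dim=-1),
                       torch.tensor([1, 2, 3, 4, 5, 6]), n=2)
loss.backward()
```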