Cunxiao Du
2025
Reverse Modeling in Large Language Models
Sicheng Yu | Xu Yuanchen | Cunxiao Du | Yanying Zhou | Minghui Qiu | Qianru Sun | Hao Zhang | Jiawei Wu
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)
Humans are accustomed to reading and writing in a forward manner, and this natural bias extends to text understanding in auto-regressive large language models (LLMs). This paper investigates whether LLMs, like humans, struggle with reverse modeling, specifically with reversed text inputs. We find that publicly available pre-trained LLMs cannot understand such inputs. However, LLMs trained from scratch on both forward and reverse texts understand them equally well during inference across multiple languages. Our case study shows that texts with different content yield different losses depending on the input direction: some have lower loss in the forward direction, while others have lower loss in reverse. This motivates a simple data-selection method based on the loss difference between the forward and reverse directions. Using our selected data in continued pretraining boosts LLMs' performance by a large margin across different language understanding benchmarks.
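The abstract does not give implementation details, but the selection idea can be sketched as follows: score each document with a forward-trained LM and a reverse-trained LM, then filter by the loss gap. In the sketch below the checkpoint paths are hypothetical placeholders, the token-level reversal is a simplification of reversing the raw text, and the keep/discard rule is purely illustrative, not the authors' released pipeline.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

FWD_PATH = "path/to/forward-lm"   # hypothetical checkpoint trained on forward text
REV_PATH = "path/to/reverse-lm"   # hypothetical checkpoint trained on reversed text
tok = AutoTokenizer.from_pretrained(FWD_PATH)
fwd_lm = AutoModelForCausalLM.from_pretrained(FWD_PATH).eval()
rev_lm = AutoModelForCausalLM.from_pretrained(REV_PATH).eval()

@torch.no_grad()
def avg_loss(model, ids):
    """Mean next-token cross-entropy of one token sequence under `model`."""
    return model(ids, labels=ids).loss.item()

@torch.no_grad()
def loss_gap(text):
    """Forward loss minus reverse loss for one document."""
    ids = tok(text, return_tensors="pt").input_ids
    rev_ids = torch.flip(ids, dims=[1])   # token-level reversal (simplification)
    return avg_loss(fwd_lm, ids) - avg_loss(rev_lm, rev_ids)

# Illustrative rule only: which sign of the gap to keep is a design choice the
# abstract does not pin down; here we keep documents the reverse model fits better.
corpus = ["Example document one.", "Example document two."]
selected = [doc for doc in corpus if loss_gap(doc) > 0.0]
```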
2024
Revisiting the Markov Property for Machine Translation
Cunxiao Du | Hao Zhou | Zhaopeng Tu | Jing Jiang
Findings of the Association for Computational Linguistics: EACL 2024
In this paper, we re-examine the Markov property in the context of neural machine translation. We design a Markov Autoregressive Transformer (MAT) and undertake a comprehensive assessment of its performance across four WMT benchmarks. Our findings indicate that a MAT with an order larger than 4 can generate translations of quality on par with conventional autoregressive transformers. Counter-intuitively, we also find that the advantages of a higher-order MAT do not specifically benefit the translation of longer sentences.
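The abstract does not describe the MAT architecture itself. One plausible reading of an order-n Markov constraint is a causal attention mask that additionally limits each position to the previous n tokens; the banded mask below is an illustrative sketch under that assumption, not the paper's exact design.

```python
import torch

def markov_causal_mask(seq_len, order):
    """Boolean mask, True where attention is allowed: position i may attend to
    positions j with i - order <= j <= i (causal, at most `order` tokens back)."""
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    return (j <= i) & (j >= i - order)

# Example: order-4 mask over 8 positions; converting False -> -inf gives an
# additive bias usable in a standard transformer attention layer.
mask = markov_causal_mask(seq_len=8, order=4)
bias = torch.zeros(8, 8).masked_fill(~mask, float("-inf"))
```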
2022
ngram-OAXE: Phrase-Based Order-Agnostic Cross Entropy for Non-Autoregressive Machine Translation
Cunxiao Du | Zhaopeng Tu | Longyue Wang | Jing Jiang
Proceedings of the 29th International Conference on Computational Linguistics
Recently, a new training loss, OAXE, has proven effective at ameliorating the effect of multimodality in non-autoregressive translation (NAT) by removing the penalty on word-order errors in the standard cross-entropy loss. Starting from the intuition that reordering generally occurs between phrases, we extend OAXE by allowing reordering only between ngram phrases while still requiring a strict match of word order within each phrase. Extensive experiments on NAT benchmarks across language pairs and data scales demonstrate the effectiveness and universality of our approach. Further analyses show that ngram-OAXE indeed improves the translation of ngram phrases and produces more fluent translations with better modeling of sentence structure.
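As a rough illustration of the idea, the sketch below computes a phrase-level order-agnostic cross-entropy: the target is split into consecutive n-grams, each target n-gram is matched to a prediction slot with the Hungarian algorithm, and word order is enforced only inside each n-gram. The even split into fixed-size phrases and the per-token normalization are simplifying assumptions; the published loss handles segmentation and normalization in more detail.

```python
import torch
from scipy.optimize import linear_sum_assignment

def ngram_oaxe_loss(log_probs, target, n):
    """log_probs: [T, V] per-position log-probabilities from the NAT decoder.
    target:      [T] gold token ids; this sketch assumes T is divisible by n.
    Phrases may be reordered freely; word order inside each phrase is kept."""
    T = target.numel()
    assert T % n == 0, "sketch assumes the target splits evenly into n-grams"
    num_phrases = T // n
    rows = []
    for i in range(num_phrases):                       # prediction slot i
        slot = log_probs[i * n : (i + 1) * n]          # [n, V]
        row = []
        for j in range(num_phrases):                   # target n-gram j
            phrase = target[j * n : (j + 1) * n]       # [n]
            row.append(-slot.gather(1, phrase.unsqueeze(1)).sum())
        rows.append(torch.stack(row))
    cost = torch.stack(rows)                           # [num_phrases, num_phrases]
    r, c = linear_sum_assignment(cost.detach().cpu().numpy())   # Hungarian matching
    best = cost[torch.as_tensor(r), torch.as_tensor(c)].sum()   # min-cost alignment
    return best / T                                    # average per-token loss

# Toy usage: 6 positions, vocabulary of 10, phrases of length n = 2.
logits = torch.randn(6, 10, requires_grad=True)
loss = ngram_oaxe_loss(torch.log_softmax(logits, dim=-1),
                       torch.tensor([1, 2, 3, 4, 5, 6]), n=2)
loss.backward()
```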