Zuoli Tang

2026

Despite the remarkable success of Large Language Models (LLMs) in Machine Translation (MT), the scarcity of high-quality parallel corpora and the prohibitive cost of their acquisition constrain scalability. To this end, we propose Learning to Translate by Translating (LTT), an LLM-driven dual-learning framework that enables autonomous translation, achieving an 80.42% performance improvement over the base model. By adapting the cycle-consistency principle to the generative paradigm, LTT eliminates the need for parallel data. It employs a robust semantic-aware reward function that balances adequacy with reconstruction fidelity, effectively mitigating the reward hacking issues inherent in traditional unsupervised MT. Relying solely on monolingual data, our 8B model consistently outperforms significantly larger models (70B+) in low-resource settings and achieves parity with state-of-the-art supervised baselines on mainstream benchmarks. LTT thus offers a scalable, data-efficient paradigm for autonomous machine translation.

pdf bib abs

DiTReducio: A Training-Free Acceleration for DiT-Based TTS via Progressive Calibration
Yanru Huo | Ziyue Jiang | Zuoli Tang | Qingyang Hong | Zhou Zhao
Findings of the Association for Computational Linguistics: ACL 2026

While Diffusion Transformers (DiT) have advanced non-autoregressive (NAR) speech synthesis, their high computational demands remain an obvious limitation. Existing DiT-based text-to-speech (TTS) model acceleration approaches predominantly focus on reducing sampling steps through distillation techniques, yet they remain constrained by training costs. We introduce DiTReducio, a training-free acceleration framework that compresses computations in DiT-based TTS models through a progressive calibration process. We propose two compression methods, Temporal Skipping and Branch Skipping, to eliminate redundant computations during inference. Moreover, based on two characteristic attention patterns identified within DiT layers, we devise a pattern-guided strategy to selectively apply the compression methods. Our method allows flexible modulation between generation quality and computational efficiency through adjustable compression thresholds. Experimental evaluations conducted on F5-TTS and MegaTTS 3 demonstrate that DiTReducio achieves a 75.4% reduction in FLOPs and improves the Real-Time Factor (RTF) by 37.1%, while preserving generation quality. The code is available at https://github.com/MM-Speech/DiTReducio.

Co-authors

Kui Liu 1

Venues

Findings2

Fix author