Zhuoyuan Mao


2021

pdf bib
Lightweight Cross-Lingual Sentence Representation Learning
Zhuoyuan Mao | Prakhar Gupta | Chenhui Chu | Martin Jaggi | Sadao Kurohashi
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Large-scale models for learning fixed-dimensional cross-lingual sentence representations like LASER (Artetxe and Schwenk, 2019b) lead to significant improvement in performance on downstream tasks. However, further increases and modifications based on such large-scale models are usually impractical due to memory limitations. In this work, we introduce a lightweight dual-transformer architecture with just 2 layers for generating memory-efficient cross-lingual sentence representations. We explore different training tasks and observe that current cross-lingual training tasks leave a lot to be desired for this shallow architecture. To ameliorate this, we propose a novel cross-lingual language model, which combines the existing single-word masked language model with the newly proposed cross-lingual token-level reconstruction task. We further augment the training task by the introduction of two computationally-lite sentence-level contrastive learning tasks to enhance the alignment of cross-lingual sentence representation space, which compensates for the learning bottleneck of the lightweight transformer for generative tasks. Our comparisons with competing models on cross-lingual sentence retrieval and multilingual document classification confirm the effectiveness of the newly proposed training tasks for a shallow model.

2020

pdf bib
Meta Ensemble for Japanese-Chinese Neural Machine Translation: Kyoto-U+ECNU Participation to WAT 2020
Zhuoyuan Mao | Yibin Shen | Chenhui Chu | Sadao Kurohashi | Cheqing Jin
Proceedings of the 7th Workshop on Asian Translation

This paper describes the Japanese-Chinese Neural Machine Translation (NMT) system submitted by the joint team of Kyoto University and East China Normal University (Kyoto-U+ECNU) to WAT 2020 (Nakazawa et al.,2020). We participate in APSEC Japanese-Chinese translation task. We revisit several techniques for NMT including various architectures, different data selection and augmentation methods, denoising pre-training, and also some specific tricks for Japanese-Chinese translation. We eventually perform a meta ensemble to combine all of the models into a single model. BLEU results of this meta ensembled model rank the first both on 2 directions of ASPEC Japanese-Chinese translation.

pdf bib
JASS: Japanese-specific Sequence to Sequence Pre-training for Neural Machine Translation
Zhuoyuan Mao | Fabien Cromieres | Raj Dabre | Haiyue Song | Sadao Kurohashi
Proceedings of the 12th Language Resources and Evaluation Conference

Neural machine translation (NMT) needs large parallel corpora for state-of-the-art translation quality. Low-resource NMT is typically addressed by transfer learning which leverages large monolingual or parallel corpora for pre-training. Monolingual pre-training approaches such as MASS (MAsked Sequence to Sequence) are extremely effective in boosting NMT quality for languages with small parallel corpora. However, they do not account for linguistic information obtained using syntactic analyzers which is known to be invaluable for several Natural Language Processing (NLP) tasks. To this end, we propose JASS, Japanese-specific Sequence to Sequence, as a novel pre-training alternative to MASS for NMT involving Japanese as the source or target language. JASS is joint BMASS (Bunsetsu MASS) and BRSS (Bunsetsu Reordering Sequence to Sequence) pre-training which focuses on Japanese linguistic units called bunsetsus. In our experiments on ASPEC Japanese–English and News Commentary Japanese–Russian translation we show that JASS can give results that are competitive with if not better than those given by MASS. Furthermore, we show for the first time that joint MASS and JASS pre-training gives results that significantly surpass the individual methods indicating their complementary nature. We will release our code, pre-trained models and bunsetsu annotated data as resources for researchers to use in their own NLP tasks.

pdf bib
Pre-training via Leveraging Assisting Languages for Neural Machine Translation
Haiyue Song | Raj Dabre | Zhuoyuan Mao | Fei Cheng | Sadao Kurohashi | Eiichiro Sumita
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Sequence-to-sequence (S2S) pre-training using large monolingual data is known to improve performance for various S2S NLP tasks. However, large monolingual corpora might not always be available for the languages of interest (LOI). Thus, we propose to exploit monolingual corpora of other languages to complement the scarcity of monolingual corpora for the LOI. We utilize script mapping (Chinese to Japanese) to increase the similarity (number of cognates) between the monolingual corpora of helping languages and LOI. An empirical case study of low-resource Japanese-English neural machine translation (NMT) reveals that leveraging large Chinese and French monolingual corpora can help overcome the shortage of Japanese and English monolingual corpora, respectively, for S2S pre-training. Using only Chinese and French monolingual corpora, we were able to improve Japanese-English translation quality by up to 8.5 BLEU in low-resource scenarios.