Payal Bajaj
2022
XLM-E: Cross-lingual Language Model Pre-training via ELECTRA
Zewen Chi | Shaohan Huang | Li Dong | Shuming Ma | Bo Zheng | Saksham Singhal | Payal Bajaj | Xia Song | Xian-Ling Mao | Heyan Huang | Furu Wei
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
In this paper, we introduce ELECTRA-style tasks to cross-lingual language model pre-training. Specifically, we present two pre-training tasks, namely multilingual replaced token detection and translation replaced token detection. In addition, we pre-train the model, named XLM-E, on both multilingual and parallel corpora. Our model outperforms the baseline models on various cross-lingual understanding tasks at a much lower computation cost. Moreover, analysis shows that XLM-E tends to obtain better cross-lingual transferability.
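The replaced token detection objective named in this abstract can be illustrated with a short sketch. The toy sizes, module names, and the use of an untrained generator below are illustrative assumptions, not the XLM-E implementation; for translation replaced token detection the same loss would be applied to concatenated translation pairs.

```python
# Minimal PyTorch sketch of ELECTRA-style replaced token detection (RTD).
# All hyperparameters and module names here are illustrative assumptions.
import torch
import torch.nn as nn

VOCAB_SIZE, HIDDEN, MAX_LEN = 1000, 64, 32
MASK_ID = 0

class TinyEncoder(nn.Module):
    """Stand-in Transformer encoder with a per-token output head."""
    def __init__(self, out_dim):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, HIDDEN)
        layer = nn.TransformerEncoderLayer(HIDDEN, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(HIDDEN, out_dim)

    def forward(self, ids):
        return self.head(self.encoder(self.embed(ids)))

generator = TinyEncoder(out_dim=VOCAB_SIZE)   # fills in masked positions
discriminator = TinyEncoder(out_dim=1)        # flags replaced tokens

def rtd_loss(token_ids, mask_prob=0.15):
    # 1) Mask a random subset of positions.
    masked = torch.rand_like(token_ids, dtype=torch.float) < mask_prob
    corrupted_input = token_ids.masked_fill(masked, MASK_ID)

    # 2) The generator proposes plausible replacements for masked positions.
    with torch.no_grad():
        sampled = torch.distributions.Categorical(
            logits=generator(corrupted_input)).sample()
    corrupted = torch.where(masked, sampled, token_ids)

    # 3) The discriminator predicts, per token, whether it was replaced.
    is_replaced = (corrupted != token_ids).float()
    logits = discriminator(corrupted).squeeze(-1)
    return nn.functional.binary_cross_entropy_with_logits(logits, is_replaced)

ids = torch.randint(1, VOCAB_SIZE, (2, MAX_LEN))
print(rtd_loss(ids))
```

In the full ELECTRA-style objective the generator is trained jointly with a masked-language-modelling loss; it is left untrained here only to keep the sketch short.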
2021
Language Scaling for Universal Suggested Replies Model
Qianlan Ying | Payal Bajaj | Budhaditya Deb | Yu Yang | Wei Wang | Bojia Lin | Milad Shokouhi | Xia Song | Yang Yang | Daxin Jiang
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers
We consider the problem of scaling automated suggested replies for a commercial email application to multiple languages. Faced with increased compute requirements and low language resources for language expansion, we build a single universal model to improve the quality and reduce the run-time costs of our production system. However, restricted data movement across regional centers prevents joint training across languages. To this end, we propose a multilingual multi-task continual learning framework, with auxiliary tasks and language adapters, to train a universal language representation across regions. The experimental results show positive cross-lingual transfer across languages while reducing catastrophic forgetting across regions. Our online results on real user traffic show significant gains in click-through rate (CTR) and characters saved, as well as a 65% reduction in training cost compared with per-language models. As a consequence, we have scaled the feature to multiple languages, including low-resource markets.
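A hypothetical sketch of the language-adapter idea mentioned above, assuming a standard bottleneck-adapter design on top of a frozen shared encoder layer; the dimensions, language codes, and module names are illustrative, not the paper's production architecture.

```python
# Minimal PyTorch sketch of per-language bottleneck adapters over a shared,
# frozen encoder layer, so each language/region trains only its own adapter.
# All names and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class LanguageAdapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual."""
    def __init__(self, hidden=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)
        self.up = nn.Linear(bottleneck, hidden)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))

class AdaptedEncoderLayer(nn.Module):
    """Wraps a shared encoder layer with one adapter per language."""
    def __init__(self, shared_layer, languages, hidden=768):
        super().__init__()
        self.shared = shared_layer
        for p in self.shared.parameters():
            p.requires_grad = False          # universal weights stay fixed
        self.adapters = nn.ModuleDict(
            {lang: LanguageAdapter(hidden) for lang in languages})

    def forward(self, x, lang):
        return self.adapters[lang](self.shared(x))

shared = nn.TransformerEncoderLayer(d_model=768, nhead=8, batch_first=True)
layer = AdaptedEncoderLayer(shared, languages=["en", "es", "pt"])
hidden_states = torch.randn(2, 16, 768)       # (batch, seq_len, hidden)
out = layer(hidden_states, lang="es")          # only the "es" adapter trains
print(out.shape)                               # torch.Size([2, 16, 768])
```

Keeping the shared weights frozen while training only the per-language adapter is one way such a setup can limit catastrophic forgetting when regions cannot share training data.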
Co-authors
- Xia Song 2
- Zewen Chi 1
- Shaohan Huang 1
- Li Dong 1
- Shuming Ma 1