Large Language Models such as GPT-3 are well-suited for text prediction tasks, which can help and delight users during text composition. LLMs are known to generate ethically inappropriate predictions even for seemingly innocuous contexts. Toxicity detection followed by filtering is a common strategy for mitigating the harm from such predictions. However, as we shall argue in this paper, in the context of text prediction, it is not sufficient to detect and filter toxic content. One also needs to ensure factual correctness and group-level fairness of the predictions; failing to do so can make the system ineffective and nonsensical at best, and unfair and detrimental to the users at worst. We discuss the gaps and challenges of toxicity detection approaches - from blocklist-based approaches to sophisticated state-of-the-art neural classifiers - by evaluating them on the text prediction task for English against a manually crafted CheckList of harms targeted at different groups and different levels of severity.
In this paper, we introduce ELECTRA-style tasks to cross-lingual language model pre-training. Specifically, we present two pre-training tasks, namely multilingual replaced token detection, and translation replaced token detection. Besides, we pretrain the model, named as XLM-E, on both multilingual and parallel corpora. Our model outperforms the baseline models on various cross-lingual understanding tasks with much less computation cost. Moreover, analysis shows that XLM-E tends to obtain better cross-lingual transferability.
We consider the problem of scaling automated suggested replies for a commercial email application to multiple languages. Faced with increased compute requirements and low language resources for language expansion, we build a single universal model for improving the quality and reducing run-time costs of our production system. However, restricted data movement across regional centers prevents joint training across languages. To this end, we propose a multi-lingual multi-task continual learning framework, with auxiliary tasks and language adapters to train universal language representation across regions. The experimental results show positive cross-lingual transfer across languages while reducing catastrophic forgetting across regions. Our online results on real user traffic show significant CTR and Char-saved gain as well as 65% training cost reduction compared with per-language models. As a consequence, we have scaled the feature in multiple languages including low-resource markets.