@inproceedings{hsu-etal-2023-code,
    title = "Code-Switched Text Synthesis in Unseen Language Pairs",
    author = "Hsu, I-Hung  and
      Ray, Avik  and
      Garg, Shubham  and
      Peng, Nanyun  and
      Huang, Jing",
    editor = "Rogers, Anna  and
      Boyd-Graber, Jordan  and
      Okazaki, Naoaki",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2023.findings-acl.318/",
    doi = "10.18653/v1/2023.findings-acl.318",
    pages = "5137--5151",
    abstract = "Existing efforts on text synthesis for code-switching mostly require training on code-switched texts in the target language pairs, limiting the deployment of the models to cases lacking code-switched data. In this work, we study the problem of synthesizing code-switched texts for language pairs absent from the training data. We introduce GLOSS, a model built on top of a pre-trained multilingual machine translation model (PMMTM) with an additional code-switching module. This module, either an adapter or extra prefixes, learns code-switching patterns from code-switched data during training, while the primary component of GLOSS, i.e., the PMMTM, is frozen. The design of only adjusting the code-switching module prevents our model from overfitting to the constrained training data for code-switching. Hence, GLOSS exhibits the ability to generalize and synthesize code-switched texts across a broader spectrum of language pairs. Additionally, we develop a self-training algorithm on target language pairs further to enhance the reliability of GLOSS. Automatic evaluations on four language pairs show that GLOSS achieves at least 55{\%} relative BLEU and METEOR scores improvements compared to strong baselines. Human evaluations on two language pairs further validate the success of GLOSS."
}