Grace Chung
Also published as: Grace Y Chung
2026
Tokenizer-Aware Cross-Lingual Adaptation of Decoder-Only LLMs through Embedding Relearning and Swapping
Fan Jiang | Honglin Yu | Grace Y Chung | Trevor Cohn
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Extending Large Language Models (LLMs) to new languages is challenging, with most proposed methods suffering from high computational cost and catastrophic forgetting of the original model's capabilities. Embedding relearning (CITATION), a technique that creates new tokenizers and tunes embeddings on fixed model weights for target language adaptation, is both lightweight and performant. However, it has only been shown to work for older-generation encoder-only models and for high-resource languages. In this paper, we extend this framework to decoder-only LLMs, focusing on joint adaptation to many languages, including low-resource ones. We experiment in three language groups over 100 languages each. We adapt a pre-trained LLM by switching to a customized tokenizer and relearning the embedding layer. Across 96 diverse languages spanning both classification and generation tasks, we show that embedding relearning improves models by up to 20%, being highly competitive with full-weight updating baselines while being vastly more computationally efficient and mitigating catastrophic forgetting. This translates into better results than various baselines when transferring the improved multilingual performance to tasks that build on core English abilities (e.g., multilingual math reasoning). Further analysis reveals the critical role of customizing tokenizers in achieving effective language transfer, particularly for non-Latin script languages.
2009
Using the Web for Language Independent Spellchecking and Autocorrection
Casey Whitelaw | Ben Hutchinson | Grace Y Chung | Ged Ellis
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing
2007
A Study of Structured Clinical Abstracts and the Semantic Classification of Sentences
Grace Chung | Enrico Coiera
Biological, translational, and clinical language processing
2005
Automatic Induction of Language Model Data for A Spoken Dialogue System
Grace Chung | Stephanie Seneff | Chao Wang
Proceedings of the 6th SIGdial Workshop on Discourse and Dialogue
2004
Developing a Flexible Spoken Dialog System Using Simulation
Grace Chung
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)